This guide describes online deployment with Hugging Face Inference Endpoints.
Use Inference Endpoints to deploy the model online when local resources are insufficient.
Visit Hugging Face Inference Endpoints to get started.
Choose an instance. Our recommended choices for different model sizes are listed below:
| Model size | GPU | Number of GPUs |
| --- | --- | --- |
| 2B | L4 | 1 |
| 7B | L40S | 1 |
| 72B | L40S | 8 |
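If you prefer to create the endpoint from code rather than the web UI, the `huggingface_hub` library provides `create_inference_endpoint`. Below is a minimal sketch for the 7B row of the table; the endpoint name, the model repository id, and the `instance_type`/`instance_size` slugs (`nvidia-l40s`, `x1`) are assumptions you should confirm against the instance list shown in the Inference Endpoints dashboard.

```python
# Minimal sketch: create an endpoint for the 7B model on a single L40S.
# The repository id and instance slugs below are placeholders; check the
# dashboard for the exact values available to your account.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-7b-endpoint",                     # endpoint name (hypothetical)
    repository="your-org/your-7b-model",  # placeholder model repository
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_type="nvidia-l40s",          # assumed slug for the L40S instance
    instance_size="x1",                   # 1 GPU, per the table above
)
endpoint.wait()      # block until the endpoint reports "running"
print(endpoint.url)  # base URL to send requests to
```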
Taking the 7B model as an example, we choose the “Nvidia L40S 1GPU 48G” instance and configure it as follows:
- Set Max Number of Tokens (per Query) to 32768.
- Set Max Batch Prefill Tokens to 32768.
- Set Max Input Length (per Query) to 32767.
- Add the environment variable `CUDA_GRAPHS=0` to avoid a launch failure. Check this issue for details.
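When deploying programmatically, the same limits can be supplied as container environment variables through the `custom_image` argument of `create_inference_endpoint` (see the sketch above). This is a sketch under the assumption that the dashboard fields map to the Text Generation Inference variables `MAX_TOTAL_TOKENS`, `MAX_BATCH_PREFILL_TOKENS`, and `MAX_INPUT_LENGTH`; the image tag and the `MODEL_ID` path are placeholders taken from typical TGI deployments, so verify them before use.

```python
# Sketch: the dashboard settings expressed as TGI container environment
# variables, passed via custom_image=... in create_inference_endpoint.
# The image tag is a placeholder; pin a specific TGI version in practice.
tgi_image = {
    "health_route": "/health",
    "url": "ghcr.io/huggingface/text-generation-inference:latest",
    "env": {
        "MAX_TOTAL_TOKENS": "32768",          # Max Number of Tokens (per Query)
        "MAX_BATCH_PREFILL_TOKENS": "32768",  # Max Batch Prefill Tokens
        "MAX_INPUT_LENGTH": "32767",          # Max Input Length (per Query)
        "CUDA_GRAPHS": "0",                   # avoid the launch failure noted above
        "MODEL_ID": "/repository",            # assumed path where the endpoint mounts the model
    },
}
# create_inference_endpoint(..., custom_image=tgi_image)
```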