bytedance/UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

19,085
1,868
19,085
289

Top Related Projects

  • PowerToys (127,854): Microsoft PowerToys is a collection of utilities that help you customize Windows and streamline everyday tasks
  • Electron (119,659): Build cross-platform desktop apps with JavaScript, HTML, and CSS
  • Tauri (98,671): Build smaller, faster, and more secure desktop and mobile applications with a web frontend.
  • nw.js (41,438): Call all Node.js modules directly from DOM/WebWorker and enable a new way of writing applications with all Web technologies.
  • Wails (30,810): Create beautiful applications using Go
  • Neutralinojs: Portable and lightweight cross-platform desktop application development framework

Quick Overview

UI-TARS-desktop is an open-source project from ByteDance that hosts TARS, a multimodal AI agent stack. According to its README, it currently ships two projects: Agent TARS, a general multimodal agent with a CLI and Web UI, and UI-TARS Desktop, a desktop application that provides a native GUI agent based on the UI-TARS model.

Pros

  • Cross-platform support (Windows, macOS, and browser, according to the README)
  • Seamless integration of web technologies with native OS features
  • Rich set of pre-built UI components and utilities
  • Active development and support from ByteDance

Cons

  • Limited documentation, especially for advanced features
  • Smaller community compared to more established frameworks like Electron
  • Potential learning curve for developers new to desktop app development
  • May have performance overhead compared to fully native applications

Code Examples

  1. Creating a basic window:
const { App, BrowserWindow } = require('ui-tars');

const app = new App();

app.on('ready', () => {
  const win = new BrowserWindow({
    width: 800,
    height: 600,
    title: 'My TARS App'
  });
  
  win.loadFile('index.html');
});

app.run();
  2. Using a native dialog:
const { dialog } = require('ui-tars');

dialog.showMessageBox({
  type: 'info',
  title: 'Information',
  message: 'This is a native dialog box',
  buttons: ['OK']
});
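  3. Implementing a system tray icon:
const { App, Tray, Menu } = require('ui-tars');

// The 'Exit' menu item below calls app.quit(), so an App instance is needed,
// created the same way as in the first example.
const app = new App();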

const tray = new Tray('path/to/icon.png');
const contextMenu = Menu.buildFromTemplate([
  { label: 'Item 1', click: () => { console.log('Clicked Item 1'); } },
  { label: 'Exit', click: () => { app.quit(); } }
]);

tray.setToolTip('My TARS App');
tray.setContextMenu(contextMenu);

Getting Started

To start using UI-TARS-desktop, follow these steps:

  1. Install UI-TARS-desktop via npm:

    npm install ui-tars
    
  2. Create a new JavaScript file (e.g., main.js) and add the following code:

    const { App, BrowserWindow } = require('ui-tars');
    
    const app = new App();
    
    app.on('ready', () => {
      const win = new BrowserWindow({ width: 800, height: 600 });
      win.loadFile('index.html');
    });
    
    app.run();
    
  3. Create an index.html file in the same directory with your desired content.

  4. Run your app using:

    npx ui-tars main.js
    

For more detailed information and advanced usage, refer to the UI-TARS-desktop documentation.

Competitor Comparisons

Pros of PowerToys

  • More comprehensive set of utilities for Windows power users
  • Actively maintained with frequent updates and new features
  • Larger community and user base, resulting in more feedback and contributions

Cons of PowerToys

  • Limited to Windows operating system
  • May have a steeper learning curve for some users due to the variety of tools
  • Potential for conflicts with other system utilities or software

Code Comparison

While a direct code comparison is not particularly relevant due to the different nature and scope of these projects, we can highlight some differences in their implementation:

PowerToys (C++):

void PowerRenameManager::Rename(bool closeWindow)
{
    // Rename logic implementation
}

UI-TARS-desktop (JavaScript):

const handleRename = (newName) => {
  // Rename logic implementation
};

UI-TARS-desktop provides a GUI agent that operates a computer or browser from natural-language instructions, while PowerToys provides a suite of system-wide Windows utilities. The code structures reflect their respective purposes and target platforms.

Pros of Electron

  • Mature and widely adopted framework with extensive documentation and community support
  • Cross-platform compatibility (Windows, macOS, Linux) with native OS integration
  • Large ecosystem of tools, plugins, and extensions

Cons of Electron

  • Higher resource consumption and larger application size
  • Potential security vulnerabilities due to bundled Chromium
  • Slower performance compared to native applications

Code Comparison

UI-TARS-desktop:

import { app, BrowserWindow } from 'electron'
import { createWindow } from './window'

app.on('ready', () => {
  createWindow()
})

Electron:

const { app, BrowserWindow } = require('electron')

function createWindow () {
  const win = new BrowserWindow({ width: 800, height: 600 })
  win.loadFile('index.html')
}

app.whenReady().then(createWindow)

Key Differences

UI-TARS-desktop is a newer, less established project specifically designed for ByteDance's UI framework, while Electron is a mature, general-purpose framework for building cross-platform desktop applications using web technologies. UI-TARS-desktop may offer better performance and smaller application size, but Electron provides broader compatibility and a larger ecosystem of tools and resources.

Pros of Tauri

  • Cross-platform development with native system dialogs and notifications
  • Smaller bundle sizes due to leveraging system WebView
  • Strong security features with custom protocols and deep OS integration

Cons of Tauri

  • Less mature ecosystem compared to Electron-based solutions
  • Limited to WebView capabilities, which may not support all web features
  • Steeper learning curve for developers new to Rust

Code Comparison

UI-TARS-desktop (JavaScript):

import { app, BrowserWindow } from 'electron';

function createWindow() {
  const win = new BrowserWindow({ width: 800, height: 600 });
  win.loadFile('index.html');
}

Tauri (Rust):

use tauri::Manager;

fn main() {
  tauri::Builder::default()
    .setup(|app| {
      let window = app.get_window("main").unwrap();
      Ok(())
    })
    .run(tauri::generate_context!())
    .expect("error while running tauri application");
}

While UI-TARS-desktop uses Electron's JavaScript API for window management, Tauri employs Rust for core functionality, offering potentially better performance and security at the cost of a steeper learning curve for web developers.

Pros of nw.js

  • More mature and established project with a larger community and ecosystem
  • Supports a wider range of desktop platforms, including Windows, macOS, and Linux
  • Offers more extensive documentation and examples for developers

Cons of nw.js

  • Larger application size due to bundling a full Chromium runtime
  • Potentially slower startup times compared to UI-TARS-desktop
  • May have higher memory usage for simpler applications

Code Comparison

UI-TARS-desktop:

import { app, BrowserWindow } from 'electron';
import { createWindow } from './window';

app.on('ready', () => {
  createWindow();
});

nw.js:

nw.Window.open('index.html', {
  width: 800,
  height: 600
}, function(win) {
  // Window is ready
});

The code snippets demonstrate the different approaches to creating windows in each framework. UI-TARS-desktop uses Electron's API, while nw.js has its own window creation method. Both frameworks allow developers to create desktop applications using web technologies, but their APIs and implementation details differ.

UI-TARS-desktop is a newer project specifically designed for ByteDance's needs, while nw.js is a more general-purpose solution with a longer history. Developers should consider their specific requirements, target platforms, and desired application performance when choosing between these frameworks.

Pros of Wails

  • Cross-platform support for Windows, macOS, and Linux
  • Seamless integration of Go and web technologies
  • Active community and regular updates

Cons of Wails

  • Steeper learning curve for developers new to Go
  • Limited native UI components compared to UI-TARS-desktop

Code Comparison

UI-TARS-desktop (JavaScript):

import { Button } from '@ui-tars/desktop';

const MyComponent = () => (
  <Button onClick={() => console.log('Clicked')}>
    Click me
  </Button>
);

Wails (Go):

import "github.com/wailsapp/wails/v2/pkg/runtime"

func (a *App) HandleClick() {
    runtime.LogInfo(a.ctx, "Button clicked")
}

UI-TARS-desktop focuses on providing a rich set of pre-built UI components for desktop applications, making it easier to create consistent and visually appealing interfaces. It's particularly well-suited for developers familiar with React and JavaScript ecosystems.

Wails, on the other hand, offers a powerful way to build desktop applications using Go for the backend logic and web technologies for the frontend. It provides more flexibility in terms of language choice and architecture but may require more setup and custom UI development.

Both frameworks aim to simplify desktop application development, but they cater to different developer preferences and project requirements.

Pros of Neutralinojs

  • Cross-platform compatibility: Supports Windows, macOS, and Linux
  • Lightweight: Smaller application size compared to Electron-based alternatives
  • Native API access: Provides direct access to system APIs without browser limitations

Cons of Neutralinojs

  • Less mature ecosystem: Fewer third-party libraries and resources compared to UI-TARS-desktop
  • Limited UI frameworks: Primarily relies on web technologies for UI development
  • Smaller community: Less community support and fewer contributions

Code Comparison

Neutralinojs:

Neutralino.init();
Neutralino.events.on("windowClose", () => {
    Neutralino.app.exit();
});

UI-TARS-desktop:

import { app, BrowserWindow } from 'electron';
app.on('window-all-closed', () => {
    if (process.platform !== 'darwin') app.quit();
});

Both frameworks allow for creating desktop applications using web technologies, but Neutralinojs focuses on a more lightweight approach with direct system API access, while UI-TARS-desktop leverages the Electron framework for a more feature-rich development experience. Neutralinojs may be preferred for smaller applications or when resource constraints are a concern, while UI-TARS-desktop might be better suited for larger, more complex projects requiring extensive third-party integrations.

README

Introduction

TARS is a multimodal AI agent stack that currently ships two projects, Agent TARS and UI-TARS Desktop:

  • Agent TARS: a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product. It primarily ships with a CLI and Web UI. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.
  • UI-TARS Desktop: a desktop application that provides a native GUI agent based on the UI-TARS model. It primarily ships local and remote computer operators as well as browser operators.

News

  • [2025-06-25] - We released Agent TARS Beta and the Agent TARS CLI. Agent TARS Beta is a multimodal AI agent that aims to explore a way of working that is closer to human-like task completion, through rich multimodal capabilities (such as GUI Agent and Vision) and seamless integration with various real-world tools.
  • [2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features: Remote Computer Operator and Remote Browser Operator, both completely free. No configuration is required: simply click to remotely control any computer or browser, and experience a new level of convenience and intelligence.
  • [2025-04-17] - 🎉 We're thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
  • [2025-02-20] - 📦 Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language GUI model deployment tutorial (中文版: GUI模型部署教程) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Agent TARS

Agent TARS is a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.

It primarily ships with a CLI and Web UI for usage. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

Showcase

Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline

https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8


  • Booking Hotel. Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me
  • Generate Chart with extra MCP Servers. Instruction: Draw me a chart of Hangzhou's weather for one month

For more use cases, please check out #842.

Core Features

  • 🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
  • 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
  • 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
  • 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
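
To make the Event Stream idea more concrete, here is a minimal sketch of an append-only event log that both notifies a UI listener and is filtered to build model context. This is only an illustration under stated assumptions: the `EventStream` class, its methods, and the event type names (`user_message`, `tool_call`, and so on) are hypothetical and are not taken from the Agent TARS protocol.

```javascript
// Hypothetical event stream: the class, method names, and event shapes are
// illustrative assumptions, not the actual Agent TARS protocol.
class EventStream {
  constructor() {
    this.events = [];       // append-only log of typed events
    this.subscribers = [];  // callbacks notified on every append
  }

  // Register a listener (for example, a UI renderer); returns an unsubscribe function.
  subscribe(listener) {
    this.subscribers.push(listener);
    return () => {
      this.subscribers = this.subscribers.filter((l) => l !== listener);
    };
  }

  // Append an event, stamp it with a sequence id, then notify listeners.
  append(type, payload) {
    const event = { id: this.events.length + 1, type, payload };
    this.events.push(event);
    for (const listener of this.subscribers) listener(event);
    return event;
  }

  // Build model context from the log: keep the last `limit` events
  // whose type is listed in `types`.
  buildContext(types, limit) {
    return this.events.filter((e) => types.includes(e.type)).slice(-limit);
  }
}

const stream = new EventStream();
const rendered = [];
stream.subscribe((e) => rendered.push(`${e.id}:${e.type}`));

stream.append('user_message', { text: 'Draw a weather chart' });
stream.append('tool_call', { name: 'search', args: { q: 'Hangzhou weather' } });
stream.append('tool_result', { name: 'search', content: '...' });
stream.append('assistant_message', { text: 'Here is the chart.' });

const context = stream.buildContext(['user_message', 'tool_result'], 10);
console.log(rendered.join(' '));
console.log(context.map((e) => e.type).join(','));
```

The same log serves two consumers here: the subscriber sees every event in order, which suits a UI, while `buildContext` selects a subset for the model.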

Quick Start

Agent TARS CLI
# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally; requires Node.js >= 22
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.
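
If you launch the CLI from a script, it can help to assemble the flags in one place. The sketch below builds the argument list for the `agent-tars` command using only the `--provider`, `--model`, and `--apiKey` flags shown above. The `buildAgentTarsArgs` helper and the `ANTHROPIC_API_KEY` environment variable name are assumptions made for illustration and are not part of `@agent-tars/cli`.

```javascript
// The flags --provider, --model and --apiKey come from the commands above.
// buildAgentTarsArgs is a hypothetical helper, not part of @agent-tars/cli.
function buildAgentTarsArgs({ provider, model, apiKey }) {
  if (!provider || !model || !apiKey) {
    throw new Error('provider, model and apiKey are all required');
  }
  return ['--provider', provider, '--model', model, '--apiKey', apiKey];
}

const args = buildAgentTarsArgs({
  provider: 'anthropic',
  model: 'claude-3-7-sonnet-latest',
  // Assumed environment variable name; falls back to the placeholder used above.
  apiKey: process.env.ANTHROPIC_API_KEY || 'your-api-key',
});

// Print the command instead of running it, masking the value that follows --apiKey.
const printable = args.map((a, i) => (args[i - 1] === '--apiKey' ? '***' : a));
console.log(['agent-tars', ...printable].join(' '));
```

The resulting `args` array could be passed to Node's `child_process.spawn('agent-tars', args)` to start the CLI; the sketch only prints the command so it can be run without the package installed.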

Documentation

🌟 Explore Agent TARS Universe 🌟

Category | Resource Link | Description
🏠 Central Hub | Website | Your gateway to Agent TARS ecosystem
📚 Quick Start | Quick Start | Zero to hero in 5 minutes
🚀 What's New | Blog | Discover cutting-edge features & vision
🛠️ Developer Zone | Docs | Master every command & features
🎯 Showcase | Examples | View use cases built by the official and community
🔧 Reference | API | Complete technical reference



UI-TARS Desktop

UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)

Showcase

Instructions demonstrated with the Local Operator and the Remote Operator:

  • Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.
  • Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
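
The features above suggest a perceive, decide, act cycle: take a screenshot, ask a vision-language model for the next action, then drive the mouse or keyboard. The sketch below shows that loop under stated assumptions. `runAgent`, `captureScreen`, `predictAction`, `execute`, and the action shapes are hypothetical stand-ins, not the UI-TARS Desktop API, and the model is replaced by a scripted stub so the example runs on its own.

```javascript
// Hypothetical GUI agent loop. captureScreen, predictAction, execute and the
// action shapes are illustrative stand-ins, not the UI-TARS Desktop API.
async function runAgent(instruction, { captureScreen, predictAction, execute, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await captureScreen();                                  // perceive
    const action = await predictAction({ instruction, screenshot, history });  // decide
    history.push(action);
    if (action.type === 'finished') break;                                     // model signals completion
    await execute(action);                                                     // act (mouse or keyboard)
  }
  return history;
}

// Stubs so the sketch runs without a model or real input devices.
const scripted = [
  { type: 'click', x: 120, y: 48 },
  { type: 'type', text: 'autosave' },
  { type: 'finished' },
];
const executed = [];

const done = runAgent('Enable autosave in VS Code', {
  captureScreen: async () => 'fake-screenshot-bytes',
  // Return the next scripted action based on how many actions were already taken.
  predictAction: async ({ history }) => scripted[history.length],
  execute: async (action) => { executed.push(action.type); },
}).then((history) => {
  console.log(history.map((a) => a.type).join(' -> '));
  return history;
});
```

The `maxSteps` bound is a design choice in this sketch: it keeps the loop from running indefinitely if the model never reports that the task is finished.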

Quick Start

See Quick Start

Contributing

See CONTRIBUTING.md.

License

This project is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}