midscene

Your AI Operator for Web, Android, Automation & Testing.

10,444

709

10,444

Top Related Projects

Unity-Robotics-Hub

2,436

Central repository for tools, tutorials, resources, and documentation for robotics simulation in Unity.

OmniIsaacGymEnvs

1,006

Reinforcement Learning Environments for Omniverse Isaac Gym

gym

36,752

A toolkit for developing and comparing reinforcement learning algorithms.

habitat-lab

2,731

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

bullet3

13,923

Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.

Quick Overview

Midscene.js is an open-source, MIT-licensed, visual-driven AI operator for UI automation and testing on web, Android, and iOS. You describe goals and steps in natural language, through a JavaScript SDK or YAML scripts, and Midscene uses a visual-language model to find elements from screenshots and operate the interface for you.

Pros

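Automation is written in natural language, through a JavaScript SDK or YAML
Visual-language models locate elements from screenshots, so no DOM or semantic markup is required
Covers web (Puppeteer, Playwright, or Bridge Mode), Android (adb), and iOS
Visual reports, a Playground, and caching support debugging and replay
Open-source (MIT) with support for self-hosted and open-source models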

Cons

Auto planning may be slow and relies heavily on the quality of the AI model
Every AI step consumes model tokens, which adds cost
Complex logic is more stable when split into explicit workflow steps than when given as a single instruction
Debugging and maintaining automation scripts remains the main challenge, even with the provided reports and Playground

Code Examples

// Auto planning: describe the goal and let Midscene plan and execute the steps.
// This may be slower and relies heavily on the quality of the AI model.
await aiAction('click all the records one by one. If one record contains the text "completed", skip it');

// Workflow style: split the logic into explicit steps for more stable automation.
// `agent` is assumed to be an existing Midscene agent.
const recordList = await agent.aiQuery('string[], the record list');
for (const record of recordList) {
  const hasCompleted = await agent.aiBoolean(`check if the record "${record}" contains the text "completed"`);
  if (!hasCompleted) {
    await agent.aiTap(record);
  }
}

Getting Started

Try Midscene without writing code:

Install the Chrome Extension to start an in-browser experience immediately.
Use the built-in Android or iOS Playground to control a local device.

Write automation scripts:

Use the JavaScript SDK with Puppeteer, Playwright, or Bridge Mode for web automation, with adb for Android, or with the iOS Simulator for iOS.
Alternatively, write the script in YAML.

Installation and model configuration are documented at https://midscenejs.com.

Competitor Comparisons

Unity-Robotics-Hub

2,436

Central repository for tools, tutorials, resources, and documentation for robotics simulation in Unity.

Pros of Unity-Robotics-Hub

Comprehensive robotics simulation environment with ROS integration
Extensive documentation and tutorials for robotics development
Active community support and regular updates

Cons of Unity-Robotics-Hub

Steeper learning curve for non-Unity developers
Limited to Unity engine, which may not be suitable for all robotics projects
Requires more computational resources for complex simulations

Code Comparison

Unity-Robotics-Hub:

using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Geometry;

public class RobotController : MonoBehaviour
{
    ROSConnection ros;
    public string topicName = "cmd_vel";

Midscene:

// `agent` is assumed to be an existing Midscene agent attached to a page or device
const recordList = await agent.aiQuery('string[], the record list');
await agent.aiTap(recordList[0]);

The Unity-Robotics-Hub snippet sets up a ROS connection for robot control, while the Midscene snippet queries and taps UI elements described in natural language. The two projects address different domains: Unity-Robotics-Hub targets robotics simulation in Unity, whereas Midscene automates and tests web and mobile user interfaces.

OmniIsaacGymEnvs

1,006

Reinforcement Learning Environments for Omniverse Isaac Gym

Pros of OmniIsaacGymEnvs

Provides a comprehensive suite of reinforcement learning environments for robotics simulation
Leverages NVIDIA's Isaac Sim for high-fidelity physics and rendering
Offers seamless integration with popular RL frameworks like OpenAI Gym

Cons of OmniIsaacGymEnvs

Requires more computational resources due to its advanced physics simulation
Has a steeper learning curve for users unfamiliar with Isaac Sim or robotics simulation
Limited to robotics and physics-based simulations, unlike Midscene's focus on UI automation and testing

Code Comparison

OmniIsaacGymEnvs:

from omniisaacgymenvs.utils.task_util import initialize_task
from omniisaacgymenvs.tasks.ant import AntTask

task = initialize_task(task_cfg, AntTask)
env = task.get_env()

Midscene:

// Auto planning: Midscene plans and executes the steps from one instruction
await aiAction('click all the records one by one. If one record contains the text "completed", skip it');

The code snippets highlight the different focus areas: OmniIsaacGymEnvs initializes robotic simulation tasks, while Midscene operates a user interface from a natural-language instruction.

gym

36,752

A toolkit for developing and comparing reinforcement learning algorithms.

Pros of gym

Well-established and widely used in the reinforcement learning community
Extensive documentation and tutorials available
Supports a wide range of environments for various RL tasks

Cons of gym

Primarily focused on reinforcement learning, limiting its use in other domains
Can be complex for beginners to set up and use effectively
Some environments may require additional dependencies

Code Comparison

gym:

import gym
env = gym.make('CartPole-v1')
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)

midscene:

// `agent` is assumed to be an existing Midscene agent
const hasCompleted = await agent.aiBoolean('check if the first record contains the text "completed"');
if (!hasCompleted) {
  await agent.aiTap('the first record');
}

Key Differences

gym is Python-based and focused on reinforcement learning environments
midscene is JavaScript-based and designed for UI automation and testing driven by visual-language models
gym provides a standardized interface for RL algorithms
midscene offers interaction, data extraction, and utility APIs for operating user interfaces

Use Cases

gym: Ideal for researchers and developers working on reinforcement learning projects
midscene: Suitable for developers and testers automating web, Android, and iOS interfaces

habitat-lab

2,731

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

Pros of Habitat-lab

Comprehensive 3D simulation platform for embodied AI research
Extensive documentation and tutorials for easy onboarding
Large community support and active development

Cons of Habitat-lab

Steeper learning curve due to complex architecture
Higher computational requirements for running simulations
Limited flexibility for custom environments outside its predefined scenarios

Code Comparison

Habitat-lab example:

import habitat
env = habitat.Env(
    config=habitat.get_config("benchmark/nav/pointnav/pointnav_gibson.yaml")
)
observations = env.reset()

Midscene example:

// `agent` is assumed to be an existing Midscene agent
const recordList = await agent.aiQuery('string[], the record list');

Summary

Habitat-lab is a platform for embodied AI research that trains agents in simulated 3D environments, while Midscene is a UI automation tool that uses visual-language models to operate web and mobile interfaces. The two solve different problems: Habitat-lab suits simulation-based agent training, and Midscene suits natural-language automation and testing of real user interfaces.

bullet3

13,923

Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.

Pros of bullet3

Mature and widely adopted physics engine with extensive documentation
Supports a range of physics simulations, including soft body dynamics
Built for real-time collision detection and multi-physics simulation

Cons of bullet3

Steeper learning curve due to its complexity and extensive feature set
Larger codebase and potentially higher resource requirements

Code Comparison

bullet3:

btDefaultCollisionConfiguration* collisionConfiguration = new btDefaultCollisionConfiguration();
btCollisionDispatcher* dispatcher = new btCollisionDispatcher(collisionConfiguration);
btBroadphaseInterface* overlappingPairCache = new btDbvtBroadphase();
btSequentialImpulseConstraintSolver* solver = new btSequentialImpulseConstraintSolver;
btDiscreteDynamicsWorld* dynamicsWorld = new btDiscreteDynamicsWorld(dispatcher, overlappingPairCache, solver, collisionConfiguration);

midscene:

// `agent` is assumed to be an existing Midscene agent
await agent.aiTap('the first record');

Note: The two libraries are not alternatives to each other. bullet3 requires detailed configuration of a physics world for simulation, while midscene exposes a high-level, natural-language API for operating and testing user interfaces.

README

Midscene.js

English | 简体中文

Visual-driven AI Operator for Web, Android, iOS, Automation & Testing. Open-source and MIT licensed.

Showcases

Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (By UI-TARS model)
Control Maps App on Android (By Qwen-2.5-VL model)
Using midscene mcp to browse the page (https://www.saucedemo.com/), perform login, add products, place orders, and finally generate test cases based on mcp execution steps and playwright example

Features

Write Automation with Natural Language

Describe your goals and steps, and Midscene will plan and operate the user interface for you.
Use the JavaScript SDK or YAML to write your automation script.

Web & Mobile App & Any Interface

Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
Android Automation: Use the JavaScript SDK with adb to control your local Android device.
iOS Automation: Use the JavaScript SDK with the iOS Simulator to control your local iOS devices and simulators.
Any Interface Automation: Use the JavaScript SDK to control your own interface.

Tools

Visual Reports for Debugging: Through our test reports and Playground, you can easily understand, replay, and debug the entire process.
Caching for Efficiency: Replay your script with cache and get the result faster.
MCP: Allows other MCP clients to use Midscene's capabilities directly (Web MCP, Android MCP).
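
To illustrate why replaying with a cache can be faster, here is a generic memoization sketch. It is not Midscene's actual cache implementation or file format: `withPlanCache`, `planFn`, and the in-memory `Map` are all hypothetical names chosen for illustration. The only idea shown is that a plan produced by a slow model call can be stored under its prompt and reused on the next run.

```javascript
// Hypothetical sketch: wrap a slow planning function so repeated prompts
// reuse the stored result instead of calling the model again.
function withPlanCache(planFn, cache = new Map()) {
  return async function cachedPlan(prompt) {
    if (cache.has(prompt)) {
      // Cache hit: skip the slow call entirely.
      return { plan: cache.get(prompt), fromCache: true };
    }
    // Cache miss: in a real system this would be a model call.
    const plan = await planFn(prompt);
    cache.set(prompt, plan);
    return { plan, fromCache: false };
  };
}
```

Calling the wrapped function twice with the same prompt invokes `planFn` once; the second call returns the stored plan with `fromCache: true`.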

Three kinds of APIs

Interaction API: interact with the user interface.
Data Extraction API: extract data from the user interface and DOM.
Utility API: utility functions like aiAssert(), aiLocate(), aiWaitFor().
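
A minimal sketch of how the three kinds of calls might combine in one script. It assumes `agent` is an already-created Midscene agent (how it is created depends on the integration and is not shown here). The helper name `collectAndVerify` and the prompt strings are hypothetical, and the argument shapes follow this README's examples rather than the full API reference.

```javascript
// Hypothetical helper: `agent` is assumed to expose the methods named in this README.
async function collectAndVerify(agent) {
  // Interaction API: act on the interface.
  await agent.aiTap('the "Load more" button');

  // Utility API: wait until a described condition holds.
  await agent.aiWaitFor('the record list has finished loading');

  // Data Extraction API: describe the shape of the data you want back.
  const records = await agent.aiQuery('string[], the record list');

  // Utility API: assert on what is visible.
  await agent.aiAssert('the record list is not empty');

  return records;
}
```

Because the helper only depends on the method names, it can be dry-run against a stub object before pointing it at a real page or device.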

Zero-code Quick Experience

Chrome Extension: Start in-browser experience immediately through the Chrome Extension, without writing any code.
Android Playground: There is also a built-in Android playground to control your local Android device.
iOS Playground: There is also a built-in iOS playground to control your local iOS device.

Driven by Visual Language Model

Midscene.js supports visual-language models like Qwen3-VL, Doubao-1.6-vision, gemini-2.5-pro and UI-TARS.

Finds and understands the target element on the page from a screenshot alone.
No DOM or semantic markup is required.
Fewer tokens and lower cost compared to general LLM models.
Supports open-source models.

Two Styles of Automation

Auto Planning

Midscene will automatically plan the steps and execute them. It may be slower and relies heavily on the quality of the AI model.

await aiAction('click all the records one by one. If one record contains the text "completed", skip it');

Workflow Style

Split complex logic into multiple steps to improve the stability of the automation code.

// Extract the record list, then decide per record whether to tap it
const recordList = await agent.aiQuery('string[], the record list');
for (const record of recordList) {
  // Ask a yes/no question about this record
  const hasCompleted = await agent.aiBoolean(`check if the record "${record}" contains the text "completed"`);
  if (!hasCompleted) {
    await agent.aiTap(record);
  }
}

For more details about the workflow style, please refer to the blog post "Use JavaScript to Optimize the AI Automation Code".
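
One way the workflow style can improve stability is by isolating failures per step. The sketch below wraps the same calls in a hypothetical helper, `tapIncompleteRecords`, that records which items were tapped and which failed instead of aborting on the first error. It assumes `agent` is an existing Midscene agent; the return shape is an illustration, not part of Midscene's API.

```javascript
// Hypothetical helper built on the agent calls used in this section.
async function tapIncompleteRecords(agent) {
  const tapped = [];
  const failed = [];
  const recordList = await agent.aiQuery('string[], the record list');

  for (const record of recordList) {
    try {
      const hasCompleted = await agent.aiBoolean(
        `check if the record "${record}" contains the text "completed"`
      );
      if (!hasCompleted) {
        await agent.aiTap(record);
        tapped.push(record);
      }
    } catch (err) {
      // Keep going: one flaky step should not abort the whole run.
      failed.push({ record, message: String(err && err.message ? err.message : err) });
    }
  }

  return { tapped, failed };
}
```

Since each record is handled in its own `try` block, a single failed tap is reported in `failed` while the remaining records are still processed.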

Comparing to other projects

There are many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

Visual-driven approach brings reliability and efficiency: By using visual-language models, Midscene.js suits both web and mobile app automation, regardless of the technology stack the interface is built with.
Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need.
Open Source, Free, Deploy as you want: Midscene.js is an open-source project, and it supports self-hosted models.
Integrate with JavaScript: You can always bet on JavaScript.

Resources

Home Page and Documentation: https://midscenejs.com
Sample Projects: https://github.com/web-infra-dev/midscene-example
API Reference: https://midscenejs.com/api.html
GitHub: https://github.com/web-infra-dev/midscene

Community

Awesome Midscene

Community projects that extend Midscene.js capabilities:

midscene-ios - iOS automation support for Midscene
Midscene-Python - Python SDK for Midscene automation

Credits

We would like to thank the following projects:

Rsbuild and Rslib for the build tool.
UI-TARS for the open-source agent model UI-TARS.
Qwen-VL for the open-source VL model Qwen-VL.
scrcpy and yume-chan for allowing us to control Android devices from the browser.
appium-adb for the JavaScript bridge of adb.
appium-webdriveragent for operating XCTest from JavaScript.
YADB for the yadb tool which improves the performance of text input.
Puppeteer for browser automation and control.
Playwright for browser automation, control, and testing.

Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou, Tao Yu, YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation & Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

License

Midscene.js is MIT licensed.

If this project helps you or inspires you, please give us a ⭐
