Top Related Projects
- Unity-Robotics-Hub: Central repository for tools, tutorials, resources, and documentation for robotics simulation in Unity.
- OmniIsaacGymEnvs: Reinforcement learning environments for Omniverse Isaac Gym.
- gym: A toolkit for developing and comparing reinforcement learning algorithms.
- Habitat-lab: A modular high-level library to train embodied AI agents across a variety of tasks and environments.
- bullet3 (Bullet Physics SDK): Real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning, etc.
Quick Overview
Midscene is an open-source 3D scene editor designed for web applications. It provides a user-friendly interface for creating and manipulating 3D scenes directly in the browser, making it easier for developers and designers to work with 3D content without the need for complex desktop software.
Pros
- Browser-based, making it accessible across different platforms
- User-friendly interface for easy 3D scene creation and editing
- Integrates well with web technologies and frameworks
- Open-source, allowing for community contributions and customization
Cons
- May have performance limitations compared to desktop 3D editing software
- Potentially limited feature set compared to more established 3D editors
- Dependency on browser capabilities and WebGL support
- Learning curve for users new to 3D scene editing
Code Examples
// Initialize Midscene
const scene = new Midscene.Scene();
const renderer = new Midscene.Renderer(document.getElementById('canvas'));
// Add a 3D object to the scene
const cube = new Midscene.Cube();
scene.add(cube);
// Render the scene
renderer.render(scene);
// Add lighting to the scene
const light = new Midscene.PointLight();
light.position.set(0, 5, 10);
scene.add(light);
// Apply material to an object
const material = new Midscene.StandardMaterial({
  color: '#ff0000',
  metalness: 0.5,
  roughness: 0.5
});
cube.material = material;
// Add user interaction
const controls = new Midscene.OrbitControls(renderer.camera, renderer.domElement);
// Animate the scene
function animate() {
  requestAnimationFrame(animate);
  cube.rotation.y += 0.01;
  renderer.render(scene);
}
animate();
Getting Started
- Include Midscene in your project:
  <script src="https://cdn.jsdelivr.net/npm/midscene@latest/dist/midscene.min.js"></script>
- Create a canvas element in your HTML:
  <canvas id="scene-canvas"></canvas>
- Initialize Midscene and create a basic scene:
  const scene = new Midscene.Scene();
  const renderer = new Midscene.Renderer(document.getElementById('scene-canvas'));
  const cube = new Midscene.Cube();
  scene.add(cube);
  renderer.render(scene);
- Run your web application and see the 3D scene in action!
Competitor Comparisons
Unity-Robotics-Hub: Central repository for tools, tutorials, resources, and documentation for robotics simulation in Unity.
Pros of Unity-Robotics-Hub
- Comprehensive robotics simulation environment with ROS integration
- Extensive documentation and tutorials for robotics development
- Active community support and regular updates
Cons of Unity-Robotics-Hub
- Steeper learning curve for non-Unity developers
- Limited to Unity engine, which may not be suitable for all robotics projects
- Requires more computational resources for complex simulations
Code Comparison
Unity-Robotics-Hub:
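using UnityEngine;
using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Geometry;
public class RobotController : MonoBehaviour
{
    ROSConnection ros;
    public string topicName = "cmd_vel";
    void Start()
    {
        ros = ROSConnection.GetOrCreateInstance();
        ros.RegisterPublisher<TwistMsg>(topicName);
    }
}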
Midscene:
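import { Scene, PerspectiveCamera, WebGLRenderer } from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
const scene = new Scene();
const camera = new PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 1000);
const renderer = new WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);
new GLTFLoader().load('path/to/model.glb', (gltf) => { scene.add(gltf.scene); renderer.render(scene, camera); });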
The Unity-Robotics-Hub code snippet demonstrates ROS integration and robot control, while the Midscene code focuses on 3D scene setup using Three.js. Unity-Robotics-Hub is more specialized for robotics, whereas Midscene is a general-purpose 3D visualization tool.
OmniIsaacGymEnvs: Reinforcement learning environments for Omniverse Isaac Gym.
Pros of OmniIsaacGymEnvs
- Provides a comprehensive suite of reinforcement learning environments for robotics simulation
- Leverages NVIDIA's Isaac Sim for high-fidelity physics and rendering
- Offers seamless integration with popular RL frameworks like OpenAI Gym
Cons of OmniIsaacGymEnvs
- Requires more computational resources due to its advanced physics simulation
- Has a steeper learning curve for users unfamiliar with Isaac Sim or robotics simulation
- Limited to robotics and physics-based simulations, unlike Midscene's focus on 3D scene understanding
Code Comparison
OmniIsaacGymEnvs:
from omniisaacgymenvs.utils.task_util import initialize_task
from omniisaacgymenvs.tasks.ant import AntTask
task = initialize_task(task_cfg, AntTask)
env = task.get_env()
Midscene:
import { Scene } from 'midscene';
const scene = new Scene();
scene.load('path/to/3d/model.glb');
The code snippets highlight the different focus areas: OmniIsaacGymEnvs initializes robotic simulation tasks, while Midscene loads and manipulates 3D scenes.
gym: A toolkit for developing and comparing reinforcement learning algorithms.
Pros of gym
- Well-established and widely used in the reinforcement learning community
- Extensive documentation and tutorials available
- Supports a wide range of environments for various RL tasks
Cons of gym
- Primarily focused on reinforcement learning, limiting its use in other domains
- Can be complex for beginners to set up and use effectively
- Some environments may require additional dependencies
Code Comparison
gym:
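import gym  # assumes gym >= 0.26 (reset returns (obs, info); step returns 5 values)
env = gym.make('CartPole-v1')
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()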
midscene:
import { Scene } from 'midscene';
const scene = new Scene();
scene.add(new Cube());
scene.render();
scene.export('scene.gltf');
Key Differences
- gym is Python-based and focused on reinforcement learning environments
- midscene is JavaScript-based and designed for 3D scene creation and manipulation
- gym provides a standardized interface for RL algorithms
- midscene offers tools for creating and exporting 3D scenes
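The "standardized interface" point comes down to two calls that every gym environment implements: reset() and step(action). The sketch below mirrors that contract in JavaScript so the idea is concrete. CountdownEnv and runEpisode are hypothetical illustrations, not part of gym or Midscene, and the return layout assumes the gym 0.26+ convention of (observation, reward, terminated, truncated, info).

```javascript
// Hypothetical toy environment mirroring gym's reset()/step() contract (gym >= 0.26 shape).
class CountdownEnv {
  constructor(start = 5) {
    this.start = start;
    this.state = start;
  }

  // Start a new episode and return [observation, info].
  reset() {
    this.state = this.start;
    return [this.state, {}];
  }

  // Apply an action (1 = decrement, 0 = wait) and return
  // [observation, reward, terminated, truncated, info].
  step(action) {
    if (action === 1) this.state -= 1;
    const terminated = this.state <= 0;
    const reward = terminated ? 1 : 0;
    return [this.state, reward, terminated, false, {}];
  }
}

// An agent loop written only against reset()/step() works with any such environment.
function runEpisode(env, policy, maxSteps = 100) {
  let [obs] = env.reset();
  let total = 0;
  for (let t = 0; t < maxSteps; t++) {
    const [nextObs, reward, terminated, truncated] = env.step(policy(obs));
    total += reward;
    obs = nextObs;
    if (terminated || truncated) return { steps: t + 1, total };
  }
  return { steps: maxSteps, total };
}

const result = runEpisode(new CountdownEnv(3), () => 1);
console.log(result); // { steps: 3, total: 1 }
```

Because runEpisode only touches reset() and step(), swapping in a different environment requires no change to the loop, which is the practical benefit of a shared interface.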
Use Cases
- gym: Ideal for researchers and developers working on reinforcement learning projects
- midscene: Suitable for web developers creating 3D scenes for visualization or game development
Habitat-lab: A modular high-level library to train embodied AI agents across a variety of tasks and environments.
Pros of Habitat-lab
- Comprehensive 3D simulation platform for embodied AI research
- Extensive documentation and tutorials for easy onboarding
- Large community support and active development
Cons of Habitat-lab
- Steeper learning curve due to complex architecture
- Higher computational requirements for running simulations
- Limited flexibility for custom environments outside its predefined scenarios
Code Comparison
Habitat-lab example:
import habitat
env = habitat.Env(
config=habitat.get_config("benchmark/nav/pointnav/pointnav_gibson.yaml")
)
observations = env.reset()
Midscene example:
from midscene import MidScene
scene = MidScene()
scene.load("path/to/scene.json")
Summary
Habitat-lab is a robust platform for embodied AI research with extensive features and community support, while Midscene appears to be a simpler tool for scene manipulation. Habitat-lab offers more comprehensive simulation capabilities but may require more resources and learning time. Midscene seems more lightweight and potentially easier to integrate for specific scene-related tasks.
bullet3 (Bullet Physics SDK): Real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning, etc.
Pros of bullet3
- More mature and widely adopted physics engine with extensive documentation
- Supports a broader range of physics simulations, including soft body dynamics
- Offers better performance for large-scale simulations
Cons of bullet3
- Steeper learning curve due to its complexity and extensive feature set
- Larger codebase and potentially higher resource requirements
Code Comparison
bullet3:
#include "btBulletDynamicsCommon.h"
int main() {
    btDefaultCollisionConfiguration* collisionConfiguration = new btDefaultCollisionConfiguration();
    btCollisionDispatcher* dispatcher = new btCollisionDispatcher(collisionConfiguration);
    btBroadphaseInterface* overlappingPairCache = new btDbvtBroadphase();
    btSequentialImpulseConstraintSolver* solver = new btSequentialImpulseConstraintSolver;
    btDiscreteDynamicsWorld* dynamicsWorld = new btDiscreteDynamicsWorld(dispatcher, overlappingPairCache, solver, collisionConfiguration);
    dynamicsWorld->setGravity(btVector3(0, -10, 0));
    dynamicsWorld->stepSimulation(1.f / 60.f, 10);
}
midscene:
import { Scene } from 'midscene';
const scene = new Scene();
scene.addObject(new Cube({ position: [0, 0, 0], size: [1, 1, 1] }));
scene.render();
Note: The code comparison highlights the difference in complexity and setup between the two libraries. bullet3 requires more detailed configuration for its physics simulation, while midscene offers a simpler, higher-level API for scene creation and rendering.
README
Midscene.js
Driving all platforms UI automation with vision-based model
v1.0 Release Notice
We have released v1.0. It is currently published on npm.
The v1.0 docs and code are on https://midscenejs.com/ and the main branch.
The v0.x docs and code are on https://v0.midscenejs.com/ and the v0 branch.
The v1.0 changelog: https://midscenejs.com/changelog
Showcases
Autonomously fill in the GitHub registration form in a web browser and pass all field validations.
Plus these real-world showcases:
- iOS Automation - Meituan coffee order
- iOS Automation - Auto-like the first @midscene_ai tweet
- Android Automation - DCar: Xiaomi SU7 specs
- Android Automation - Booking a hotel for Christmas
- MCP Integration - Midscene MCP UI prepatch release
See more real-world showcases on the showcases page.
Community showcase: robotic arm + vision + voice for in-vehicle testing
Features
Write Automation with Natural Language
- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation script.
Web & Mobile App & Any Interface
- Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
- Android Automation: Use the JavaScript SDK with adb to control your local Android device.
- iOS Automation: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
- Any Interface Automation: Use the JavaScript SDK to control your own interface.
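Controlling "your own interface" means supplying the primitives a vision-driven agent needs: a way to capture the screen and a way to act at screen coordinates. The sketch below is hypothetical. FakeScreenDevice, screenshot(), tap(), typeText() and executePlan() are illustrative names, not Midscene's actual interface contract, so check the Midscene docs for the real one.

```javascript
// Hypothetical device adapter: the minimal surface a vision-based agent could drive.
// Method names are illustrative and are NOT Midscene's real interface definition.
class FakeScreenDevice {
  constructor(width, height) {
    this.size = { width, height };
    this.log = []; // records every action so it can be inspected later
  }

  // A real adapter would return an encoded image (e.g. base64 PNG) of the screen.
  async screenshot() {
    return { ...this.size, data: "<image bytes>" };
  }

  // Reject taps that fall outside the screen instead of silently ignoring them.
  async tap(x, y) {
    if (x < 0 || y < 0 || x >= this.size.width || y >= this.size.height) {
      throw new RangeError(`tap (${x}, ${y}) is outside the screen`);
    }
    this.log.push(`tap ${x},${y}`);
  }

  async typeText(text) {
    this.log.push(`type ${text}`);
  }
}

// Hypothetical executor: turns steps that were already resolved to coordinates
// (for example by a vision model) into device calls, in order.
async function executePlan(device, plan) {
  for (const step of plan) {
    if (step.kind === "tap") await device.tap(step.x, step.y);
    else if (step.kind === "type") await device.typeText(step.text);
    else throw new Error(`unknown step kind: ${step.kind}`);
  }
  return device.log;
}

const device = new FakeScreenDevice(1080, 1920);
executePlan(device, [
  { kind: "tap", x: 540, y: 300 },
  { kind: "type", text: "coffee" },
]).then((log) => console.log(log)); // [ 'tap 540,300', 'type coffee' ]
```

Keeping the adapter this small is the design point: anything that can produce a screenshot and accept coordinate-based input can, in principle, be driven by a screenshot-only agent.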
For Developers
- Three kinds of APIs:
  - Interaction API: interact with the user interface.
  - Data Extraction API: extract data from the user interface and the DOM.
  - Utility API: utility functions such as aiAssert(), aiLocate(), and aiWaitFor().
- MCP: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language.
- Caching for Efficiency: Replay your script with the cache to get results faster.
- Debugging Experience: Midscene.js offers a visual replay report, a built-in playground, and a Chrome Extension to simplify debugging. These are the tools most developers truly need.
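The caching idea can be illustrated with a small memoization layer: the first run pays for a slow, model-backed lookup, and a replay with the same prompt reuses the stored answer. This is a hypothetical sketch, not Midscene's cache implementation or file format; LocateCache is an illustrative name and locateWithModel stands in for an expensive vision-model call.

```javascript
// Hypothetical prompt-keyed cache illustrating "replay with cache".
// Not Midscene's actual cache format or API.
class LocateCache {
  constructor() {
    this.entries = new Map(); // prompt -> cached result
    this.hits = 0;
    this.misses = 0;
  }

  // Return the cached result for `prompt`, or compute it with `compute` and store it.
  async getOrCompute(prompt, compute) {
    if (this.entries.has(prompt)) {
      this.hits += 1;
      return this.entries.get(prompt);
    }
    this.misses += 1;
    const result = await compute(prompt);
    this.entries.set(prompt, result);
    return result;
  }
}

// Stand-in for a slow vision-model call that returns element coordinates.
let modelCalls = 0;
async function locateWithModel(prompt) {
  modelCalls += 1;
  return { prompt, center: [120, 48] };
}

// Run each prompt of a "script" in order, going through the cache.
async function runScript(cache, prompts) {
  const results = [];
  for (const p of prompts) results.push(await cache.getOrCompute(p, locateWithModel));
  return results;
}

const cache = new LocateCache();
const script = ["the search box", "the submit button"];
runScript(cache, script)
  .then(() => runScript(cache, script)) // replay: both lookups are served from the cache
  .then(() => console.log({ modelCalls, hits: cache.hits, misses: cache.misses }));
// { modelCalls: 2, hits: 2, misses: 2 }
```

This version awaits each lookup before the next one; storing the pending promise in the map instead of the resolved value would also deduplicate concurrent lookups for the same prompt.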
Zero-code Quick Experience
- Chrome Extension: Start experimenting in the browser immediately through the Chrome Extension, without writing any code.
- Android Playground: A built-in Android playground lets you control your local Android device.
- iOS Playground: A built-in iOS playground lets you control your local iOS device.
Driven by Visual Language Model
Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models such as Qwen3-VL, Doubao-1.6-vision, gemini-3-pro, and UI-TARS. For data extraction and page understanding, you can still opt in to include the DOM when needed.
- Pure-vision localization for UI actions; the DOM extraction mode has been removed.
- Works across web, mobile, desktop, and even <canvas> surfaces.
- Uses far fewer tokens by skipping the DOM for actions, which cuts cost and speeds up runs.
- The DOM can still be included for data extraction and page understanding when needed.
- Strong open-source model options for self-hosting.
Read more about Model Strategy
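Because actions are located from screenshots alone, a model's answer is a position in screenshot pixels, which then has to be turned into a point the automation layer can act on. The sketch below assumes a hypothetical bounding-box format { left, top, width, height } in screenshot pixels and a known device pixel ratio; it takes the box center and converts it to CSS pixels. The box format and helper names are assumptions for illustration, not Midscene's internal representation.

```javascript
// Hypothetical: convert a model-reported box (screenshot pixels) to a tap point (CSS pixels).
function boxCenter(box) {
  return {
    x: box.left + box.width / 2,
    y: box.top + box.height / 2,
  };
}

// Screenshots on high-DPI screens are larger than the CSS viewport by devicePixelRatio,
// so dividing gives coordinates that page-level input APIs expect.
function toCssPixels(point, devicePixelRatio) {
  if (!(devicePixelRatio > 0)) throw new RangeError("devicePixelRatio must be positive");
  return {
    x: Math.round(point.x / devicePixelRatio),
    y: Math.round(point.y / devicePixelRatio),
  };
}

const box = { left: 400, top: 200, width: 240, height: 80 }; // taken from a 2x screenshot
const tapPoint = toCssPixels(boxCenter(box), 2);
console.log(tapPoint); // { x: 260, y: 120 }
```

For example, Puppeteer's page.mouse.click(x, y) takes CSS-pixel coordinates, so skipping the division on a 2x display would click twice as far from the origin as intended.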
Resources
- Official Website: https://midscenejs.com
- Documentation: https://midscenejs.com
- Sample Projects: https://github.com/web-infra-dev/midscene-example
- API Reference: https://midscenejs.com/api
- GitHub: https://github.com/web-infra-dev/midscene
Community
Awesome Midscene
Community projects that extend Midscene.js capabilities:
- midscene-ios - iOS Mirror automation support for Midscene
- midscene-pc - PC operation device for Windows, macOS, and Linux
- midscene-pc-docker - Docker image with Midscene-PC server pre-installed
- Midscene-Python - Python SDK for Midscene automation
- midscene-java by @Master-Frank - Java SDK for Midscene automation
- midscene-java by @alstafeev - Java SDK for Midscene automation
Credits
We would like to thank the following projects:
- Rsbuild and Rslib for the build tool.
- UI-TARS for the open-source agent model UI-TARS.
- Qwen-VL for the open-source VL model Qwen-VL.
- scrcpy and yume-chan for controlling Android devices from the browser.
- appium-adb for the JavaScript bridge to adb.
- appium-webdriveragent for operating XCTest from JavaScript.
- YADB for the yadb tool, which improves the performance of text input.
- libnut-core for cross-platform native keyboard and mouse control.
- Puppeteer for browser automation and control.
- Playwright for browser automation, control, and testing.
Citation
If you use Midscene.js in your research or project, please cite:
@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}
License
Midscene.js is MIT licensed.