XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation

Correspondence: Ning Yang (ning.yang@bytedance.com)

Abstract

The rapid advancement of Vision-Language-Action (VLA) models has created an urgent need for large-scale, high-quality robot demonstration datasets. Although teleoperation is the predominant method for data collection, current approaches suffer from limited scalability, complex setup procedures, and suboptimal data quality. This paper presents XRoboToolkit, a cross-platform framework for extended reality (XR) based robot teleoperation built on the OpenXR standard. The system features low-latency stereoscopic visual feedback, optimization-based inverse kinematics, and support for diverse tracking modalities, including head, controller, hand, and auxiliary motion trackers. XRoboToolkit's modular architecture enables seamless integration across robotic platforms and simulation environments, spanning precision manipulators, mobile robots, and dexterous hands. We demonstrate the framework's effectiveness through precision manipulation tasks and validate data quality by training VLA models that exhibit robust autonomous performance.

Applications

Example applications of XRoboToolkit: (a) teleoperation with XR controllers for dual-arm manipulation and mobile manipulators, (b) dual UR5 manipulators with 2-DOF head tracking and stereo vision, (c) auxiliary motion trackers for robot elbow control in MeshCat visualization, and (d) dexterous hand tracking in MuJoCo simulation.

BibTeX

@article{zhao2025xrobotoolkit,
      title={XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation}, 
      author={Zhigen Zhao and Liuchuan Yu and Ke Jing and Ning Yang}, 
      journal={arXiv preprint arXiv:2508.00097},
      year={2025}
}