ExtremControl is a humanoid whole-body control framework designed to minimize latency when responding to a high-level control interface. Instead of relying on whole-body retargeting, we directly control the robot's extremities. To further reduce latency, we incorporate a velocity feedforward term into the PD controller.

Built on top of ExtremControl, we develop a humanoid teleoperation system that achieves end-to-end latency as low as 50 ms. Although our experiments are conducted with MoCap and VR systems, the framework is interface-agnostic: you can seamlessly integrate your own high-level control input.

The codebase is fully open-sourced and built on Genesis, enabling additional features beyond what is shown here. Click the link to explore more.

All videos are presented at their original speed.

Whole-body Control without Whole-body Retargeting

Whole-body retargeting has become popular following the release of the Unitree G1, as its 7-DoF arms offer redundancy beyond the 6-DoF rigid pose of the end-effectors. However, even state-of-the-art whole-body retargeting methods still incur a latency of around 30 ms (e.g., GMR tested on an i9-13900K), partly due to the over-constrained formulation of tracking targets. Moreover, inverse kinematics must be solved sequentially, which limits the update rate to approximately 30 Hz, even slower than the commonly used 50 Hz control frequency.

To address this challenge, we move away from traditional whole-body retargeting and instead directly control the robot's extremities. The proposed Cartesian-Space Mapping (shown in the middle) is fully parallelizable and requires only 0.3 ms to compute. To demonstrate that the selected links are sufficiently expressive, we also present a joint-space retargeting result (shown on the right), implemented with a customized IK solver that takes only 10 ms. The mapped link poses are sufficient to reconstruct a reasonable full-body configuration, allowing us to eliminate this 10 ms computation.
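For intuition, below is a minimal sketch of what a direct Cartesian-space extremity mapping could look like, assuming each extremity pose is expressed in the pelvis frame and its position is scaled by a per-limb length ratio. The link names, scale values, and function signature are illustrative assumptions, not the released implementation; because each link is handled independently, the loop is trivially vectorizable or parallelizable.

import numpy as np

# Hypothetical robot/human limb-length ratios; real values depend on the
# operator's body and on the robot model.
LIMB_SCALE = {"left_hand": 0.8, "right_hand": 0.8, "left_foot": 0.9, "right_foot": 0.9}

def map_extremities(human_poses, pelvis_pos, pelvis_rot):
    """Map human extremity poses (world frame) to robot link targets (pelvis frame).

    human_poses: dict of link name -> (position (3,), rotation matrix (3, 3)).
    pelvis_pos:  human pelvis position (3,).
    pelvis_rot:  human pelvis rotation matrix (3, 3).
    """
    targets = {}
    for name, (pos, rot) in human_poses.items():
        local_pos = pelvis_rot.T @ (pos - pelvis_pos)   # express relative to the pelvis
        local_rot = pelvis_rot.T @ rot
        targets[name] = (LIMB_SCALE[name] * local_pos, local_rot)  # scale position only
    return targets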

Go Beyond Position Control and Embrace Velocity Feedforward

Since the very beginning of parallel simulation for locomotion, position-only PD control has seemed to be the only option. However, due to the damping term, the target joint position must lead the actual position by a certain angle to maintain the joint velocity, deviating from the definition of a "target" joint position. To address this issue, we introduce a velocity feedforward term whose target velocity is computed from the difference between the last two target joint positions. To prevent overshoot, the velocity gain is set to the damping gain scaled by a feedforward ratio smaller than 1. We derive a theoretical upper bound in our paper, which is further validated through experiments.
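In code, a minimal sketch of the control law described above might look like the following; the function name, the 50 Hz control step, and the example gain values are assumptions for illustration, not the exact released implementation.

def pd_with_velocity_feedforward(q, dq, q_target, q_target_prev, kp, kd,
                                 ff_ratio, dt=0.02):
    """PD torque with a velocity feedforward term.

    The feedforward target velocity is the finite difference of the last two
    target joint positions; its gain is the damping gain kd scaled by a
    feedforward ratio smaller than 1 to prevent overshoot.
    """
    v_ff = (q_target - q_target_prev) / dt                  # implied target velocity
    return kp * (q_target - q) + kd * (ff_ratio * v_ff - dq)

# Illustrative usage (gains and ratio are placeholder values):
# tau = pd_with_velocity_feedforward(q, dq, q_t, q_t_prev, kp=100.0, kd=20.0, ff_ratio=0.5)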

We also provide a demonstration of how the velocity feedforward term affects the control behavior:

[Interactive demo: Velocity Feedforward Ratio slider, initially 0.00]

The above demonstration is conducted using PD gains corresponding to a natural frequency of 10 rad/s and a damping ratio of 1.0. In our paper, we further show that, for low-frequency sinusoidal inputs, the induced control delay remains constant. Since any control signal can be decomposed into a sum of sinusoidal components, this delay generalizes to arbitrary trajectories and can be interpreted as an effective latency. Consequently, it contributes to the end-to-end delay scaled by a factor of approximately 0.6, a factor we observe empirically in our real-world experiments.
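The mapping from natural frequency and damping ratio to PD gains, and the constant low-frequency delay mentioned above, can be illustrated on a unit-inertia second-order joint model; both the model choice and the resulting delay expression are our own simplifying assumptions for illustration, not values taken from the paper.

def pd_gains(natural_freq, damping_ratio, inertia=1.0):
    """PD gains for the second-order joint model  I*q'' = kp*(q_d - q) - kd*q'."""
    kp = inertia * natural_freq ** 2                      # stiffness gain
    kd = 2.0 * damping_ratio * inertia * natural_freq     # damping gain
    return kp, kd

# Gains matching the demonstration above (natural frequency 10 rad/s, damping ratio 1.0).
kp, kd = pd_gains(10.0, 1.0)    # kp = 100, kd = 20 for unit inertia

# Under this simplified model, a low-frequency sinusoidal target produces a phase
# lag that grows linearly with frequency, so the implied time delay is roughly
# constant, about kd / kp (reduced once the velocity feedforward term is added).
# This is an illustrative estimate, not a number reported in the paper.
delay_estimate = kd / kp        # = 2 * damping_ratio / natural_freq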

Evolve from Human to Cobra

Here is a vivid illustration of the end-to-end latency (response time) among different systems:

[Figure: end-to-end reaction times of different systems]

To the best of our knowledge, no prior work has systematically analyzed the end-to-end latency of humanoid teleoperation. This may be because existing systems operate at latencies that are still clearly perceptible to humans. However, as latency approaches the perceptual threshold, there is an urgent need for objective quantification. We use an optical MoCap system as ground truth, measuring latency by attaching markers to the robot's rubber hand. In addition, we implement an optical-flow-based hand tracking method to estimate end-to-end latency, as illustrated in the video:

In the above video, the position curves are obtained from the optical MoCap system and closely match the optical-flow curves. In our experiments, the difference in measured end-to-end latency between the two methods is less than 10 ms, suggesting that optical-flow-based estimation is a reliable proxy for comparing different systems. We report selected results in the previous figure; please refer to the paper for the complete table, including ablations across different systems.
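As one way to picture the optical-flow-based estimate, the sketch below cross-correlates two 1-D motion signals (for example, the per-frame optical-flow displacement of the operator's hand and of the robot's hand) and reads off the lag at the correlation peak. The function name, preprocessing, and frame-rate handling are assumptions, not the released measurement code.

import numpy as np

def estimate_latency_ms(operator_signal, robot_signal, fps):
    """Estimate how many milliseconds the robot motion lags the operator motion.

    Both inputs are 1-D arrays sampled at the same frame rate, e.g. the mean
    optical-flow displacement inside a hand region of interest per video frame.
    """
    a = operator_signal - np.mean(operator_signal)
    b = robot_signal - np.mean(robot_signal)
    corr = np.correlate(b, a, mode="full")        # cross-correlation over all lags
    lags = np.arange(-len(a) + 1, len(b))         # lag in frames for each entry
    best_lag = lags[np.argmax(corr)]              # positive: robot lags the operator
    return 1000.0 * best_lag / fps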

Everything is Open-Sourced

Due to hardware limitations, we evaluate ExtremControl only on the Unitree G1, using an OptiTrack MoCap system and a Quest 2 with VIVE motion trackers. To facilitate follow-up research, we fully open-source both the codebase and the hardware setup. You can find the codebase on GitHub. Although we provide an example trained policy for the Unitree G1, we recommend fine-tuning a new policy tailored to your own control interface for optimal performance.

Our Team

1University of Massachusetts Amherst,   2Carnegie Mellon University,   3MIT-IBM Watson AI Lab,   *Equal contribution

@misc{xiong2026extremcontrollowlatencyhumanoidteleoperation,
    title={ExtremControl: Low-Latency Humanoid Teleoperation with Direct Extremity Control}, 
    author={Ziyan Xiong and Lixing Fang and Junyun Huang and Kashu Yamazaki and Hao Zhang and Chuang Gan},
    year={2026},
    eprint={2602.11321},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2602.11321}, 
}

If you have any questions, please contact Ziyan Xiong and Lixing Fang.