DIRECT: Durable, Intuitive, Repeatable, and Ergonomic Control Device for Robot Teleoperation

*Equal contribution
¹Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague
²Faculty of Electrical Engineering, Czech Technical University in Prague

Demonstration of DIRECT controlling a Franka Emika Panda robot manipulator of the DROID platform to perform a pouring task.

Abstract

Large-scale demonstration datasets underpin recent progress in Vision-Language-Action (VLA) models for robot manipulation. However, standardized data collection platforms such as DROID often rely on Cartesian-space controllers that lack the granularity needed for precise manipulation, leaving these tasks underrepresented in the resulting datasets. We introduce DIRECT (Durable, Intuitive, Repeatable, and Ergonomic Control Device for Robot Teleoperation), an active joint-space teleoperation device engineered for durability, repeatability, and ergonomics in continuous data collection. We integrate DIRECT with the DROID platform and demonstrate a proof-of-concept workflow in which we collect task demonstrations with DIRECT and fine-tune \(\boldsymbol{\pi_{0.5}}\) on three precise manipulation tasks, improving end-to-end success from 0% to 24–28% and task progression by 1.8–5×.


Teleoperated robot using DIRECT. Left: DIRECT, right: DROID platform's Franka Emika Panda manipulator.

DIRECT Design

Building on prior work on puppet-style teleoperation devices, DIRECT prioritizes three principles for continuous data collection: mechanical durability, repeatability for consistent calibration across sessions, and ergonomics for operator comfort during extended use.

Durability via Servo Shields

Joint-space controllers that use servos as load-bearing elements are highly vulnerable to failure. When a servo is attached only by self-tapping screws driven into its plastic casing, human-applied torque eventually tears the screws out of the housing, and the whole motor must be replaced. DIRECT resolves this with Servo Shields: enclosures that clamp the servo between solid barriers using supplementary mounting screws. Because the actuator is caged, structural loads are absorbed by the enclosure rather than by the fragile screw threads in the servo casing, which substantially improves mechanical durability. Mechanical failures are then confined to these inexpensive, easily reprinted shields.


Durability via Servo Shields. Left: FACTR relies solely on screws threaded directly into the plastic servo casing, which easily tear out under human-applied torque. Right: DIRECT introduces an enclosure (orange) that clamps the servo on multiple sides and uses additional mounting screws. This physical barrier transfers structural loads to the enclosure itself, protecting the fragile servo threads from pull-out.

Repeatability of Calibration

Prior puppet-style controllers rely on a visually confirmed resting pose for calibration, which can drift between sessions and cause unstable gravity compensation. DIRECT introduces physical calibration fixtures that lock the device joints into a mechanical zero. As a result, the physical joint configuration at startup matches the zero pose defined in the URDF, providing consistent initial conditions across episodes.
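To illustrate why a mechanical zero makes calibration repeatable, the sketch below stores the raw encoder readings taken while the fixtures lock every joint at the URDF zero pose, then converts later readings into URDF joint angles. It is a minimal sketch under stated assumptions: the encoder resolution (4096 ticks per revolution), the per-joint direction signs, and the names `JointCalibration`, `calibrate`, and `ticks_to_urdf` are hypothetical, not DIRECT's actual interface.

```python
import math
from dataclasses import dataclass

TICKS_PER_REV = 4096  # assumed encoder resolution (hypothetical)


@dataclass
class JointCalibration:
    zero_ticks: int  # raw reading captured while the fixture locks the joint at zero
    direction: int   # +1 or -1: maps the servo's rotation sense onto the URDF joint axis


def calibrate(raw_at_fixture: list[int], directions: list[int]) -> list[JointCalibration]:
    """Record the raw readings taken while all joints sit in the calibration fixtures."""
    return [JointCalibration(z, d) for z, d in zip(raw_at_fixture, directions)]


def ticks_to_urdf(raw: list[int], calib: list[JointCalibration]) -> list[float]:
    """Convert raw encoder ticks to joint angles (rad) relative to the URDF zero pose."""
    angles = []
    for r, c in zip(raw, calib):
        delta = (r - c.zero_ticks) % TICKS_PER_REV
        # Wrap into [-half, +half) revolution; assumes each joint moves < 1 revolution.
        if delta >= TICKS_PER_REV // 2:
            delta -= TICKS_PER_REV
        angles.append(c.direction * 2 * math.pi * delta / TICKS_PER_REV)
    return angles
```

Because the fixture pose is mechanically defined, `zero_ticks` is captured at the same physical configuration every session, so the offset does not depend on how carefully an operator eyeballs a resting pose.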


Repeatability of Calibration. Left: FACTR uses a visual resting pose for calibration. Right: DIRECT uses calibration fixtures (orange) that lock the joints into a mechanical zero, matching the URDF zero pose at every startup.

Ergonomics

Prior puppet-style controllers target seated operation, which limits operator comfort and reachable range during the standing workflow typical of DROID. DIRECT raises the base by 10 cm to extend the operator's reachable configuration space and introduces the Ergogrip, a contoured handle inspired by lever-action grips and GELLO, designed to improve ergonomics during extended standing sessions.


Operator Ergonomics. FACTR (left) versus DIRECT (ours, right). Top right: the Ergogrip provides a contoured handle shaped for standing, single-handed operation. Bottom right: the DIRECT base is elevated 10 cm to extend the operator's reachable configuration space.

Printability and Sourcing

All parts are designed for 3D printing in PETG rather than PLA for improved toughness and temperature resistance, further supporting durability under continuous use. We provide pre-configured print files with correct part orientations. The design also uses standard screws included with the servos and an off-the-shelf metric ball bearing.

Software Architecture and Integration

To integrate DIRECT into data collection platforms without modifying their codebases, we adopt a decoupled two-process architecture. A standalone Python application runs a 500 Hz control loop, independently managing device kinematics and gravity compensation. A lightweight plugin runs alongside the platform control stack, translating device states into joint commands. Because data collection platforms periodically sever controller communication to reset trajectories, reconnecting an active loop risks abrupt torque spikes if the robot and device have drifted apart. We mitigate this with a de-sync state machine: each reconnection triggers a smooth proportional-derivative realignment before active control resumes, improving repeatability between episodes.
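A minimal sketch of such a de-sync state machine follows, written for a single scalar joint to keep it short. The state names, gains, tolerance, and torque limit are illustrative assumptions, as is the choice to drive the device toward the robot's measured joint angle `q_robot` during realignment; the description above specifies only that each reconnection triggers a PD realignment before active control resumes.

```python
from enum import Enum, auto


class SyncState(Enum):
    DISCONNECTED = auto()  # platform has severed the link; nothing is forwarded
    REALIGNING = auto()    # PD controller pulls the device toward the robot's pose
    ACTIVE = auto()        # device state is streamed to the robot as joint commands


class DesyncStateMachine:
    """Hypothetical de-sync guard for one joint; all gains are illustrative."""

    def __init__(self, kp=2.0, kd=1.0, tol=0.02, max_torque=0.5):
        self.kp, self.kd, self.tol, self.max_torque = kp, kd, tol, max_torque
        self.state = SyncState.DISCONNECTED

    def on_connection_lost(self):
        self.state = SyncState.DISCONNECTED

    def on_reconnect(self):
        # Never jump straight to ACTIVE: realign first to avoid torque spikes.
        self.state = SyncState.REALIGNING

    def step(self, q_dev, dq_dev, q_robot):
        """One control tick (e.g. at 500 Hz). Returns (device_torque, forward_commands)."""
        if self.state is SyncState.DISCONNECTED:
            return 0.0, False
        if self.state is SyncState.REALIGNING:
            err = q_robot - q_dev
            if abs(err) < self.tol and abs(dq_dev) < self.tol:
                self.state = SyncState.ACTIVE
                return 0.0, True
            tau = self.kp * err - self.kd * dq_dev  # PD pull toward the robot's pose
            tau = max(-self.max_torque, min(self.max_torque, tau))  # saturate for smoothness
            return tau, False
        # ACTIVE: gravity-compensation torque would be added by the main loop.
        return 0.0, True
```

Gating command forwarding on the state, rather than on the connection alone, is what prevents a reconnect from sending the robot a large jump in joint targets.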

Force Feedback

Full bi-directional force feedback, as demonstrated by FACTR, renders contact forces at the robot to the operator's hand and requires a tight control loop (hundreds of Hz) between the robot and the device. Routing this feedback through the DROID control stack (~50 Hz) would introduce latency that destabilizes the loop. We therefore implement only a virtual spring on the device trigger, which provides spring-like resistance for grip comfort. Extending DIRECT to full bi-directional feedback is left for future work.
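A virtual spring of this kind can be rendered as a stiffness law on the trigger servo, with the trigger angle mapped separately to a gripper command. The sketch below is illustrative only: the stiffness `k`, the optional damping `d` (zero by default, so the law is a pure spring), the torque limit, the rest angle, and the normalized [0, 1] gripper convention are all assumptions rather than DIRECT's values.

```python
def trigger_spring_torque(q, dq, q_rest=0.0, k=0.8, d=0.0, tau_max=0.3):
    """Restoring torque pushing the trigger back toward q_rest.

    k [Nm/rad], d [Nm*s/rad] and tau_max [Nm] are hypothetical values; with d = 0
    this is a pure virtual spring.
    """
    tau = -k * (q - q_rest) - d * dq
    # Saturate so the servo never exceeds a comfortable grip force.
    return max(-tau_max, min(tau_max, tau))


def trigger_to_gripper(q, q_open, q_closed):
    """Linearly map the trigger angle to a gripper command in [0, 1] (0 = open, assumed)."""
    s = (q - q_open) / (q_closed - q_open)
    return max(0.0, min(1.0, s))
```

Because the spring depends only on the trigger's own state, it runs entirely inside the device loop and never waits on the ~50 Hz platform stack.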

Experiments

We identify three tasks that the \(\pi_{0.5}\) model trained on DROID consistently fails to complete end-to-end. These tasks require human-like affordances: precise, dexterous manipulation that is difficult to express through Cartesian-space teleoperation and is consequently underrepresented in DROID.

Pouring from bottle

"pour the bottle into the blue bowl"

Goal. Pour the entire bottle into a bowl.

Starting scene. A transparent plastic bottle partially filled with a liquid substitute (dyed polystyrene balls ~1 cm in diameter) is placed on a table alongside a blue ceramic bowl.

Recorded trajectory. The bottle is grasped from the side, preferably with its opening visible to the wrist camera, then carried without spilling above the bowl, where it is carefully rotated and poured.

Success criterion. The bottle must be grasped from the side rather than from the top, so that the robot does not pour the contents onto itself, and most of the liquid substitute must end up in the bowl.

Model | Runs | Task progression, TP [95% CI] | Success rate, SR [95% CI]
\(\pi_{0.5}\) | 25 | 9.0% [3.6, 20.2] | 0.0% [0.0, 13.3]
\(\pi_{0.5}\texttt{+FT}\) | 25 | 43.4% [27.6, 61.2] | 28.0% [14.3, 47.6]

Fine-tuning

We fine-tune \(\pi_{0.5}\) on the collected demonstrations to evaluate whether a modern VLA policy can acquire the control style of DIRECT demonstrations, which target human-like affordances. We use 50 demonstrations each for Pour and Hang, and 100 for Scoop, whose greater complexity required more data. To mitigate catastrophic forgetting, each training batch consists of 20% task-specific demonstrations and 80% samples from the original DROID dataset. The baseline achieves no end-to-end success (0% across 25 runs per task) but reaches non-zero task progression (9.0–27.0%), suggesting that it recognizes the scene semantics but lacks the precision these tasks require. In contrast, \(\pi_{0.5}\)\texttt{+FT} achieves success rates of 28.0%, 24.0%, and 28.0% with task progressions of 43.4%, 51.0%, and 37.6%. These 1.8–5× gains in task progression suggest that \(\pi_{0.5}\) can adapt to the DIRECT control style from a modest amount of data, provided the base DROID distribution is preserved within each batch.
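The 20%/80% batch composition can be sketched as a mixing sampler over sample identifiers. This is a hypothetical illustration, not the actual fine-tuning code: the function name, the batch size of 40, sampling with replacement, and fixing the per-batch counts (rather than drawing them stochastically) are all assumptions.

```python
import random


def mixed_batches(task_samples, droid_samples, batch_size=40, task_fraction=0.2, seed=0):
    """Yield batches with a fixed share of task-specific demos; the rest come from DROID.

    Sampling is with replacement, so the small task set is revisited often while
    DROID data still makes up the bulk of every batch (to limit forgetting).
    """
    rng = random.Random(seed)
    n_task = round(batch_size * task_fraction)
    n_droid = batch_size - n_task
    while True:
        batch = rng.choices(task_samples, k=n_task) + rng.choices(droid_samples, k=n_droid)
        rng.shuffle(batch)  # avoid a fixed task/DROID ordering within the batch
        yield batch
```

With `batch_size=40`, every batch holds exactly 8 task-specific and 32 DROID samples.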

TP confidence intervals are computed via BCa bootstrap (N = 10,000 resamples); SR confidence intervals via the Wilson score interval.
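For reference, the Wilson score interval used for SR can be computed as follows; for 0/25 and 7/25 successes it recovers the 95% intervals reported for the Pour task. TP intervals could analogously be obtained with SciPy's `scipy.stats.bootstrap` (`method='BCa'`, `n_resamples=10000`), but the per-run progression scores are not listed here, so that part is not reproduced.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95%, two-sided)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


for k in (0, 7):
    lo, hi = wilson_interval(k, 25)
    print(f"{k}/25: SR {100 * k / 25:.1f}% [{100 * lo:.1f}, {100 * hi:.1f}]")
# 0/25: SR 0.0% [0.0, 13.3]
# 7/25: SR 28.0% [14.3, 47.6]
```

Unlike the normal-approximation interval, the Wilson interval stays non-degenerate at 0 successes, which matters for the baseline's 0/25 result.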

BibTeX

@misc{pilc2026direct,
      title={{DIRECT}: {Durable} {Intuitive} {Repeatable} {Ergonomic} {Controller} for {Robot} {Teleoperation}},
      author={Pilc, Simon and Drozdik, Patrik and Ponimatkin, Georgy and Sedlacek, Martin and Sivic, Josef and Petrik, Vladimir},
      year={2026},
}

Acknowledgements

This work was supported by the European Union's Horizon Europe projects AGIMUS (No. 101070165), euROBIN (No. 101070596), ERC FRONTIER (No. 101097822), and ELLIOT (No. 101214398). GP was also partly supported by the Grant Agency of the Czech Technical University in Prague under allocation SGS25/156/OHK3/3T/13. MS was also partly supported by the Grant Agency of the Czech Technical University in Prague under allocation SGS26/159/OHK3/3T/13. Compute resources and infrastructure were supported by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID:90254) and by the European Union's Horizon Europe project CLARA (No. 101136607).