HandX: Scaling Bimanual Motion and Interaction Generation

1University of Illinois Urbana-Champaign 2Specs Inc. 3Snap Inc.
CVPR 2026

Abstract

TL;DR: A large-scale bimanual motion dataset with fine-grained annotations, enabling bimanual motion generation and dexterous control with clear scaling trends.

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, such as finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we collect a new dataset targeting these underrepresented aspects. For scalable annotation, we introduce a decoupled paradigm that first extracts representative motion features, e.g., contact events and finger flexion, and then leverages the reasoning ability of a large language model to produce fine-grained, semantically rich descriptions aligned with these features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motions. All data will be released to support future research.
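To make the decoupled annotation paradigm concrete, the sketch below illustrates the two stages described above: extracting simple motion features (fingertip contact events and finger flexion), then packing them into a prompt for an LLM annotator. All function names, array shapes, and thresholds here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical contact threshold in metres; illustrative only.
CONTACT_THRESHOLD = 0.01

def extract_contact_events(left_tips, right_tips, threshold=CONTACT_THRESHOLD):
    """Detect frames where corresponding left/right fingertips touch.

    left_tips, right_tips: arrays of shape (T, 5, 3), per-frame 3D
    positions of the five fingertips of each hand.
    Returns a list of (frame, finger_index) contact events.
    """
    dists = np.linalg.norm(left_tips - right_tips, axis=-1)  # (T, 5)
    frames, fingers = np.nonzero(dists < threshold)
    return list(zip(frames.tolist(), fingers.tolist()))

def flexion_angle(mcp, pip, tip):
    """Angle at the PIP joint in radians; close to pi for a straight finger."""
    u = mcp - pip
    v = tip - pip
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def build_annotation_prompt(contacts, mean_flexion):
    """Pack the extracted features into a text prompt for an LLM annotator."""
    return (
        "Describe this bimanual motion. "
        f"Fingertip contact events (frame, finger): {contacts}. "
        f"Mean PIP flexion angle: {mean_flexion:.2f} rad."
    )
```

Decoupling matters for scalability: the geometric feature extraction runs cheaply over every sequence, and only the compact feature summary, not raw motion data, is sent to the language model.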

Dataset Statistics

Dataset Comparison

Left: Dataset scale and text supervision. HandX provides fine-grained, multi-level language descriptions. Right: Statistics of bimanual motion quality. HandX provides contact-rich bimanual motions.

Dataset Examples

Text to Motion Generation

Versatile Generation Tasks

[Video grid: start/end-conditioned generation (given Start and End frames, the model generates the Result) and keyframe-conditioned generation (given a Keyframe Input, the model generates the Result).]

The green section represents the first prompt's results, and the orange section represents the second prompt's results. The model ensures smooth transitions between segments.

Qualitative Evaluation on Scaling Law

Trained on 100% Dataset

Trained on 20% Dataset

Left: Thumb contacts index then pinky tip; fingers straighten and spread apart.
Right: Thumb contacts index tip; fingers extend from bent to straight.
Relation: Hands stay far apart; relative alignment changes dynamically.

Trained on 100% Dataset

Trained on 20% Dataset

Left: Two thumb-index contacts; ring and pinky curl inwards.
Right: Thumb-index contact, then fingers gradually extend straight.
Relation: Constant distant spatial relationship between hands.

Motion Capture

BibTeX

@article{zhang2026handx,
  title = {HandX: Scaling Bimanual Motion and Interaction Generation},
  author = {Zhang, Zimu and Zhang, Yucheng and Xu, Xiyan and Wang, Ziyin and Xu, Sirui and Zhou, Kai and Zhou, Bing and Guo, Chuan and Wang, Jian and Wang, Yu-Xiong and Gui, Liang-Yan},
  journal = {arXiv},
  year = {2026},
}