The HiPHI-MOV Dataset is a human-centric, high-fidelity multimodal corpus engineered for developing robust locomotion and whole-body loco-manipulation policies. It includes full-body motion capture, tracking of interacting objects, egocentric RGB-D visual data, and third-person RGB-D visual data. HiPHI-MOV provides a synchronized data stream that co-registers ground-truth full-body kinematic trajectories, captured via high-frequency motion capture, with egocentric and exocentric visual observations. This structured hierarchy enables the modeling of complex robotic behaviors, ranging from low-level motor primitives (joint-space dynamics) to high-level environmental affordances (scene-contextual navigation).
The HiPHI-OM Dataset is a human-centric, high-fidelity, omni-modal repository acquired within a highly instrumented laboratory environment. It includes full-body and fine-grained hand motion capture, hand-level tactile sensing, precise tracking of interacting objects, egocentric RGB-D visual data, third-person RGB-D visual data, and audio and temperature measurements. By utilizing synchronized, high-precision sensor arrays, HiPHI-OM provides ground-truth-level data for anthropocentric modeling with minimal aleatoric uncertainty. The dataset is designed to be ontology-agnostic, decoupling raw sensor data from specific semantic frameworks to maximize cross-domain generalization and longitudinal utility. Structurally, the dataset is hierarchical, encompassing both micro-level kinematic primitives and meso-level sequential task planning.
The ITW Dataset is a human-centric, life-scale, diverse, open-world, multimodal repository of stochastic real-world scenarios designed to advance humanoid robotics and embodied intelligence (EI). It includes data from sparse-body motion capture sensors and egocentric RGB-D visual data. Departing from traditional laboratory-constrained acquisition, ITW captures ecologically valid human behaviors and interaction dynamics within unconstrained environments. By integrating high-variance environmental noise and long-tail edge cases into the training distribution, ITW facilitates the generalization of laboratory-optimized algorithms toward industrial deployment. When integrated with the high-precision HiPHI-OM dataset, it provides a comprehensive cross-domain corpus spanning diverse operational scenarios and sensory modalities.
Finger-level truth (not approximations)
Millimeter-scale hand/finger kinematics plus pressure/contact signals, so models can learn real grasp dynamics—not just pose trajectories.
Synchronized multi-modal ground truth
Motion + multi-view video (and additional signals where applicable) captured in time alignment, enabling strong visual grounding and cross-modal learning.
Coverage across the full realism spectrum
Controlled “factory-grade” precision, large-space motion-with-vision, and truly natural “in-the-wild” behavior—so training data spans clean labels and messy real-world variance.
Built for scale, consistency, and deployment
Repeatable acquisition pipelines, standardized calibration/QA, and dataset structure designed for model training workflows—so you get reliable data, not one-off demos.
We collaborate with teams pushing the edge of embodied AI.
To explore our ModalityNet datasets, please register or log in.