
We are building the largest human-centric omni-modal dataset for embodied AI.

Available categories include Motion with Object & Vision (MOV), Omni-Modality (OM), and In-the-Wild (ITW).

To explore our ModalityNet datasets, please register or log in.

Data Overview

Data at a Glance

Overview of current-year data production capacity at factory sites A1 and A2. Factory sites B1, C1, and D1 are scheduled to come online in 2026, increasing capacity. Capacity figures are updated monthly.

HiPHI-MOV
High Precision Human Interaction
Motion with Object & Vision
0+ hrs
HiPHI-OM
High Precision Human Interaction
Omni-Modality
0+ hrs
ITW
In-the-Wild
Continuous multi-environment capture
0+ hrs

OMNI-MODAL DATASETS BUILT FOR ROBOTICS

  • Multimodal Sensor Fusion and State Estimation
  • Multi-Scale Hierarchical Interpretability
  • Sim-to-Real Alignment and Zero-Shot Transfer
  • Cross-Ontology Compatibility and Morphological Abstraction
HIGH PRECISION HUMAN INTERACTION: MOTION WITH OBJECT & VISION (HIPHI-MOV)

Build whole-body intelligence with context—full-body motion aligned with video in large, unrestricted spaces.

The HiPHI-MOV Dataset is a human-centric, high-fidelity multimodal corpus specifically engineered for the development of robust locomotion and whole-body loco-manipulation policies. It includes full-body motion capture, tracking of interacting objects, and egocentric and third-person RGB-D visual data. HiPHI-MOV provides a synchronized data stream that co-registers ground-truth full-body kinematic trajectories, captured via high-frequency motion capture, with egocentric and exocentric visual observations. This structured hierarchy enables the modeling of complex robotic behaviors, ranging from low-level motor primitives (joint-space dynamics) to high-level environmental affordances (scene-contextual navigation).
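As a rough illustration of what co-registering streams of different rates involves, the sketch below pairs each video frame with the motion-capture sample nearest to it in time. It assumes both streams are sorted and share a common clock; the function name `align_nearest` and the 120 Hz / 30 Hz rates are hypothetical and are not taken from the HiPHI-MOV specification or file format.

```python
import numpy as np

def align_nearest(mocap_ts: np.ndarray, frame_ts: np.ndarray) -> np.ndarray:
    """For each video-frame timestamp, return the index of the nearest
    motion-capture sample. Both arrays must be sorted and share one clock."""
    if len(mocap_ts) < 2:
        raise ValueError("need at least two motion-capture samples")
    # Index of the first mocap sample at or after each frame time.
    right = np.searchsorted(mocap_ts, frame_ts, side="left")
    # Keep both neighbors in range at the ends of the recording.
    right = np.clip(right, 1, len(mocap_ts) - 1)
    left = right - 1
    # Pick whichever neighbor is closer in time (ties go to the earlier one).
    choose_left = (frame_ts - mocap_ts[left]) <= (mocap_ts[right] - frame_ts)
    return np.where(choose_left, left, right)

# Hypothetical rates: 1 s of 120 Hz mocap and 30 Hz video on a shared clock.
mocap_ts = np.arange(120) / 120
frame_ts = np.arange(30) / 30
idx = align_nearest(mocap_ts, frame_ts)  # one mocap index per video frame
```

Nearest-neighbor matching is the simplest choice for pairing discrete frames; for continuous signals one might instead interpolate between the two neighboring samples.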

HIGH PRECISION HUMAN INTERACTION: OMNI-MODALITY (HIPHI-OM)

Teach robot hands true dexterity—millimeter finger motion plus pressure, captured for industrial precision.

The HiPHI-OM Dataset is a human-centric, high-fidelity, omni-modal repository acquired within a highly instrumented laboratory environment. It includes full-body and fine-grained hand motion capture, hand-level tactile sensing, precise tracking of interacting objects, egocentric and third-person RGB-D visual data, audio, and temperature measurements. By utilizing synchronized, high-precision sensor arrays, HiPHI-OM provides ground-truth-level data for anthropocentric modeling with minimal aleatoric uncertainty. The dataset is designed to be ontology-agnostic, decoupling raw sensor data from any specific semantic framework to maximize cross-domain generalization and longitudinal utility. Morphologically, the dataset supports a hierarchical structure, encompassing both micro-level kinematic primitives and meso-level sequential task planning.
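One way to picture "ontology-agnostic" storage is to keep raw recordings divided into unlabeled time spans with neutral IDs, and to attach each label vocabulary only through its own mapping table. The plain-Python sketch below illustrates that idea under stated assumptions: the `Segment` type, the segment IDs, the `labels_for` helper, and both vocabularies (one for micro-level primitives, one for meso-level tasks) are hypothetical and do not describe the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A time span in a raw recording; it carries no semantic label itself."""
    segment_id: str
    start_s: float
    end_s: float

# Raw-data layer: unlabeled spans over the synchronized streams (hypothetical).
segments = [
    Segment("seg-001", 0.0, 1.2),
    Segment("seg-002", 1.2, 3.5),
]

# Each ontology is a separate mapping from segment ID to its own label,
# so adding or revising a vocabulary never modifies the raw-data layer.
ontologies = {
    "primitives": {"seg-001": "reach", "seg-002": "grasp"},
    "tasks": {"seg-001": "pick_up_cup", "seg-002": "pick_up_cup"},
}

def labels_for(ontology: str) -> list[tuple[Segment, str]]:
    """Project the unlabeled segments onto one ontology's vocabulary."""
    mapping = ontologies[ontology]
    return [(seg, mapping[seg.segment_id]) for seg in segments
            if seg.segment_id in mapping]

primitive_labels = labels_for("primitives")
task_labels = labels_for("tasks")
```

Because the same segments can be read through either table, a micro-level view and a meso-level view coexist without duplicating or relabeling the underlying sensor data.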

  • Human-Centered Hierarchical Modeling
  • Sub-millimeter Spatiotemporal Synchronization
  • Multimodal, High-Dimensional Signal Registration
  • Cross-Ontology Compatibility and Morphological Mapping
  • Ecological Validity and Scene Generalization
  • Capture of Unconstrained Behavioral Dynamics
  • High-Throughput Distributed Acquisition
  • Cross-Ontology Annotation and Compliance Pipeline
IN-THE-WILD (ITW)

Train for reality, not the lab—natural human behavior across diverse real-world environments.

The ITW Dataset is a human-centric, life-scale, diverse, open-world multimodal repository of stochastic real-world scenarios designed to advance humanoid robotics and embodied intelligence (EI). It includes data from sparse-body motion capture sensors and egocentric RGB-D visual data. Departing from traditional laboratory-constrained acquisition, ITW captures ecologically valid human behaviors and interaction dynamics within unconstrained environments. By integrating high-variance environmental noise and long-tail edge cases into the training distribution, ITW helps laboratory-optimized algorithms generalize toward industrial deployment. When integrated with the HiPHI-OM high-precision dataset, it provides a comprehensive cross-domain corpus spanning diverse operational scenarios and sensory modalities.

Why Omni-Modal

Finger-level truth (not approximations)

Millimeter-scale hand/finger kinematics plus pressure/contact signals, so models can learn real grasp dynamics—not just pose trajectories.

Synchronized multi-modal ground truth

Motion + multi-view video (and additional signals where applicable) captured in time alignment, enabling strong visual grounding and cross-modal learning.

Coverage across the full realism spectrum

Controlled “factory-grade” precision, large-space motion-with-vision, and truly natural “in-the-wild” behavior—so training data spans clean labels and messy real-world variance.

Built for scale, consistency, and deployment

Repeatable acquisition pipelines, standardized calibration/QA, and dataset structure designed for model training workflows—so you get reliable data, not one-off demos.

Trusted by Pioneers in Robotics and AI

From academic institutions to global Fortune 500 companies, our data and acquisition pipelines support the current and future development of humanoid robotics and embodied AI.

NVIDIA
Google DeepMind
Stanford
UC Berkeley
UC San Diego
ByteDance
Tencent Robotics X
XPENG
Agibot
Fourier
Galbot
LimX Dynamics
PNDbotics
LEJUROBOT
Chinese University of Hong Kong
Hong Kong UST
University of Hong Kong
Foundation
TARS

Want to partner with us?

We collaborate with teams pushing the edge of embodied AI.
