Modalitynet Technical Specification v1.0

Introduction to the Three Datasets

Version | Date
v0.51 | 2026-01-23
v1.0 | 2026-05-11

Disclaimer

This document contains proprietary and confidential information of Noitom Robotics and is intended for authorized academic and/or technical review only. No part of this document may be reproduced, disclosed, distributed, or otherwise made available to any third party, in whole or in part, without the prior written consent of Noitom Robotics. Any unauthorized use of this document is strictly prohibited.

Feature | HiPHI Motion with Object & Vision (HiPHI-MOV) | HiPHI Omni-Modality (HiPHI-OM) | In-The-Wild (ITW)
Primary Goal | Locomotion-Manipulation | Ground-truth Precision | Real-world Robustness
Environment | Transitionary / Complex | Controlled Lab | Unconstrained / Unstructured / Stochastic
Key Sensor | Synchronized RGB-D + MoCap | Optical + IMU Fusion | Portable / Sparse Arrays
Action Scale | Hierarchical (Micro to Macro) | Atomic & Meso-level | Ecological / Naturalistic
Ontology | Morphological Abstraction | Cross-Ontology Mapping | Cross-Ontology Annotation

Overview

High Precision Human Interaction Motion With Object & Vision (HiPHI-MOV) Dataset: The HiPHI-MOV Dataset is a human-centric, high-fidelity multimodal corpus engineered for the development of robust locomotion and whole-body loco-manipulation policies. It includes full-body motion capture, tracking of interacting objects, and third-person RGB-D visual data. HiPHI-MOV provides a synchronized data stream that co-registers ground-truth, full-body kinematic trajectories (captured via high-frequency motion capture) with ego-centric and exo-centric visual observations. This structured hierarchy enables the modeling of complex robotic behaviors, ranging from low-level motor primitives (joint-space dynamics) to high-level environmental affordances (scene-contextual navigation).

High Precision Human Interaction Omni-Modality (HiPHI-OM) Dataset: The HiPHI-OM Dataset is a human-centric, high-fidelity, omni-modal repository acquired within a highly instrumented laboratory environment. It includes full-body and fine-grained hand motion capture, hand-level tactile sensing, precise tracking of interacting objects, egocentric RGB-D visual data, and third-person RGB-D visual data. By utilizing synchronized, high-precision sensor arrays, HiPHI-OM provides ground-truth-level data for anthropocentric modeling with minimal aleatoric uncertainty. The dataset is designed to be ontology-agnostic, allowing for the decoupling of raw sensor data from specific semantic frameworks to maximize cross-domain generalization and longitudinal utility. Morphologically, the dataset supports a hierarchical structure, encompassing both micro-level kinematic primitives and meso-level sequential task planning.

In-the-Wild (ITW) Dataset: The In-the-Wild (ITW) Dataset is a human-centric, life-scale, diverse, open-world, multimodal repository of stochastic real-world scenarios designed to advance humanoid robotics and embodied intelligence (EI). It includes sparse-body motion capture sensors and egocentric RGB-D visual data. Departing from traditional laboratory-constrained acquisition, ITW captures ecologically valid human behaviors and interaction dynamics within unconstrained environments. By integrating high-variance environmental noise and long-tail edge cases into the training distribution, ITW facilitates the generalization of laboratory-optimized algorithms toward industrial deployment. When integrated with the HiPHI-OM dataset, it provides a comprehensive cross-domain corpus spanning diverse operational scenarios and sensory modalities.

1. Technical Specification: HiPHI Motion with Object & Vision (HiPHI-MOV) Dataset

  • Multimodal Sensor Fusion and State Estimation: The HiPHI-MOV dataset provides a high-dimensional data stream that integrates 6-DOF full-body kinematics, high-fidelity hand-joint trajectories (provisioned), and synchronized RGB-D environmental telemetry. This deep fusion of proprioceptive and exteroceptive data supports the development of integrated perception–action loops, enabling models to learn the spatial relationships between body pose and environmental affordances.

  • Multi-Scale Hierarchical Interpretability: Designed as a structured benchmark for embodied intelligence, the dataset spans three distinct semantic layers: atomic-level action ontologies (kinematic primitives), mesoscopic task planning (sequential logic), and macro-level scene distributions (environmental context). This hierarchy allows for scientific interpretability in model performance, isolating whether failures occur at the motor-control, tactical, or strategic level.

  • Sim-to-Real Alignment and Zero-Shot Transfer: The dataset is engineered to facilitate sim-to-real alignment, providing the high-precision ground truth necessary to bridge the gap between virtual simulations and physical deployment. By capturing a diverse range of locomotion-manipulation tasks, HiPHI-MOV supports the evaluation of zero-shot transfer capabilities, allowing autonomous control algorithms to generalize to novel environments without additional fine-tuning.

  • Cross-Ontology Compatibility and Morphological Abstraction: Leveraging advanced motion retargeting frameworks, the HiPHI-MOV dataset decouples captured human motion from specific hardware constraints. This ensures cross-ontology compatibility, where high-fidelity motion data can be seamlessly mapped onto humanoid platforms with disparate degrees of freedom (DoF) and varied mechanical configurations. This abstraction is vital for creating platform-independent foundation models for robotic locomotion.

File Structure

Dataset Folder

File Name | Description | Motion without Object | Motion with Object
motion_actor.bvh | Human motion data | |
task_info.json | Task information for this collection | |
config.json | Relationship between motion bvh and object | |
prop_Object.csv | Object motion information | |

*Video data will be provided soon.

Object Model Folder (only available for motion data with object)

File Name | Description
Object.obj | Object model (with orientation Y up)
Object.csv | Object weight (kg) info

2. Technical Specification: HiPHI Omni-Modality (HiPHI-OM) Dataset

  • Human-Centered Hierarchical Modeling: The dataset captures anthropocentric behaviors across multiple granularities, from atomic-level meta-actions to medium- and long-range task sequences. By maintaining an ontology-agnostic underlying structure, HiPHI-OM facilitates high-level cross-ontology generalization, allowing the same raw behavioral data to be effectively mapped to diverse semantic frameworks and research objectives.

  • Sub-millimeter Spatiotemporal Synchronization: The system achieves high-precision pose capture through a robust human–robot interaction (HRI) pipeline. By fusing optical markers with inertial measurement units (IMUs), the infrastructure ensures stable and accurate tracking during high-speed, high-acceleration motions. This hybrid approach minimizes occlusion artifacts and latency, providing high-fidelity recordings of complex human kinematics.

  • Multimodal, High-Dimensional Signal Registration: HiPHI-OM serves as a "ground-truth-level" repository, providing synchronized signals across visual, tactile, and spatial domains. While current releases focus on high-precision target positioning and tactile feedback, the architecture is designed for "full-modal" expansion, with integrated force, audio, and thermal telemetry scheduled for subsequent release cycles within the controlled environment.

  • Cross-Ontology Compatibility and Morphological Mapping: A core strength of the dataset is its hardware-agnostic nature, achieved through advanced motion retargeting technology. This allows for the seamless translation of human motion data to humanoid robots with heterogeneous degrees of freedom (DoF) and varying physical proportions. By decoupling the data from specific robotic platforms, HiPHI-OM ensures that the learned policies are robust across a wide spectrum of robotic morphologies.

File Structure

Data Structure

File Name | Description
config.json | Metadata and description of the data in this collection
task_info.json | Task information for this collection
camera_params/ | Intrinsic and extrinsic parameters for all cameras
head_stereo_depth.csv | Index, timestamp, and PNG path for depth images from the head-mounted depth camera
head_stereo_depth/ | Depth maps from the head-mounted depth camera
head_stereo.csv | Per-frame timestamps for the video from the head-mounted depth camera
head_stereo.mp4 | Video from the head-mounted depth camera
head_wide.csv | Per-frame timestamps for the video from the head-mounted wide-angle camera
head_wide.mp4 | Video from the head-mounted wide-angle camera
fixed1_stereo_depth.csv | Index, timestamp, and PNG path for depth images from fixed camera 1
fixed1_stereo_depth/ | Depth maps from fixed camera 1
fixed1_stereo.csv | Per-frame timestamps for the video from fixed camera 1
fixed1_stereo.mp4 | Video from fixed camera 1
fixed1_wide.csv | Per-frame timestamps for the wide-angle video from fixed camera 1
fixed1_wide.mp4 | Wide-angle video from fixed camera 1
fixed2_stereo_depth.csv | Index, timestamp, and PNG path for depth images from fixed camera 2
fixed2_stereo_depth/ | Depth maps from fixed camera 2
fixed2_stereo.csv | Per-frame timestamps for the video from fixed camera 2
fixed2_stereo.mp4 | Video from fixed camera 2
fixed2_wide.csv | Per-frame timestamps for the wide-angle video from fixed camera 2
fixed2_wide.mp4 | Wide-angle video from fixed camera 2
hand_pressure_data.h5 | Palm pressure data (all finger joints and palm regions: upper, mid, lower-mid, lower, base)
tracker_sixdof_data.h5 | 6DOF data for full-body trackers, hands, and props
human_bones.h5 | 6DOF data for the motion-capture subject's skeleton
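As a quick sanity check on a downloaded collection, the file table above can be turned into a completeness test. The sketch below is illustrative only: the helper names (expected_entries, missing_entries, scan_collection) are hypothetical, and it assumes every entry sits directly in the collection directory under exactly the listed name. Sample output elsewhere in this document shows human_bone_data.h5 rather than human_bones.h5, so the names may need adjusting for a given release.

```python
from pathlib import Path

# Camera channels named in the file table.
CAMERAS = ("head", "fixed1", "fixed2")


def expected_entries():
    """Names from the file table; directories are listed without a trailing slash."""
    names = ["config.json", "task_info.json", "camera_params"]
    for cam in CAMERAS:
        names += [
            f"{cam}_stereo_depth.csv",
            f"{cam}_stereo_depth",
            f"{cam}_stereo.csv",
            f"{cam}_stereo.mp4",
            f"{cam}_wide.csv",
            f"{cam}_wide.mp4",
        ]
    names += ["hand_pressure_data.h5", "tracker_sixdof_data.h5", "human_bones.h5"]
    return names


def missing_entries(present):
    """Return the expected names that are absent from `present` (an iterable of names)."""
    present = set(present)
    return [name for name in expected_entries() if name not in present]


def scan_collection(root):
    """Assumed layout: all entries are direct children of the collection directory."""
    return missing_entries(p.name for p in Path(root).iterdir())
```

Calling scan_collection on a collection directory returns an empty list when every listed file and folder is present.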

Obj Model Folder

File Name | Description
Obj.fbx | A versatile file format developed by Autodesk for 3D animation, modeling, and design. Unlike STL, FBX files can contain not only the geometry of a 3D model, but also its textures, animation data, and more.
Obj.stl | A widely used file format for 3D printing. It represents the surface geometry of a 3D object using a series of connected triangles, making it a simple and efficient format for 3D printing.

Data formats for various file types

Video data

Each scene contains a total of six video files from three channels: one head-mounted channel and two fixed camera channels. Each channel includes two videos: one binocular and one wide-angle. The video data is an mp4 file, accompanied by a csv file with the same name, which records the timestamp of each frame. For example, the head_stereo.csv file records the timestamp of each frame of the head_stereo.mp4 video.
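Because each video has its own timestamp file, frames from different channels have to be matched by time rather than by frame number. The following is a minimal sketch of nearest-timestamp matching. It assumes the timestamps have already been read from the CSV files into sorted lists of seconds; the CSV column layout is not specified here, and the function names and the max_gap parameter are hypothetical.

```python
import bisect


def nearest_index(timestamps, t):
    """Index of the entry in the sorted list `timestamps` that is closest to `t`."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose between the neighbours on either side; ties go to the earlier frame.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(reference, other, max_gap=None):
    """Pair every reference frame with the nearest frame of another channel.

    Returns (reference_index, other_index) tuples. Pairs whose time difference
    exceeds `max_gap` seconds are dropped when `max_gap` is given.
    """
    pairs = []
    for ref_idx, t in enumerate(reference):
        other_idx = nearest_index(other, t)
        if max_gap is None or abs(other[other_idx] - t) <= max_gap:
            pairs.append((ref_idx, other_idx))
    return pairs
```

For example, align(head_times, fixed1_times, max_gap=0.02) would keep only the pairs captured within 20 ms of each other; the threshold is an arbitrary illustration, not a value taken from the dataset.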

Depth data

Each scene contains three channels of depth data: one head-mounted channel and two fixed camera channels. The frames from each depth camera are stored together in one folder, with each PNG file holding one frame of depth data. A CSV file with the same name as the folder records the timestamp of each frame. For example, head_stereo_depth.csv records the timestamp of each frame in the head_stereo_depth/ directory. Each PNG stores 16-bit depth values, and the following parsing script can be used to read one frame.

Parse code: read_png_16bit.py

Output result example:

output
>python read_png_16bit.py dataset\3\3_1_1760508893\depth_fixed\depth_0_1760508893407.png
Image data type: uint16
Image shape (height, width): (1280, 720)
Pixel value range: [0, 4999]
Pixel values: (0, 0): 2229 (0, 1): 895 (0, 2): 2995 (0, 3): 4374 (0, 4): 3547 (0, 5): 1692 (0, 6): 4714 (0, 7): 1647 (0, 8): 3925 (0, 9): 3390 (0, 10): 2282 (0, 11): 862 (0, 12): 2801 (0, 13): 1817 (0, 14): 3244 (0, 15): 1869 ......
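The read_png_16bit.py script itself is not reproduced in this document. As a rough illustration of what reading one depth frame involves, the sketch below decodes a 16-bit grayscale PNG using only the Python standard library. It assumes the depth frames are non-interlaced, single-channel PNGs (bit depth 16, color type 0), which this document does not state explicitly; decode_gray16_png is a hypothetical name. In practice an imaging library is usually the more convenient route.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a, b, c):
    """Paeth predictor from the PNG specification (a=left, b=up, c=up-left)."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def decode_gray16_png(blob):
    """Decode PNG bytes into (height, width, rows), rows being lists of ints."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, header, idat = 8, None, bytearray()
    while pos < len(blob):
        length, kind = struct.unpack(">I4s", blob[pos:pos + 8])
        body = blob[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif kind == b"IDAT":
            idat += body
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    if (depth, color, interlace) != (16, 0, 0):
        raise ValueError("expected a non-interlaced 16-bit grayscale PNG")
    raw = zlib.decompress(bytes(idat))
    stride, bpp = width * 2, 2  # two bytes per pixel
    prev, rows = bytearray(stride), []
    for y in range(height):
        start = y * (stride + 1)
        ftype = raw[start]  # per-scanline filter type
        if ftype > 4:
            raise ValueError("unknown PNG filter type")
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            a = line[i - bpp] if i >= bpp else 0
            b = prev[i]
            c = prev[i - bpp] if i >= bpp else 0
            pred = (0, a, b, (a + b) // 2, _paeth(a, b, c))[ftype]
            line[i] = (line[i] + pred) & 0xFF
        # Samples are stored big-endian.
        rows.append(list(struct.unpack(">%dH" % width, bytes(line))))
        prev = line
    return height, width, rows
```

Passing the bytes of one depth PNG to decode_gray16_png yields the image size and the per-pixel values, from which the value range can be computed with min and max over the rows. The physical unit of the depth values is not given in this document and is not assumed by the sketch.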

H5 File

All structured data other than video data and depth data is stored in H5 format.

Each H5 file contains a dataset named data, which consists of multiple records, each corresponding to a single frame. Every record includes three fields: index (an np.int64 indicating the frame number, starting from 0 and increasing sequentially), timestamp (an np.float64 representing the time, where the integer part denotes seconds and the three decimal places indicate milliseconds), and elements, whose structure varies depending on the data type and is described in detail below.
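The timestamp field packs seconds and milliseconds into a single float. The sketch below shows one way to split it back into integer parts without floating-point drift. It assumes, based on the magnitude of the values in the sample output, that timestamps are Unix epoch seconds; this is not stated explicitly, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def split_timestamp(ts):
    """Split a float timestamp into (whole seconds, milliseconds)."""
    # Round to whole milliseconds first so binary float error cannot leak
    # into the integer parts.
    total_ms = round(ts * 1000)
    return divmod(total_ms, 1000)


def to_utc(ts):
    """Interpret the timestamp as Unix epoch seconds (an assumption) in UTC."""
    seconds, millis = split_timestamp(ts)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)
```

Working in integer milliseconds also makes it straightforward to compare timestamps across H5 records and the per-frame CSV files.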

Position and attitude data of the tracker

In each scene, trackers record the 6-DOF pose (position and orientation) of key objects with sub-millimeter accuracy, as shown in the following table:

Name | Meaning
fixed1_cam | Fixed camera 1
fixed2_cam | Fixed camera 2
Head | Head
Spine | Back
Hips | Hips
RightUpLeg | Right thigh
RightFoot | Right foot
LeftUpLeg | Left thigh
LeftFoot | Left foot
RightHand | Back of right hand
RightHandThumb2 | Right thumb tip
RightHandThumb1 | Right thumb base
RightHandIndex2 | Right index finger tip
RightHandIndex1 | Right index finger base
RightHandMiddle2 | Right middle finger tip
RightHandMiddle1 | Right middle finger base
RightHandRing2 | Right ring finger tip
RightHandRing1 | Right ring finger base
RightHandPinky2 | Right little finger tip
RightHandPinky1 | Right little finger base
LeftHand | Back of left hand
LeftHandThumb2 | Left thumb tip
LeftHandThumb1 | Left thumb base
LeftHandIndex2 | Left index finger tip
LeftHandIndex1 | Left index finger base
LeftHandMiddle2 | Left middle finger tip
LeftHandMiddle1 | Left middle finger base
LeftHandRing2 | Left ring finger tip
LeftHandRing1 | Left ring finger base
LeftHandPinky2 | Left little finger tip
LeftHandPinky1 | Left little finger base
TBD | Other props

These data are stored in the tracker_sixdof_data.h5 file with sub-millimeter positional accuracy.

Parse code: read_sixdof_data_h5_1.py

Output result example:

output
>python read_sixdof_data_h5.py dataset\3\3_2_1760512240\tracker_sixdof_data.h5
Successfully opened file: dataset\3\3_2_1760512240\tracker_sixdof_data.h5
File contains 102 frames of data
[0] 6DOF, 0, 1760512240.635: Head(pos: [4.700, 9.741, 0.208], rot: [6.931, 2.250, 8.570, 7.848]), Spine2(pos: [9.661, 1.140, 5.715], rot: [1.364, 2.656, 6.938, 9.754]), LeftArm(pos: [6.545, 3.506, 4.913], rot: [6.109, 4.777, 8.283, 7.738]), LeftForeArm(pos: [8.475, 4.751, 4.370], rot: [7.183, 1.154, 1.972, 3.395]), RightArm(pos: [6.025, 1.487, 5.187], rot: [5.980, 0.864, 3.507, 7.056]), RightForeArm(pos: [8.127, 6.362, 0.697], rot: [7.863, 4.106, 2.043, 4.971]), RightHand(pos: [5.904, 8.878, 7.089], rot: [0.521, 6.584, 7.460, 4.837]), RightHandThumb2(pos: [8.239, 4.211, 4.194], rot: [3.032, 3.650, 3.832, 2.387]), RightHandThumb1(pos: [3.484, 5.090, 9.470], rot: [6.862, 9.581, 1.757, 1.394]), RightHandIndex2(pos: [6.113, 2.014, 9.816], rot: [4.125, 1.529, 4.666, 8.462]), RightHandIndex1(pos: [1.576, 6.268, 0.611], rot: [2.259, 4.032, 7.302, 7.763]), RightHandMiddle2(pos: [6.777, 4.669, 2.550], rot: [5.158, 8.411, 9.693, 3.566]), RightHandMiddle1(pos: [8.038, 1.110, 9.366], rot: [9.309, 8.405, 1.692, 2.564]), RightHandRing2(pos: [2.042, 3.063, 2.148], rot: [1.387, 7.023, 6.425, 4.465]), RightHandRing1(pos: [9.300, 7.093, 1.573], rot: [4.383, 3.871, 8.150, 6.244]), RightHandPinky2(pos: [5.530, 6.196, 2.384], rot: [3.299, 4.086, 1.726, 2.509]), RightHandPinky1(pos: [2.016, 7.864, 3.604], rot: [3.591, 4.292, 5.464, 8.199]), LeftHand(pos: [1.252, 2.116, 0.793], rot: [5.515, 5.732, 7.438, 6.570]), LeftHandThumb2(pos: [7.785, 7.563, 8.329], rot: [9.511, 3.618, 7.574, 8.522]), LeftHandThumb1(pos: [6.129, 5.274, 2.667], rot: [1.951, 5.426, 4.259, 8.812]), LeftHandIndex2(pos: [0.070, 0.762, 6.821], rot: [5.819, 2.508, 5.206, 3.099]), LeftHandIndex1(pos: [9.334, 2.247, 4.293], rot: [4.109, 4.035, 2.718, 5.930]), LeftHandMiddle2(pos: [4.786, 5.211, 0.040], rot: [5.388, 4.672, 1.993, 9.700]), LeftHandMiddle1(pos: [9.905, 9.530, 3.052], rot: [8.022, 7.669, 7.746, 1.762]), LeftHandRing2(pos: [4.389, 7.090, 3.820], rot: [9.966, 1.297, 9.525, 9.557]), LeftHandRing1(pos: [3.045, 9.463, 0.549], rot: [3.151, 5.749, 4.670, 0.488]), LeftHandPinky2(pos: [3.227, 0.974, 7.268], rot: [3.382, 7.960, 4.778, 5.802]), LeftHandPinky1(pos: [5.065, 3.246, 0.746], rot: [7.747, 3.241, 0.531, 1.255]), bosch(pos: [6.309, 5.763, 1.114], rot: [4.260, 0.026, 2.476, 4.636]), plug_in(pos: [4.470, 0.403, 3.455], rot: [1.058, 1.706, 9.969, 4.323]), plug(pos: [4.142, 6.197, 0.687], rot: [9.169, 7.165, 4.744, 5.936]), mouse(pos: [9.885, 8.997, 8.870], rot: [3.320, 8.982, 6.344, 3.425]), bottle_cap(pos: [6.199, 2.856, 4.770], rot: [2.192, 5.186, 2.645, 6.752])......
[101] 6DOF, 101, 1760512241.746: Head(pos: [3.540, 2.404, 1.593], rot: [5.080, 7.258, 6.355, 1.038]), Spine2(pos: [9.579, 2.308, 1.847], rot: [0.059, 8.118, 3.501, 6.526]), LeftArm(pos: [8.999, 0.246, 9.970], rot: [4.544, 0.580, 4.456, 1.992]), LeftForeArm(pos: [2.368, 6.548, 0.357], rot: [2.181, 4.856, 4.108, 3.070]), RightArm(pos: [2.816, 1.743, 8.460], rot: [2.325, 1.201, 6.576, 1.554]), RightForeArm(pos: [1.738, 7.106, 5.225], rot: [0.179, 1.703, 4.765, 0.839]), RightHand(pos: [9.955, 4.232, 4.448], rot: [4.717, 2.154, 9.040, 7.465]), RightHandThumb2(pos: [5.607, 9.120, 0.200], rot: [1.519, 6.191, 5.054, 1.281]), RightHandThumb1(pos: [0.790, 5.465, 3.948], rot: [8.809, 5.387, 0.557, 3.543]), RightHandIndex2(pos: [8.374, 2.914, 9.005], rot: [1.964, 3.172, 4.835, 4.045]), RightHandIndex1(pos: [0.824, 2.716, 6.149], rot: [5.559, 5.185, 0.850, 2.025]), RightHandMiddle2(pos: [2.019, 8.951, 5.565], rot: [5.268, 5.069, 3.224, 5.385]), RightHandMiddle1(pos: [9.861, 8.248, 1.728], rot: [4.456, 5.168, 4.466, 8.939]), RightHandRing2(pos: [9.052, 6.687, 9.520], rot: [8.367, 5.385, 6.231, 6.692]), RightHandRing1(pos: [3.076, 7.954, 6.442], rot: [6.692, 2.316, 9.451, 1.908]), RightHandPinky2(pos: [5.206, 9.323, 3.775], rot: [6.185, 5.471, 0.940, 3.120]), RightHandPinky1(pos: [7.816, 2.848, 1.227], rot: [9.758, 4.951, 5.415, 1.955]), LeftHand(pos: [2.166, 8.124, 9.284], rot: [6.120, 3.479, 8.747, 6.665]), LeftHandThumb2(pos: [8.513, 2.999, 5.465], rot: [0.832, 7.974, 1.544, 6.478]), LeftHandThumb1(pos: [1.582, 0.129, 6.390], rot: [9.830, 5.092, 1.790, 5.468]), LeftHandIndex2(pos: [9.805, 7.059, 1.011], rot: [0.682, 6.371, 2.648, 5.015]), LeftHandIndex1(pos: [8.539, 7.739, 3.178], rot: [7.412, 4.053, 9.804, 5.553]), LeftHandMiddle2(pos: [3.523, 3.369, 8.865], rot: [2.969, 1.512, 1.691, 6.769]), LeftHandMiddle1(pos: [7.458, 3.199, 1.939], rot: [4.782, 2.153, 2.341, 2.070]), LeftHandRing2(pos: [7.223, 9.727, 6.515], rot: [5.988, 3.272, 0.142, 0.802]), LeftHandRing1(pos: [1.205, 6.015, 7.121], rot: [0.597, 5.315, 8.537, 9.457]), LeftHandPinky2(pos: [1.071, 4.223, 8.325], rot: [7.055, 8.880, 2.564, 0.416]), LeftHandPinky1(pos: [7.118, 6.949, 3.440], rot: [6.550, 6.910, 8.168, 9.856]), bosch(pos: [5.119, 9.046, 4.639], rot: [1.326, 0.504, 2.155, 0.527]), plug_in(pos: [3.005, 6.233, 0.248], rot: [1.379, 9.225, 0.698, 1.055]), plug(pos: [5.289, 7.213, 8.430], rot: [3.459, 0.358, 6.354, 7.308]), mouse(pos: [7.984, 4.799, 1.079], rot: [6.441, 7.718, 8.183, 1.362]), bottle_cap(pos: [1.310, 0.743, 8.835], rot: [5.751, 7.284, 8.748, 0.109])
================================================================================

Summary:
  File Path: dataset\3\3_2_1760512240\tracker_sixdof_data.h5
  Total frames: 102
  Elements per frame: 33
  Index range difference: (last_index + 1) - first_index = (101 + 1) - 0 = 102
  Index continuity: Normal
================================================================================
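The "Index range difference" and "Index continuity" lines of the summary can be reproduced from the index field alone. The sketch below is one possible way to do it, assuming the indices have already been read into a Python list; index_summary is a hypothetical helper and not part of the provided scripts.

```python
def index_summary(indices):
    """Summarise frame-index continuity for one H5 file.

    `indices` is the per-frame index field, in file order.
    """
    if not indices:
        raise ValueError("no frames")
    first, last = indices[0], indices[-1]
    span = (last + 1) - first
    # Any step other than +1 between neighbouring frames is a gap or a repeat.
    gaps = [(a, b) for a, b in zip(indices, indices[1:]) if b != a + 1]
    return {
        "total_frames": len(indices),
        "index_span": span,
        "gaps": gaps,
        "continuous": span == len(indices) and not gaps,
    }
```

For the file in the example above, the indices 0 to 101 give a span of 102 that matches the 102 frames, which corresponds to the "Normal" continuity result.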

Human skeletal motion data

The human skeletal motion data (full body + fingers) calculated based on tracker data is included in the file: human_bones.h5

Hierarchy

In addition to the "data" dataset, the root of this file contains a "Skeleton" group, which defines the bone lengths and parent-child connections of the human body and is equivalent to the "HIERARCHY" section of the BVH format. Every bone is a sub-group directly under "Skeleton"; the sub-groups sit side by side rather than nested, and the hierarchy is expressed through each bone's parent and children attributes. All per-bone information is stored as attributes. An example is as follows:

output
/Skeleton
  (attr) unit = "cm"
  /Hips
    (attr) parent = ""
    (attr) offset = [0.0, 10.0, 0.0]
    (attr) rotation_type = "quaternion"
    (attr) channels = ["w","x","y","z"]
    (attr) children = ["LeftUpLeg","RightUpLeg","Spine"]
  ......
  /LeftFoot
    (attr) parent = "LeftLeg"
    (attr) offset = [0.0, -40.0, 0.0]
    (attr) rotation_type = "quaternion"
    (attr) channels = ["w","x","y","z"]
    (attr) children = ["LeftFoot_End"]
  /LeftFoot_End
    (attr) parent = "LeftFoot"
    (attr) offset = [0.0, -2.0, 0.0]
    (attr) rotation_type = "none"
    (attr) channels = []
    (attr) children = []
    (attr) is_end = true
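The parent attribute is what allows per-bone values to be chained into world-space poses. The sketch below illustrates that forward-kinematics step under assumptions this document does not confirm: that each bone's per-frame position and rotation are expressed relative to its parent, as in BVH, and that quaternions are ordered (w, x, y, z) as the channels attribute suggests. The in-memory dictionaries and function names are hypothetical.

```python
def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q: q * (0, v) * conj(q)."""
    w, x, y, z = q
    rotated = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return rotated[1:]


def world_poses(parents, frame):
    """Accumulate parent-relative poses into world-space poses.

    parents: bone name -> parent name ("" for the root, as in the attributes)
    frame:   bone name -> (local_position, local_quaternion)
    """
    solved = {}

    def solve(bone):
        if bone in solved:
            return solved[bone]
        pos, rot = frame[bone]
        parent = parents[bone]
        if parent:
            parent_pos, parent_rot = solve(parent)
            offset = quat_rotate(parent_rot, pos)
            pos = tuple(p + o for p, o in zip(parent_pos, offset))
            rot = quat_mul(parent_rot, rot)
        solved[bone] = (tuple(pos), tuple(rot))
        return solved[bone]

    return {bone: solve(bone) for bone in frame}
```

The resulting positions are in the unit declared on the Skeleton group ("cm" in the example above).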

Data

Each element describes one bone, and the elements of a frame together cover all bones of the human body. Each element has the following fields:

  • name (string, e.g. "Hips")

  • position (float32, shape=[3], e.g. x,y,z)

  • rotation (float32, shape=[4], e.g. w,x,y,z)
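The sample output for this file contains rotations with both positive and negative w components. Since q and -q describe the same rotation, it can help to bring every quaternion to a common sign before interpolating or training on them. The sketch below shows one such normalization step, assuming (w, x, y, z) ordering and near-unit quaternions; canonical_quaternion and its tolerance are hypothetical choices.

```python
import math


def canonical_quaternion(q, tol=1e-2):
    """Return q as a unit quaternion with non-negative w.

    q is (w, x, y, z). A norm further than `tol` from 1 is treated as an
    error, since such a value is unlikely to be a rotation.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if abs(norm - 1.0) > tol:
        raise ValueError(f"not a unit quaternion (norm={norm:.3f})")
    sign = -1.0 if w < 0 else 1.0
    return tuple(sign * c / norm for c in (w, x, y, z))
```

The tolerance is deliberately loose because values printed with three decimals do not have a norm of exactly 1.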

Parse code: read_sixdof_data_h5_2.py

Output result example:

output
>python read_sixdof_data_h5.py dataset\1\1_41_1760493249\human_bone_data.h5
Successfully opened file: dataset\1\1_41_1760493249\human_bone_data.h5
File contains 68 frames of data
[0] 6DOF, 0, 1760493249.683: Hips(pos: [7.421, 96.760, -150.097], rot: [0.963, -0.018, 0.266, 0.038]), RightUpLeg(pos: [-10.965, 0.397, -2.292], rot: [0.994, 0.059, -0.091, -0.029]), RightLeg(pos: [-0.061, -45.043, -0.383], rot: [0.997, 0.025, -0.057, -0.034]), RightFoot(pos: [-0.603, -42.091, -2.064], rot: [-0.990, 0.050, 0.127, -0.036]), LeftUpLeg(pos: [10.935, -0.395, 2.304], rot: [0.996, 0.068, -0.052, -0.021]), LeftLeg(pos: [0.032, -45.075, 0.384], rot: [-0.983, 0.028, 0.175, -0.041]), LeftFoot(pos: [0.893, -42.288, 1.950], rot: [0.997, -0.024, -0.041, -0.068]), Spine(pos: [0.006, 8.119, -0.013], rot: [1.000, 0.001, 0.010, -0.005]), Spine1(pos: [0.005, 17.979, -0.008], rot: [1.000, 0.001, 0.010, -0.008]), Spine2(pos: [0.003, 12.759, 0.002], rot: [0.999, 0.002, 0.048, -0.010]), Neck(pos: [0.001, 19.140, 0.002], rot: [1.000, 0.008, 0.000, -0.002]), Neck1(pos: [0.000, 4.250, 0.000], rot: [1.000, 0.009, 0.000, -0.002]), Head(pos: [0.001, 4.250, -0.001], rot: [0.944, 0.006, -0.330, -0.003]), RightShoulder(pos: [-2.902, 13.341, -0.006], rot: [-0.998, 0.047, -0.040, 0.020]), RightArm(pos: [-16.056, -0.001, -0.008], rot: [0.728, -0.144, 0.530, 0.411]), RightForeArm(pos: [-27.998, -0.035, 0.031], rot: [0.743, -0.130, 0.577, -0.312]), RightHand(pos: [-25.992, 0.018, -0.012], rot: [0.870, 0.308, 0.097, -0.373]), RightHandThumb1(pos: [-1.937, -0.484, 2.518], rot: [0.889, 0.345, 0.277, -0.117]), RightHandThumb2(pos: [-3.872, 0.000, 0.000], rot: [0.997, 0.000, -0.000, 0.083]), RightHandThumb3(pos: [-2.690, 0.000, 0.000], rot: [0.998, 0.000, 0.000, 0.055]), RightInHandIndex(pos: [-3.389, 0.535, 2.080], rot: [1.000, 0.000, 0.000, 0.000]), RightHandIndex1(pos: [-5.485, -0.096, 1.051], rot: [0.986, 0.014, 0.130, 0.107]), RightHandIndex2(pos: [-3.806, 0.000, 0.000], rot: [0.996, 0.000, -0.000, 0.088]), RightHandIndex3(pos: [-2.158, 0.000, 0.000], rot: [0.998, 0.000, -0.000, 0.062]), RightInHandMiddle(pos: [-3.556, 0.544, 0.796], rot: [1.000, 0.000, 0.000, 0.000]), RightHandMiddle1(pos: [-5.441, -0.088, 0.330], rot: [0.999, 0.000, 0.000, 0.054]), RightHandMiddle2(pos: [-4.153, 0.000, 0.000], rot: [0.999, -0.000, -0.000, 0.054]), RightHandMiddle3(pos: [-2.603, 0.000, 0.000], rot: [-0.999, -0.000, -0.000, -0.036]), RightInHandRing(pos: [-3.539, 0.566, -0.136], rot: [1.000, 0.000, 0.000, 0.000]), RightHandRing1(pos: [-4.873, -0.023, -0.504], rot: [0.995, -0.005, -0.087, 0.055]), RightHandRing2(pos: [-3.619, 0.000, 0.000], rot: [0.998, 0.000, 0.000, 0.055]), RightHandRing3(pos: [-2.511, 0.000, 0.000], rot: [0.999, 0.000, 0.000, 0.037]), RightInHandPinky(pos: [-3.324, 0.494, -1.264], rot: [1.000, 0.000, 0.000, 0.000]), RightHandPinky1(pos: [-4.354, -0.023, -1.147], rot: [0.985, -0.004, -0.174, 0.021]), RightHandPinky2(pos: [-2.898, 0.000, 0.000], rot: [0.999, 0.000, -0.000, 0.032]), RightHandPinky3(pos: [-1.831, 0.000, 0.000], rot: [1.000, -0.000, -0.000, 0.021]), LeftShoulder(pos: [2.899, 13.338, -0.013], rot: [0.993, -0.067, 0.078, 0.049]), LeftArm(pos: [16.100, -0.000, -0.008], rot: [0.692, 0.144, -0.004, -0.707]), LeftForeArm(pos: [28.000, -0.000, -0.000], rot: [0.998, -0.016, -0.049, -0.020]), LeftHand(pos: [26.000, 0.001, 0.004], rot: [0.996, 0.072, -0.057, 0.001]), LeftHandThumb1(pos: [1.937, -0.484, 2.518], rot: [0.879, 0.411, -0.238, 0.039]), LeftHandThumb2(pos: [3.872, 0.000, 0.000], rot: [0.992, 0.000, 0.000, -0.124]), LeftHandThumb3(pos: [2.690, 0.000, 0.000], rot: [0.997, 0.000, -0.000, -0.082]), LeftInHandIndex(pos: [3.389, 0.535, 2.080], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandIndex1(pos: [5.485, -0.096, 1.051], rot: [0.965, 0.030, -0.127, -0.228]), LeftHandIndex2(pos: [3.806, 0.000, 0.000], rot: [0.982, 0.000, 0.000, -0.189]), LeftHandIndex3(pos: [2.158, 0.000, 0.000], rot: [0.991, 0.000, -0.000, -0.133]), LeftInHandMiddle(pos: [3.556, 0.544, 0.796], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandMiddle1(pos: [5.441, -0.088, 0.330], rot: [0.970, 0.000, 0.000, -0.244]), LeftHandMiddle2(pos: [4.153, 0.000, 0.000], rot: [0.970, -0.000, 0.000, -0.244]), LeftHandMiddle3(pos: [2.603, 0.000, 0.000], rot: [0.987, 0.000, 0.000, -0.164]), LeftInHandRing(pos: [3.539, 0.566, -0.136], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandRing1(pos: [4.873, -0.023, -0.504], rot: [-0.963, 0.022, -0.084, 0.256]), LeftHandRing2(pos: [3.619, 0.000, 0.000], rot: [0.966, 0.000, 0.000, -0.257]), LeftHandRing3(pos: [2.511, 0.000, 0.000], rot: [0.985, 0.000, 0.000, -0.172]), LeftInHandPinky(pos: [3.324, 0.494, -1.264], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandPinky1(pos: [4.354, -0.023, -1.147], rot: [0.957, -0.041, 0.169, -0.232]), LeftHandPinky2(pos: [2.898, 0.000, 0.000], rot: [0.937, 0.000, 0.000, -0.350]), LeftHandPinky3(pos: [1.831, 0.000, 0.000], rot: [0.972, 0.000, 0.000, -0.236])......
5[67] 6DOF, 67, 1760493250.952: Hips(pos: [7.361, 96.789, -149.369], rot: [0.994, -0.017, 0.111, 0.011]), RightUpLeg(pos: [-11.183, 0.213, -1.646], rot: [0.996, 0.048, -0.067, -0.024]), RightLeg(pos: [-0.067, -45.055, -0.275], rot: [1.000, 0.006, 0.019, -0.021]), RightFoot(pos: [-0.324, -42.057, -1.510], rot: [-0.997, 0.029, 0.067, -0.029]), LeftUpLeg(pos: [11.191, -0.227, 1.653], rot: [0.996, 0.085, -0.037, -0.000]), LeftLeg(pos: [0.052, -45.053, 0.276], rot: [-0.995, 0.033, 0.093, -0.032]), LeftFoot(pos: [0.561, -42.133, 1.446], rot: [-0.997, 0.039, -0.020, 0.056]), Spine(pos: [0.003, 8.123, -0.000], rot: [1.000, 0.003, 0.007, -0.002]), Spine1(pos: [-0.001, 17.982, -0.001], rot: [1.000, 0.003, 0.007, -0.003]), Spine2(pos: [-0.004, 12.762, -0.003], rot: [0.999, 0.002, 0.031, -0.004]), Neck(pos: [0.001, 19.140, 0.001], rot: [1.000, 0.003, -0.000, 0.001]), Neck1(pos: [0.000, 4.250, 0.000], rot: [1.000, 0.004, -0.000, 0.001]), Head(pos: [0.000, 4.250, -0.001], rot: [0.988, 0.003, -0.155, 0.000]), RightShoulder(pos: [-2.900, 13.338, -0.011], rot: [-0.998, 0.053, -0.019, 0.023]), RightArm(pos: [-16.116, 0.002, -0.001], rot: [0.754, -0.098, 0.489, 0.427]), RightForeArm(pos: [-28.013, 0.001, -0.008], rot: [0.714, 0.011, 0.648, -0.264]), RightHand(pos: [-25.988, 0.005, -0.018], rot: [0.892, 0.273, 0.084, -0.350]), RightHandThumb1(pos: [-1.937, -0.484, 2.518], rot: [0.888, 0.343, 0.283, -0.120]), RightHandThumb2(pos: [-3.872, 0.000, 0.000], rot: [-0.997, -0.000, 0.000, -0.080]), RightHandThumb3(pos: [-2.690, 0.000, 0.000], rot: [0.999, 0.000, 0.000, 0.053]), RightInHandIndex(pos: [-3.389, 0.535, 2.080], rot: [1.000, 0.000, 0.000, 0.000]), RightHandIndex1(pos: [-5.485, -0.096, 1.051], rot: [0.986, 0.014, 0.130, 0.105]), RightHandIndex2(pos: [-3.806, 0.000, 0.000], rot: [0.996, 0.000, -0.000, 0.087]), RightHandIndex3(pos: [-2.158, 0.000, 0.000], rot: [0.998, 0.000, 0.000, 0.061]), RightInHandMiddle(pos: [-3.556, 0.544, 0.796], rot: [1.000, 0.000, 0.000, 0.000]), 
RightHandMiddle1(pos: [-5.441, -0.088, 0.330], rot: [0.998, -0.000, -0.000, 0.063]), RightHandMiddle2(pos: [-4.153, 0.000, 0.000], rot: [0.998, 0.000, 0.000, 0.063]), RightHandMiddle3(pos: [-2.603, 0.000, 0.000], rot: [0.999, 0.000, 0.000, 0.042]), RightInHandRing(pos: [-3.539, 0.566, -0.136], rot: [1.000, 0.000, 0.000, 0.000]), RightHandRing1(pos: [-4.873, -0.023, -0.504], rot: [-0.995, 0.005, 0.087, -0.055]), RightHandRing2(pos: [-3.619, 0.000, 0.000], rot: [-0.998, -0.000, -0.000, -0.055]), RightHandRing3(pos: [-2.511, 0.000, 0.000], rot: [0.999, 0.000, 0.000, 0.037]), RightInHandPinky(pos: [-3.324, 0.494, -1.264], rot: [1.000, 0.000, 0.000, 0.000]), RightHandPinky1(pos: [-4.354, -0.023, -1.147], rot: [-0.985, 0.002, 0.174, -0.012]), RightHandPinky2(pos: [-2.898, 0.000, 0.000], rot: [1.000, 0.000, 0.000, 0.018]), RightHandPinky3(pos: [-1.831, 0.000, 0.000], rot: [1.000, 0.000, 0.000, 0.012]), LeftShoulder(pos: [2.901, 13.342, 0.010], rot: [0.995, -0.068, 0.058, 0.048]), LeftArm(pos: [16.098, 0.000, -0.000], rot: [0.697, 0.145, 0.002, -0.702]), LeftForeArm(pos: [27.999, -0.002, 0.000], rot: [0.999, -0.023, -0.031, -0.015]), LeftHand(pos: [26.000, -0.000, -0.001], rot: [0.996, 0.068, -0.057, 0.003]), LeftHandThumb1(pos: [1.937, -0.484, 2.518], rot: [0.879, 0.412, -0.239, 0.035]), LeftHandThumb2(pos: [3.872, 0.000, 0.000], rot: [0.992, -0.000, 0.000, -0.125]), LeftHandThumb3(pos: [2.690, 0.000, 0.000], rot: [0.997, 0.000, -0.000, -0.083]), LeftInHandIndex(pos: [3.389, 0.535, 2.080], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandIndex1(pos: [5.485, -0.096, 1.051], rot: [0.964, 0.031, -0.127, -0.232]), LeftHandIndex2(pos: [3.806, 0.000, 0.000], rot: [0.981, 0.000, 0.000, -0.192]), LeftHandIndex3(pos: [2.158, 0.000, 0.000], rot: [0.991, 0.000, -0.000, -0.135]), LeftInHandMiddle(pos: [3.556, 0.544, 0.796], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandMiddle1(pos: [5.441, -0.088, 0.330], rot: [-0.970, -0.000, 0.000, 0.244]), LeftHandMiddle2(pos: [4.153, 0.000, 0.000], rot: 
[0.970, 0.000, 0.000, -0.244]), LeftHandMiddle3(pos: [2.603, 0.000, 0.000], rot: [0.987, 0.000, 0.000, -0.163]), LeftInHandRing(pos: [3.539, 0.566, -0.136], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandRing1(pos: [4.873, -0.023, -0.504], rot: [-0.963, 0.022, -0.084, 0.253]), LeftHandRing2(pos: [3.619, 0.000, 0.000], rot: [0.967, 0.000, 0.000, -0.254]), LeftHandRing3(pos: [2.511, 0.000, 0.000], rot: [0.985, 0.000, 0.000, -0.170]), LeftInHandPinky(pos: [3.324, 0.494, -1.264], rot: [1.000, 0.000, 0.000, 0.000]), LeftHandPinky1(pos: [4.354, -0.023, -1.147], rot: [0.959, -0.040, 0.169, -0.225]), LeftHandPinky2(pos: [2.898, 0.000, 0.000], rot: [0.941, 0.000, -0.000, -0.339]), LeftHandPinky3(pos: [1.831, 0.000, 0.000], rot: [0.974, 0.000, -0.000, -0.228])
6================================================================================
7
8Summary:
9 File Path: dataset\1\1_41_1760493249\human_bone_data.h5
10 Total frames: 68
11 Elements per frame: 59
12 Index range difference: (last_index + 1) - first_index = (67 + 1) - 0 = 68
13 Index continuity: Normal
14================================================================================
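
The continuity check reported in the summary can be reproduced from the per-frame indices alone. The following is a minimal sketch, not the logic of the shipped parser: the function name `check_index_continuity` and the plain list of indices used as input are assumptions made for illustration.

```python
from typing import Dict, Sequence, Union


def check_index_continuity(indices: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Compare the index span with the frame count, as in the summary above.

    Hypothetical helper: `indices` is assumed to hold the per-frame index
    values in file order.
    """
    if not indices:
        return {"total_frames": 0, "index_span": 0, "continuous": True}
    first, last = indices[0], indices[-1]
    span = (last + 1) - first
    # Continuous means every step is exactly +1, which also implies that
    # the span equals the number of frames.
    steps_ok = all(b - a == 1 for a, b in zip(indices, indices[1:]))
    return {
        "total_frames": len(indices),
        "index_span": span,
        "continuous": steps_ok and span == len(indices),
    }


report = check_index_continuity(list(range(68)))
print(report)
```

A gap such as `[0, 1, 3]` makes the span (4) larger than the frame count (3), so the check reports a discontinuity.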

Palm Pressure Data

Palm pressure data is stored in the hand_pressure.h5 file. Each palm has 129 pressure points, and each point takes a value from 0 to 255.

Hand Pressure Point Map:

Each element contains one frame of palm pressure data for one hand:

  • name (string: "left" or "right")

  • value (uint8, shape=[129])

Parse code: read_hand_pressure_h5.py

Output result example:

output
1>python read_hand_pressure_h5.py dataset\3\3_2_1760512240\hand_pressure_data.h5
2Successfully opened file: dataset\3\3_2_1760512240\hand_pressure_data.h5
3File contains 68 frames of pressure data
4[0] HandPressure, 0, 1760512240.635: left([86, 17, 246, 100, 167, 183, 120, 221, 1, 75, 80, 241, 219, 47, 29, 84, 226, 203, 102, 161, 82, 64, 19, 161, 149, 211, 97, 156, 208, 148, 67, 188, 115, 131, 151, 120, 59, 98, 37, 56, 1, 112, 187, 152, 249, 229, 88, 22, 168, 224, 241, 206, 130, 18, 139, 181, 137, 15, 114, 111, 72, 154, 244, 230, 42, 2, 179, 105, 56, 120, 239, 75, 228, 75, 130, 182, 91, 152, 255, 85, 120, 178, 125, 207, 187, 5, 80, 31, 88, 74, 52, 160, 175, 139, 114, 157, 77, 168, 130, 116, 172, 103, 197, 152, 237, 239, 209, 124, 68, 163, 35, 185, 39, 21, 74, 147, 66, 53, 157, 168, 13, 168, 184, 56, 156, 21, 18, 219, 8], len=129), right([185, 113, 48, 184, 119, 176, 247, 226, 169, 233, 150, 36, 76, 157, 23, 171, 85, 3, 0, 56, 210, 205, 164, 139, 54, 8, 14, 8, 61, 185, 43, 82, 109, 190, 101, 171, 78, 23, 110, 244, 224, 67, 188, 62, 139, 194, 221, 165, 229, 215, 231, 120, 221, 233, 46, 190, 82, 178, 26, 86, 223, 178, 230, 161, 200, 197, 75, 40, 14, 194, 64, 177, 17, 25, 41, 234, 74, 76, 153, 178, 42, 108, 188, 235, 165, 147, 84, 125, 216, 106, 3, 28, 210, 100, 138, 16, 87, 238, 72, 209, 103, 79, 98, 109, 1, 106, 155, 78, 140, 221, 18, 231, 176, 127, 244, 143, 240, 77, 86, 210, 109, 116, 128, 172, 81, 218, 123, 229, 51], len=129)......
5[67] HandPressure, 67, 1760512241.746: left([111, 134, 161, 38, 190, 204, 79, 248, 252, 35, 160, 126, 137, 243, 216, 127, 131, 9, 185, 228, 218, 174, 194, 30, 87, 229, 170, 59, 98, 239, 37, 31, 32, 112, 43, 170, 54, 83, 117, 100, 129, 27, 7, 110, 79, 34, 96, 180, 163, 99, 185, 104, 44, 46, 130, 35, 50, 139, 90, 183, 64, 110, 185, 34, 42, 142, 154, 112, 216, 240, 21, 19, 140, 199, 4, 140, 209, 108, 2, 51, 89, 136, 211, 31, 135, 60, 243, 68, 4, 120, 125, 226, 235, 19, 57, 154, 97, 198, 102, 179, 78, 210, 165, 3, 30, 206, 161, 47, 197, 101, 18, 43, 68, 50, 228, 126, 74, 67, 248, 186, 164, 125, 182, 27, 184, 203, 209, 30, 46], len=129), right([108, 216, 41, 202, 192, 184, 200, 129, 236, 64, 110, 226, 41, 110, 14, 82, 5, 220, 17, 201, 186, 201, 160, 99, 88, 84, 19, 231, 103, 84, 50, 138, 56, 80, 14, 189, 184, 81, 255, 49, 159, 152, 90, 78, 123, 155, 240, 45, 68, 76, 157, 154, 231, 152, 107, 172, 222, 150, 1, 120, 187, 246, 4, 36, 156, 147, 202, 204, 20, 1, 167, 204, 183, 57, 166, 2, 139, 194, 182, 144, 44, 139, 115, 132, 123, 215, 74, 151, 24, 58, 57, 97, 77, 68, 72, 184, 78, 96, 162, 212, 71, 65, 58, 97, 54, 37, 131, 222, 253, 245, 177, 147, 94, 93, 35, 158, 146, 69, 131, 242, 71, 83, 77, 193, 144, 229, 241, 56, 35], len=129)
6================================================================================
7
8Summary:
9 Path: dataset\3\3_2_1760512240\hand_pressure_data.h5
10 Total frames: 68
11 Elements per frame: 2
12 Index range difference: (last_index + 1) - first_index = (67 + 1) - 0 = 68
13 Index continuity: Normal
14================================================================================
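
The element layout described above (a hand name plus 129 readings in the range 0 to 255) can be captured in a small container with a validity check. This is only a sketch: the class name `HandPressureElement` and its methods are hypothetical, and `normalized()` is a plain rescaling to 0.0..1.0, not a calibrated pressure unit.

```python
from dataclasses import dataclass
from typing import List

POINTS_PER_PALM = 129  # taken from the specification above


@dataclass
class HandPressureElement:
    """One element of a frame: the pressure readings of a single palm."""

    name: str         # "left" or "right"
    value: List[int]  # 129 raw readings, each in 0..255

    def validate(self) -> None:
        """Raise ValueError if the element does not match the documented layout."""
        if self.name not in ("left", "right"):
            raise ValueError(f"unexpected hand name: {self.name!r}")
        if len(self.value) != POINTS_PER_PALM:
            raise ValueError(
                f"expected {POINTS_PER_PALM} points, got {len(self.value)}"
            )
        if any(not 0 <= v <= 255 for v in self.value):
            raise ValueError("pressure values must lie in 0..255")

    def normalized(self) -> List[float]:
        """Rescale raw readings to 0.0..1.0 (illustrative, uncalibrated)."""
        return [v / 255.0 for v in self.value]


left = HandPressureElement("left", [0] * 128 + [255])
left.validate()
print(max(left.normalized()))
```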

Data Player

This software visualizes the data in each recording.

After startup, open trackers_sixdof.h5 in the data directory of the recording. The program then loads the other data files automatically. The figure below shows the player running:

FAQ

What coordinate system does the system use?

The tracker 6DoF data and the human skeleton motion data share the same world coordinate system.

The origin of the world coordinate system is typically placed on the ground, with the Y-axis pointing upward, as shown in the figure above. This is the coordinate system of the optical environment.

The tracker 6DoF data (trackers_sixdof.h5) gives the pose of each tracker's own model in the world coordinate system, with lengths in meters.

In the human skeleton motion data (human_bones.h5), the root node Hips carries 6DoF data in the world coordinate system. Each child bone carries a pose relative to its parent node. Lengths are in centimeters, so convert units before combining skeleton data with tracker data.
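
Because child bones are stored relative to their parents, a world-space pose is obtained by chaining transforms down from Hips. The sketch below shows one step of that chain using the frame 0 values of Hips and RightUpLeg from the sample output. It rests on assumptions not stated in this document: quaternions are taken to be ordered (w, x, y, z) and composed with the Hamilton product, and all function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # ASSUMED order: (w, x, y, z)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by quaternion q (q is normalized internally)."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    # t = 2 * (q_vec x v); rotated = v + w * t + q_vec x t
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    return (
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    )


def to_world(parent_pos: Vec3, parent_rot: Quat,
             local_pos: Vec3, local_rot: Quat) -> Tuple[Vec3, Quat]:
    """Compose a parent world pose with a child pose given relative to it."""
    offset = quat_rotate(parent_rot, local_pos)
    world_pos = (parent_pos[0] + offset[0],
                 parent_pos[1] + offset[1],
                 parent_pos[2] + offset[2])
    return world_pos, quat_mul(parent_rot, local_rot)


def cm_to_m(p: Vec3) -> Vec3:
    """Convert skeleton lengths (centimeters) to tracker units (meters)."""
    return (p[0] / 100.0, p[1] / 100.0, p[2] / 100.0)


# Frame 0 values copied from the sample output (skeleton units: cm).
hips_pos, hips_rot = (7.421, 96.760, -150.097), (0.963, -0.018, 0.266, 0.038)
leg_pos, leg_rot = (-10.965, 0.397, -2.292), (0.994, 0.059, -0.091, -0.029)

leg_world_cm, leg_world_rot = to_world(hips_pos, hips_rot, leg_pos, leg_rot)
print(cm_to_m(leg_world_cm))
```

For deeper bones, the world pose returned for one bone becomes the parent pose for the next bone in the hierarchy.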

What are the different forms of trackers, and what do their coordinate systems look like?

A tracker combines an optical rigid body with an IMU. Each tracker has its own name, as detailed in the "Tracker List". There are currently six configurations. The model file, coordinate system, and optical point topology of each configuration are listed below:

Type Name Model file Diagram
PWR_M_PN3 PWR_M_PN3_V2.stl
PWR_K_PNS PWR_K_PNS.stl
PWR_K_Link_V2 PWR_K_Link_V2.stl
PWR_H_LinkHand PWR_H_LinkHand.stl
PWR_M_FingerA PWR_M_FingerA.stl
PWR_M_FingerB PWR_M_FingerB.stl

Where are these trackers used?

These trackers fall into two main categories: wireless trackers and wired trackers.

  • Wireless trackers are used for tracking props.

  • Wired trackers are used for tracking body parts.

Wireless trackers

  • PWR_M_PN3: A small wireless tracker attached to smaller props.

  • PWR_K_PNS: A large wireless tracker attached to larger props such as tables and boxes, and to the two fixed-position cameras.

The model file of every prop equipped with a wireless tracker is aligned so that its origin and orientation exactly match those of the tracker. After importing the prop model file, no coordinate conversion is required: the prop can be driven directly by the tracker's 6DoF data.

Prop Model File

The following table shows the model file for the cola can in the sample data.

Tool Name Model file Diagram
cola_modern_330 cola_modern_330_chip.stl

Wired trackers

  • PWR_H_LinkHand: A wired tracker mounted on the back of each hand.

  • PWR_M_FingerA and PWR_M_FingerB: Wired trackers mounted on the fingers, one at the fingertip and one at the base of each finger.

Hand tracker diagram:

Tracker name and model correspondence

Each recording session includes 6DoF data from at least 31 trackers; additional prop trackers may be present depending on the scene. The correspondence between tracker names and models, listed below, is the same across all recordings.

python
"Head": "PWR_K_Link_V2",  # Head
"Spine": "PWR_K_Link_V2",  # Back
"Hips": "PWR_K_Link_V2",  # Hips
"RightUpLeg": "PWR_K_Link_V2",  # Right thigh
"RightFoot": "PWR_K_Link_V2",  # Right foot
"LeftUpLeg": "PWR_K_Link_V2",  # Left thigh
"LeftFoot": "PWR_K_Link_V2",  # Left foot
"RightHand": "PWR_H_LinkHand",  # Back of right hand
"RightHandThumb2": "PWR_M_FingerB",  # Right thumb tip
"RightHandThumb1": "PWR_M_FingerA",  # Right thumb base
"RightHandIndex2": "PWR_M_FingerA",  # Right index finger tip
"RightHandIndex1": "PWR_M_FingerB",  # Right index finger base
"RightHandMiddle2": "PWR_M_FingerB",  # Right middle finger tip
"RightHandMiddle1": "PWR_M_FingerA",  # Right middle finger base
"RightHandRing2": "PWR_M_FingerA",  # Right ring finger tip
"RightHandRing1": "PWR_M_FingerB",  # Right ring finger base
"RightHandPinky2": "PWR_M_FingerB",  # Right little finger tip
"RightHandPinky1": "PWR_M_FingerA",  # Right little finger base
"LeftHand": "PWR_H_LinkHand",  # Back of left hand
"LeftHandThumb2": "PWR_M_FingerB",  # Left thumb tip
"LeftHandThumb1": "PWR_M_FingerA",  # Left thumb base
"LeftHandIndex2": "PWR_M_FingerA",  # Left index finger tip
"LeftHandIndex1": "PWR_M_FingerB",  # Left index finger base
"LeftHandMiddle2": "PWR_M_FingerB",  # Left middle finger tip
"LeftHandMiddle1": "PWR_M_FingerA",  # Left middle finger base
"LeftHandRing2": "PWR_M_FingerA",  # Left ring finger tip
"LeftHandRing1": "PWR_M_FingerB",  # Left ring finger base
"LeftHandPinky2": "PWR_M_FingerB",  # Left little finger tip
"LeftHandPinky1": "PWR_M_FingerA",  # Left little finger base
"fixed1_cam": "PWR_K_PNS",  # Fixed camera 1
"fixed2_cam": "PWR_K_PNS",  # Fixed camera 2

Name Meaning
RightHandPinky2 Right little finger tip
RightHandPinky1 Base of the right little finger
LeftHand Back of left hand
LeftHandThumb2 Left thumb tip
LeftHandThumb1 Left thumb base
LeftHandIndex2 Left index finger tip
LeftHandIndex1 Left index finger root
LeftHandMiddle2 Left middle finger tip
LeftHandMiddle1 Root of the left middle finger
LeftHandRing2 Left ring finger tip
LeftHandRing1 Base of the left ring finger
LeftHandPinky2 Left little finger tip
LeftHandPinky1 Base of the left little finger
TBD Other Props
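
Combining the name-to-type listing above with the tracker type table gives the model file for any named tracker. The sketch below is illustrative only: the dictionary and function names are hypothetical, and `TRACKER_TYPES` holds just a few of the 31 entries.

```python
from typing import Optional

# Tracker type -> model file, copied from the tracker type table.
MODEL_FILES = {
    "PWR_M_PN3": "PWR_M_PN3_V2.stl",
    "PWR_K_PNS": "PWR_K_PNS.stl",
    "PWR_K_Link_V2": "PWR_K_Link_V2.stl",
    "PWR_H_LinkHand": "PWR_H_LinkHand.stl",
    "PWR_M_FingerA": "PWR_M_FingerA.stl",
    "PWR_M_FingerB": "PWR_M_FingerB.stl",
}

# A few entries from the tracker name -> type listing (not the full list).
TRACKER_TYPES = {
    "Head": "PWR_K_Link_V2",
    "RightHand": "PWR_H_LinkHand",
    "RightHandThumb2": "PWR_M_FingerB",
    "RightHandThumb1": "PWR_M_FingerA",
    "fixed1_cam": "PWR_K_PNS",
}


def model_file_for(tracker_name: str) -> Optional[str]:
    """Return the model file for a named tracker, or None if it is unknown.

    Unknown names are expected for scene-specific prop trackers, which are
    not part of the fixed listing.
    """
    tracker_type = TRACKER_TYPES.get(tracker_name)
    if tracker_type is None:
        return None
    return MODEL_FILES[tracker_type]


print(model_file_for("Head"))
print(model_file_for("fixed1_cam"))
```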

How are the cameras tracked, and how are the extrinsic parameters and coordinate systems defined?

The setup has three camera channels with six cameras in total: one head-mounted channel and two fixed-position channels. Each channel consists of one RealSense D435 camera and one USB wide-angle camera, joined by identical structural components as shown below.

Camera Connection Diagram

The camera coordinate system defaults to right-down-front: +X points right, +Y points down, and +Z points forward.

Camera intrinsic and extrinsic parameters are defined in the camera_params/ directory. The trackers corresponding to the three-camera system are listed in the table below:

Camera Position Tracker name
Head Camera Head
Fixed Camera 1 fixed1_cam
Fixed Camera 2 fixed2_cam

Retrieve the 6DoF data for the corresponding tracker name from trackers_sixdof.h5, then apply that camera's extrinsic and intrinsic parameters to reproject into the camera image.
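
The reprojection chain can be sketched as three steps: world to tracker, tracker to camera, camera to pixel. The code below is a hedged illustration and not the layout of the files in camera_params/. It assumes that the tracker rotation has already been converted to a 3x3 matrix, that the extrinsics map tracker-frame points into the camera frame, that the intrinsics are a plain pinhole model (fx, fy, cx, cy) without distortion, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]
Mat = Sequence[Sequence[float]]


def mat_vec(m: Mat, v: Vec) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m: Mat) -> List[List[float]]:
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def world_to_tracker(p_world: Vec, r_wt: Mat, t_wt: Vec) -> List[float]:
    """Invert the tracker pose: p_tracker = R_wt^T (p_world - t_wt)."""
    diff = [p_world[i] - t_wt[i] for i in range(3)]
    return mat_vec(transpose(r_wt), diff)


def tracker_to_camera(p_tracker: Vec, r_ct: Mat, t_ct: Vec) -> List[float]:
    """Apply the (assumed) tracker-to-camera extrinsics."""
    rotated = mat_vec(r_ct, p_tracker)
    return [rotated[i] + t_ct[i] for i in range(3)]


def project(p_cam: Vec, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection in a right-down-front camera frame."""
    x, y, z = p_cam
    if z <= 0.0:
        return None  # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Placeholder numbers for illustration only (meters and pixels).
p_tracker = world_to_tracker([0.0, 1.0, 2.0], identity, [0.0, 1.0, 0.0])
p_cam = tracker_to_camera(p_tracker, identity, [0.0, 0.0, 0.0])
print(project(p_cam, fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```

With identity rotations the sample point lies on the optical axis, so it projects to the principal point (cx, cy).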

Sample Data

Description Data
Move the Coke on the table HiPHI-OM-move-cola.zip

3. Technical Specification: In-The-Wild (ITW) Dataset

  • Ecological Validity and Scene Generalization: ITW comprises a high-entropy corpus captured across diverse unconstrained environments, including residential, hospitality, retail, and logistics sectors. By incorporating stochastic variables such as non-uniform lighting, dynamic occlusions, and unstructured spatial layouts, the dataset exposes models to the long-tail edge cases of real-world deployment, significantly enhancing policy robustness against environmental distribution shifts.

  • Capture of Unconstrained Behavioral Dynamics: Unlike scripted laboratory protocols, ITW prioritizes recording naturalistic human-object interactions and operational logic. This focuses the training signal on the inherent "common sense" of human motion, such as how humans prioritize tasks and navigate social spaces, which supports the development of humanoid agents that behave more intuitively and predictably in shared environments.

  • High-Throughput Distributed Acquisition: The dataset utilizes a decentralized collection strategy involving portable, low-profile sensing arrays. This methodology allows for massive parallelization of data acquisition, achieving a throughput several times higher than traditional teleoperation or laboratory-bound methods. This scalability is critical for the generation of the high-volume datasets required for foundation model training in the embodied AI space.

  • Cross-Ontology Annotation and Compliance Pipeline: The ITW framework includes an end-to-end pipeline for data desensitization (anonymization), compliance auditing, and post-processing. A specialized toolchain enables the semantic annotation of unstructured data, ensuring it remains compatible across diverse ontologies. This allows real-world behavioral "noise" to be translated into structured training signals that are usable across various robotic morphologies and task-planning architectures.

File Structure

File Name Description
camera_params/ Intrinsic parameters for the head camera
config.json Metadata and description of the data in this collection
depth_head.mkv Head depth video
depth_head.csv Timestamps for the head depth images
hands_keypoint_3d.json 3D hand keypoint data
head_hands_sixdof.csv 6DoF data for the head and both wrists. The pose of the head-mounted camera in the first frame defines the origin of the world coordinate system, and the wrist position is expressed relative to the head-mounted camera.
task_info.json Task information for this collection
rgb_head.csv Per-frame timestamps for the head RGB video
rgb_head.mp4 Head RGB video
mic.wav Audio recording

Depth Data

Extract the depth frames from depth_head.mkv as PNG images, using the information in depth_head.csv.

Each PNG contains one frame of 16-bit depth data.

Parse code: read_png_16bit.py

Output result example:

output
1>python read_png_16bit.py dataset\3\3_1_1760508893\depth_fixed\depth_0_1760508893407.png
2Image data type: uint16
3Image shape (height, width): (480, 640)
4Pixel value range: [0, 4999]
5Pixel values:(0, 0): 2229 (0, 1): 895 (0, 2): 2995 (0, 3): 4374 (0, 4): 3547 (0, 5): 1692 (0, 6): 4714 (0, 7): 1647 (0, 8): 3925 (0, 9): 3390 (0, 10): 2282 (0, 11): 862 (0, 12): 2801 (0, 13): 1817 (0, 14): 3244 (0, 15): 1869 (0, 16): 1273 (0, 17): 1041 (0, 18): 2761 (0, 19): 3518 (0, 20): 2127 (0, 21): 3061 (0, 22): 1924 (0, 23): 3374 (0, 24): 908 (0, 25): 3501 (0, 26): 1822 (0, 27): 3944 (0, 28): 252 (0, 29): 2680 (0, 30): 1078 (0, 31): 4535 (0, 32): 356 (0, 33): 2394 (0, 34): 3 (0, 35): 827 (0, 36): 3834 (0, 37): 4101 (0, 38): 2683 (0, 39): 1128 (0, 40): 2544 (0, 41): 2289 (0, 42): 58 (0, 43): 2335 (0, 44): 3181 (0, 45): 1335 (0, 46): 4882 (0, 47): 4324 (0, 48): 795 (0, 49): 4056 (0, 50): 1729 (0, 51): 1073 (0, 52): 2216 (0, 53): 3168 (0, 54): 719 (0, 55): 693 (0, 56): 3484 (0, 57): 137 (0, 58): 3165 (0, 59): 2427 (0, 60): 3391 (0, 61): 1962 (0, 62): 2656 (0, 63): 3696 (0, 64): 4627 (0, 65): 1604 (0, 66): 4554 (0, 67): 615 (0, 68): 4258 (0, 69): 4757 (0, 70): 343 (0, 71): 202 (0, 72): 2056 (0, 73): 874 (0, 74): 1838 (0, 75): 742 (0, 76): 880 (0, 77): 1573 (0, 78): 3504 (0, 79): 4451 (0, 80): 2053 (0, 81): 667 (0, 82): 4895 (0, 83): 861 (0, 84): 1448 (0, 85): 2262 (0, 86): 80 (0, 87): 1445 (0, 88): 3191 (0, 89): 3864 (0, 90): 2022 (0, 91): 4655 (0, 92): 266 (0, 93): 260 (0, 94): 2292 (0, 95): 2861 (0, 96): 248 (0, 97): 671 (0, 98): 3239 (0, 99): 3710 (0, 100): 3766 (0, 101): 1283 (0, 102): 2494 (0, 103): 2164 (0, 104): 4340 (0, 105): 3539 (0, 106): 1558 (0, 107): 619 (0, 108): 4826 (0, 109): 1730 (0, 110): 2195 (0, 111): 3813 (0, 112): 2310 (0, 113): 1343 (0, 114): 2980 (0, 115): 3945 (0, 116): 315 (0, 117): 4461 (0, 118): 1315 (0, 119): 3767 (0, 120): 1854 (0, 121): 957 (0, 122): 2968 (0, 123): 3151 (0, 124): 1445 (0, 125): 3355 (0, 126): 3410 (0, 127): 860 (0, 128): 2301 (0, 129): 2527 (0, 130): 2324 (0, 131): 3310 (0, 132): 276 (0, 133): 3899 (0, 134): 102 (0, 135): 2384 (0, 136): 2996 (0, 137): 109 (0, 138): 922 (0, 139): 4917 (0, 140): 2406 (0, 141): 619 (0, 
142): 307 (0, 143): 2187 (0, 144): 2679 (0, 145): 3516 (0, 146): 2818 (0, 147): 535 (0, 148): 1242 (0, 149): 1102 (0, 150): 3657 (0, 151): 3104 (0, 152): 807 (0, 153): 3926 (0, 154): 3332 (0, 155): 3453 (0, 156): 2338 (0, 157): 250 (0, 158): 3388 (0, 159): 4432 (0, 160): 2745 (0, 161): 538 (0, 162): 2648 (0, 163): 4757 (0, 164): 1002 (0, 165): 4200 (0, 166): 1126 (0, 167): 3228 (0, 168): 4195 (0, 169): 1135 (0, 170): 4117 (0, 171): 789 (0, 172): 3131 (0, 173): 1786 (0, 174): 4705 (0, 175): 2263 (0, 176): 3551 (0, 177): 2455 (0, 178): 1543 (0, 179): 1735 (0, 180): 4995 (0, 181): 886 (0, 182): 3535 (0, 183): 3820 (0, 184): 4037 (0, 185): 3589 (0, 186): 1743 (0, 187): 316 (0, 188): 2223 (0, 189): 2552 (0, 190): 2763 (0, 191): 3179 (0, 192): 4976 (0, 193): 2888 (0, 194): 3415 (0, 195): 3515 (0, 196): 4460 (0, 197): 2020 (0, 198): 4898 (0, 199): 4138 (0, 200): 3994 (0, 201): 3146 (0, 202): 1844 (0, 203): 2860 (0, 204): 4602 (0, 205): 3212 (0, 206): 3750 (0, 207): 3079 (0, 208): 359 (0, 209): 4843 (0, 210): 3290 (0, 211): 718 (0, 212): 1020 (0, 213): 2644 (0, 214): 1384 (0, 215): 4617 (0, 216): 2844 (0, 217): 4825 (0, 218): 4928 (0, 219): 1177 (0, 220): 4585 (0, 221): 3034 (0, 222): 2382 (0, 223): 1233 (0, 224): 2610 (0, 225): 1418 (0, 226): 3538 (0, 227): 2643 (0, 228): 1012 (0, 229): 925 (0, 230): 3815 (0, 231): 1852 (0, 232): 2971 (0, 233): 496 (0, 234): 4573 (0, 235): 3874 (0, 236): 3522 (0, 237): 3187 (0, 238): 2196 (0, 239): 3725 (0, 240): 3469 (0, 241): 1070 (0, 242): 2604 (0, 243): 1639 (0, 244): 4423 (0, 245): 2680 (0, 246): 327 (0, 247): 3259 (0, 248): 1698 (0, 249): 251 (0, 250): 1238 (0, 251): 4077 (0, 252): 2870 (0, 253): 2897 (0, 254): 2452 (0, 255): 2858 (0, 256): 2765 (0, 257): 297 (0, 258): 3220 (0, 259): 3014 (0, 260): 3422 (0, 261): 1762 (0, 262): 2345 (0, 263): 3654 (0, 264): 261 (0, 265): 1800 (0, 266): 1239 (0, 267): 3758 (0, 268): 309 (0, 269): 568 (0, 270): 2154 (0, 271): 1835 (0, 272): 1193 (0, 273): 2603 (0, 274): 3344 (0, 275): 607 (0, 276): 
751 (0, 277): 465 (0, 278): 3444 (0, 279): 1199 (0, 280): 1010 (0, 281): 4014 (0, 282): 658 (0, 283): 3120 (0, 284): 689 (0, 285): 2118 (0, 286): 503 (0, 287): 124 (0, 288): 4102 (0, 289): 842 (0, 290): 3979 (0, 291): 460 (0, 292): 160 (0, 293): 4660 (0, 294): 3781 (0, 295): 2831 (0, 296): 4011 (0, 297): 944 (0, 298): 1318 (0, 299): 4858 (0, 300): 3669 (0, 301): 932 (0, 302): 4000 (0, 303): 2817 (0, 304): 2516 (0, 305): 727 (0, 306): 530 (0, 307): 3398 (0, 308): 2861 (0, 309): 3774 (0, 310): 2900 (0, 311): 3533 (0, 312): 1493 (0, 313): 3201 (0, 314): 3312 (0, 315): 4431 (0, 316): 223 (0, 317): 2022 (0, 318): 2874 (0, 319): 910 (0, 320): 4824 (0, 321): 246 (0, 322): 4623 (0, 323): 3496 (0, 324): 463 (0, 325): 3367 (0, 326): 4978 (0, 327): 2157 (0, 328): 2640 (0, 329): 2327 (0, 330): 860 (0, 331): 4609 (0, 332): 2405 (0, 333): 2624 (0, 334): 192 (0, 335): 3151 (0, 336): 3184 (0, 337): 1699 (0, 338): 3350 (0, 339): 690 (0, 340): 3819 (0, 341): 3446 (0, 342): 2070 (0, 343): 697 (0, 344): 1447 (0, 345): 2494 (0, 346): 1968 (0, 347): 2823 (0, 348): 3012 (0, 349): 36 (0, 350): 2428 (0, 351): 3593 (0, 352): 4921 (0, 353): 1773 (0, 354): 585 (0, 355): 4115 (0, 356): 4439 (0, 357): 1189 (0, 358): 2920 (0, 359): 4544 (0, 360): 3181 (0, 361): 3115 (0, 362): 3071 (0, 363): 2899 (0, 364): 824 (0, 365): 4391 (0, 366): 1810 (0, 367): 1204 (0, 368): 2175 (0, 369): 1228 (0, 370): 4392 (0, 371): 1432 (0, 372): 3680 (0, 373): 2839 (0, 374): 1143 (0, 375): 4809 (0, 376): 4825 (0, 377): 2654 (0, 378): 2897 (0, 379): 726 (0, 380): 4421 (0, 381): 3494 (0, 382): 1256 (0, 383): 1552 (0, 384): 2376 (0, 385): 2855 (0, 386): 3714 (0, 387): 223 (0, 388): 1125 (0, 389): 813 (0, 390): 299 (0, 391): 3849 (0, 392): 3600 (0, 393): 2389 (0, 394): 4787 (0, 395): 1902 (0, 396): 4027 (0, 397): 3895 (0, 398): 3006 (0, 399): 2835 (0, 400): 722 (0, 401): 1200 (0, 402): 3251 (0, 403): 4236 (0, 404): 4493 (0, 405): 3922 (0, 406): 3248 (0, 407): 2911 (0, 408): 1439 (0, 409): 2746 (0, 410): 4049 (0, 411): 1887 
(0, 412): 547 (0, 413): 2640 (0, 414): 2895 (0, 415): 2927 (0, 416): 705 (0, 417): 4506 (0, 418): 3382 (0, 419): 4055 (0, 420): 2464 (0, 421): 3003 (0, 422): 219 (0, 423): 3077 (0, 424): 1888 (0, 425): 1452 (0, 426): 2162 (0, 427): 4468 (0, 428): 190 (0, 429): 4557 (0, 430): 570 (0, 431): 4314 (0, 432): 4713 (0, 433): 2175 (0, 434): 8 (0, 435): 1294 (0, 436): 727 (0, 437): 1036 (0, 438): 2785 (0, 439): 1803 (0, 440): 1812 (0, 441): 3593 (0, 442): 446 (0, 443): 4430 (0, 444): 3949 (0, 445): 3296 (0, 446): 1341 (0, 447): 2179 (0, 448): 2436 (0, 449): 3399 (0, 450): 4999 (0, 451): 1526 (0, 452): 3562 (0, 453): 4067 (0, 454): 4304 (0, 455): 4841 (0, 456): 3366 (0, 457): 182 (0, 458): 1414 (0, 459): 4010 (0, 460): 2715 (0, 461): 2866 (0, 462): 1879 (0, 463): 4512 (0, 464): 742 (0, 465): 4167 (0, 466): 2028 (0, 467): 882 (0, 468): 1689 (0, 469): 962 (0, 470): 4490 (0, 471): 4545 (0, 472): 3517 (0, 473): 4138 (0, 474): 4169 (0, 475): 1454 (0, 476): 546 (0, 477): 850 (0, 478): 3459 (0, 479): 927 (0, 480): 3729 (0, 481): 123 (0, 482): 1422 (0, 483): 3038 (0, 484): 2690 (0, 485): 4690 (0, 486): 4424 (0, 487): 477 (0, 488): 1018 (0, 489): 2741 (0, 490): 1192 (0, 491): 2116 (0, 492): 769 (0, 493): 1207 (0, 494): 4340 (0, 495): 4091 (0, 496): 164 (0, 497): 3710 (0, 498): 1920 (0, 499): 4843 (0, 500): 3379 (0, 501): 2960 (0, 502): 3162 (0, 503): 4266 (0, 504): 3305 (0, 505): 935 (0, 506): 1676 (0, 507): 2800 (0, 508): 4173 (0, 509): 3277 (0, 510): 35 (0, 511): 3802 (0, 512): 4073 (0, 513): 1402 (0, 514): 3165 (0, 515): 1654 (0, 516): 2070 (0, 517): 4510 (0, 518): 1630 (0, 519): 1641 (0, 520): 2074 (0, 521): 1814 (0, 522): 757 (0, 523): 352 (0, 524): 1806 (0, 525): 3036 (0, 526): 2763 (0, 527): 2077 (0, 528): 1184 (0, 529): 3359 (0, 530): 3640 (0, 531): 2566 (0, 532): 4671 (0, 533): 2531 (0, 534): 1781 (0, 535): 3011 (0, 536): 2608 (0, 537): 2305 (0, 538): 2891 (0, 539): 2155 (0, 540): 4408 (0, 541): 1845 (0, 542): 1001 (0, 543): 2443 (0, 544): 2630 (0, 545): 2735 (0, 546): 1728 
(0, 547): 4914 (0, 548): 3458 (0, 549): 2185 (0, 550): 4457 (0, 551): 2353 (0, 552): 4659 (0, 553): 2233 (0, 554): 3447 (0, 555): 2552 (0, 556): 2566 (0, 557): 1079 (0, 558): 2384 (0, 559): 1498 (0, 560): 2127 (0, 561): 4214 (0, 562): 4288 (0, 563): 220 (0, 564): 2664 (0, 565): 4102 (0, 566): 849 (0, 567): 87 (0, 568): 4278 (0, 569): 1012 (0, 570): 4604 (0, 571): 267 (0, 572): 1706 (0, 573): 4179 (0, 574): 3289 (0, 575): 1064 (0, 576): 76 (0, 577): 1531 (0, 578): 4776 (0, 579): 225 (0, 580): 4344 (0, 581): 362 (0, 582): 2157 (0, 583): 4017 (0, 584): 312 (0, 585): 2540 (0, 586): 918 (0, 587): 1094 (0, 588): 4009 (0, 589): 1341 (0, 590): 3738 (0, 591): 4509 (0, 592): 2958 (0, 593): 1906 (0, 594): 4452 (0, 595): 1296 (0, 596): 2124 (0, 597): 2871 (0, 598): 13 (0, 599): 2384 (0, 600): 3010 (0, 601): 1695 (0, 602): 3492 (0, 603): 4401 (0, 604): 1145 (0, 605): 4864 (0, 606): 3383 (0, 607): 1380 (0, 608): 4914 (0, 609): 3132 (0, 610): 4370 (0, 611): 3797 (0, 612): 2368 (0, 613): 4954 (0, 614): 2765 (0, 615): 2994 (0, 616): 1732 (0, 617): 1917 (0, 618): 1338 (0, 619): 2086 (0, 620): 464 (0, 621): 3836 (0, 622): 335 (0, 623): 1885 (0, 624): 2708 (0, 625): 2188 (0, 626): 2631 (0, 627): 1798 (0, 628): 1911 (0, 629): 548 (0, 630): 3335 (0, 631): 1598 (0, 632): 1083 (0, 633): 895 (0, 634): 1474 (0, 635): 1671 (0, 636): 4823 (0, 637): 4373 (0, 638): 1128 (0, 639): 1299......
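
Raw 16-bit depth values usually need a scale factor before they can be used as metric distances. The sketch below converts raw values to meters under two assumptions that this document does not confirm: one raw unit equals 1 mm, and a raw value of 0 marks a pixel with no valid depth. Check both against the camera parameters before relying on them. The function name is hypothetical.

```python
from typing import List, Optional

# ASSUMPTION: one raw depth unit is 1 mm. Not stated in this document.
DEPTH_SCALE_M = 0.001


def depth_to_meters(raw: int, scale: float = DEPTH_SCALE_M) -> Optional[float]:
    """Convert one raw uint16 depth value to meters.

    Returns None for a raw value of 0, which is ASSUMED to mean "no depth".
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"not a uint16 depth value: {raw}")
    if raw == 0:
        return None
    return raw * scale


# First three pixels of row 0 from the sample output, plus one empty pixel.
row: List[int] = [2229, 895, 2995, 0]
print([depth_to_meters(v) for v in row])
```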

Hand Keypoints 3D Visualization

Data Structure of `hands_keypoint_3d.json`

Each video directory contains a `hands_keypoint_3d.json` file storing per-frame 3D hand keypoints and MANO parameters. Top-level schema:

json
{
  "metadata": { "source": "depth_fusion", ... },
  "quality_exclusion": { "excluded": false, "reasons": [] },
  "frames": { "<timestamp>": { ... }, ... },
  "quality_summary": { ... }
}

  • quality_exclusion: Indicates whether the entire video should be excluded (e.g., due to excessive missing data).

  • quality_summary: Aggregated statistics such as total frames and confidence distribution.

Frame-level schema (frames["<timestamp>"]):

json
{
  "excluded": false, // Frame-level exclusion flag
  "exclude_reason": "", // Reason for exclusion (e.g., "tail")
  "hands": [
    {
      "is_right": true, // true = right hand, false = left hand
      "confidence": "high", // "high" | "low" | null
      "keypoints_3d_cam_m": { // 3D coordinates of 21 joints (camera frame, meters)
        "thumb_cmc": [x, y, z],
        ...
        "pinky_tip": [x, y, z]
      },
      "mano_parameters": {
        "global_orient": [ax, ay, az], // Axis-angle rotation vector (rotvec, radians), shape (3,)
        "transl": [tx, ty, tz], // Wrist position in camera coordinates (meters)
        "betas": [b0, ..., b9], // MANO shape parameters (10D)
        "hand_pose": [[[r00,...],...],...], // Rotation matrices for 15 joints, shape (15, 3, 3)
        "hand_size_scale": 1.01 // Per-hand scale factor relative to MANO output
      },
      "wrist_6dof": [tx, ty, tz, rx, ry, rz] // [translation (m); axis-angle rotation (rotvec, radians)]
    },
    ...
  ]
}
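The `wrist_6dof` field packs a translation and an axis-angle rotation vector into six numbers. The sketch below expands it into a 4×4 homogeneous matrix using Rodrigues' formula. Reading the six values as a rigid wrist pose in the camera frame is an assumption based on the field comments, and both function names are hypothetical.

```python
import math


def rotvec_to_matrix(rv):
    """Rodrigues' formula: axis-angle vector (radians) -> 3x3 rotation matrix."""
    rx, ry, rz = rv
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def wrist_pose_matrix(wrist_6dof):
    """Build a 4x4 homogeneous pose from [tx, ty, tz, rx, ry, rz].

    Assumption: the values describe a rigid wrist pose in the camera frame.
    """
    t, rv = wrist_6dof[:3], wrist_6dof[3:]
    R = rotvec_to_matrix(rv)
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

The same `rotvec_to_matrix` helper applies to `global_orient`, which uses the same rotvec convention.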

Joint Order (21):

wrist, thumb_cmc, thumb_mcp, thumb_ip, thumb_tip, index_mcp, index_pip, index_dip, index_tip, middle_mcp, middle_pip, middle_dip, middle_tip, ring_mcp, ring_pip, ring_dip, ring_tip, pinky_mcp, pinky_pip, pinky_dip, pinky_tip
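Because `keypoints_3d_cam_m` is keyed by joint name, a consumer that needs a fixed-order array can index it with the joint order above. This is a minimal sketch, assuming every hand entry carries all 21 names, including `wrist`, as keys. The helper name `keypoints_in_order` is hypothetical.

```python
JOINT_ORDER = [
    "wrist",
    "thumb_cmc", "thumb_mcp", "thumb_ip", "thumb_tip",
    "index_mcp", "index_pip", "index_dip", "index_tip",
    "middle_mcp", "middle_pip", "middle_dip", "middle_tip",
    "ring_mcp", "ring_pip", "ring_dip", "ring_tip",
    "pinky_mcp", "pinky_pip", "pinky_dip", "pinky_tip",
]


def keypoints_in_order(hand):
    """Return a 21x3 nested list of joint positions in JOINT_ORDER.

    Raises KeyError if a joint name is missing (assumption: all 21
    names are present as keys of keypoints_3d_cam_m).
    """
    kps = hand["keypoints_3d_cam_m"]
    return [list(kps[name]) for name in JOINT_ORDER]
```

The nested list can be passed to `numpy.asarray` to obtain a (21, 3) array if NumPy is available.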

Usage Notes:

  1. Coordinate System. All 3D coordinates are in the OpenCV camera frame: +X right, +Y down, +Z forward; unit is meters.

  2. Confidence Filtering. Confidence is stored per hand. Only hands with confidence == "high" should be used for training/evaluation. Low-confidence hands are kept for completeness but are excluded from temporal smoothing.

  3. Tail Frames. Frames marked with excluded == true and exclude_reason == "tail" correspond to the last 2 seconds of the video and should be discarded due to unstable end-of-recording quality. Equivalently, drop any frame whose timestamp is greater than max_timestamp - 2.0 seconds.

  4. MANO Parameters. hand_pose is stored as 15×3×3 rotation matrices (not axis-angle). Temporal smoothing is applied in axis-angle (Lie algebra) space and the result is converted back to matrices, which preserves the SO(3) orthogonality constraint. betas is smoothed by a simple Gaussian filter.

  5. Left-Hand Handling. Only the MANO right-hand model (MANO_RIGHT.pkl) is shipped, so left hands are reconstructed by running the right-hand model and mirroring the result along the X-axis (verts[:, 0] *= -1, joints[:, 0] *= -1). The mirroring must be applied after the MANO forward pass but before comparing the result with keypoints_3d_cam_m. See the inline comment in the example below.
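The filtering and mirroring notes above can be sketched in a few lines of plain Python. The helper names `usable_hands` and `mirror_x` are hypothetical, and the points passed to `mirror_x` stand in for whatever an MANO right-hand forward pass returns. Where translation and `hand_size_scale` are applied relative to the mirroring is not specified here, so the sketch leaves that out.

```python
def usable_hands(frames):
    """Yield (timestamp, hand) pairs that pass the frame and hand filters."""
    for ts, frame in frames.items():
        if frame.get("excluded", False):
            # Covers tail frames (exclude_reason == "tail") and any
            # other frame-level exclusion.
            continue
        for hand in frame.get("hands", []):
            # Confidence is "high", "low", or null (None after parsing).
            if hand.get("confidence") == "high":
                yield ts, hand


def mirror_x(points):
    """Return a copy of Nx3 points with the X coordinate negated.

    Intended for left hands (is_right == false) reconstructed with the
    right-hand model, applied after the MANO forward pass.
    """
    return [[-x, y, z] for x, y, z in points]
```

A typical call would be `for ts, hand in usable_hands(data["frames"]): ...`, with `mirror_x` applied to the model output only when `hand["is_right"]` is false.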

Visualization Example

Please note: if the original RGB frame is already undistorted, do not apply undistortion again.

Parse code: example_kp_vis.py

bash
python example_visualize_mano_kp.py \
  --folder PATH_DATA

6DoF Data of Head and Wrist

The 6DoF poses of the head and wrists are computed by a SLAM algorithm and recorded in head_hands_sixdof.csv.

Wrist 6DoF data exists only when the data collector wears a wrist QR-code bracelet.

Citation

If you use the data from this website, please cite this work as

citation
Noitom Robotics Team, "ModalityNet: The Art of Modalities in Human-Centric Data", Noitom Robotics Blog, 2026.

Or use the BibTeX citation:

citation
@article{noitomrobotics2026modalitynet,
  author = {Noitom Robotics Team},
  title = {ModalityNet: The Art of Modalities in Human-Centric Data},
  journal = {Noitom Robotics Blog},
  year = {2026},
  note = {https://modalitynet.com},
}