Multi-THuMBS: Multi-person Tracking of 3D Human Meshes Beyond Video Shots

Jeongwan On¹, Muhammad Salman Ali¹, Muneeb A. Khan¹, Sunwoo Park¹, Inwoong Moon¹, Hyung Jin Chang², Jaekwang Kim³, Seong Jong Ha³, Seungryul Baek¹

¹UNIST Vision and Learning Lab, UNIST    ²University of Birmingham    ³CJ Corporation

ECCV 2026 (accepted)

Paper (coming soon) | Code (coming soon) | Video (coming soon)

Abstract

TL;DR: Multi-THuMBS tackles multi-person 3D human mesh tracking in real-world videos with frequent shot changes. Existing methods often lose identity consistency when the camera view changes abruptly, especially when multiple people interact, overlap, or leave the frame. Our approach reconstructs boundary frames in a shared 3D space, registers human meshes across shots, and preserves per-person identity and motion consistency, yielding temporally coherent 3D trajectories.

Method

Given a video split by a shot boundary, Multi-THuMBS first estimates human meshes and camera poses for all frames. It then (A) builds a shared 3D space from the boundary frames, (B) aligns meshes across shots, (C) propagates camera and mesh trajectories, (D) links identities using spatial, pose, and appearance cues, and (E) applies temporal smoothing.

Overview of the Multi-THuMBS pipeline for shared-space alignment, identity linking, and trajectory smoothing.
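
To make step (D) concrete, the following is a minimal sketch of how identities on either side of a shot boundary could be linked once all meshes live in the shared 3D space. It is not the authors' implementation: the weighted-sum cost, the weights `w_spatial`, `w_pose`, `w_app`, the dictionary keys `root`, `pose`, `feat`, and the brute-force matching are all assumptions chosen for illustration.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def link_cost(p, q, w_spatial=1.0, w_pose=0.5, w_app=0.5):
    """Weighted sum of spatial, pose, and appearance distances.

    The weights are hypothetical and would need tuning on real data.
    """
    spatial = math.dist(p["root"], q["root"])   # root positions in the shared 3D space
    pose = math.dist(p["pose"], q["pose"])      # flattened pose parameters
    appearance = cosine_distance(p["feat"], q["feat"])
    return w_spatial * spatial + w_pose * pose + w_app * appearance


def link_identities(before, after):
    """Map each person index in `before` to an index in `after`.

    Exhaustively searches all one-to-one assignments and keeps the cheapest.
    This is only practical for a handful of people and assumes
    len(before) <= len(after); otherwise an empty mapping is returned.
    """
    best, best_cost = {}, math.inf
    for perm in permutations(range(len(after)), len(before)):
        cost = sum(link_cost(before[i], after[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = dict(enumerate(perm)), cost
    return best
```

For larger groups, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive search; the cost function would stay the same.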

Qualitative Visualization

Interactive mesh demo placeholder. This area can later host an interactive viewer for Multi-THuMBS mesh outputs.

Preferred path

Export short mesh sequences as GLB/PLY and render them with Three.js or model-viewer.
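
As a rough illustration of the PLY half of this path, the sketch below writes one ASCII PLY file per frame using only the Python standard library. The function names `mesh_to_ply` and `write_sequence`, the `frame_0000.ply` naming scheme, and the assumption that all frames share one face topology are hypothetical choices, not part of the Multi-THuMBS code.

```python
from pathlib import Path


def mesh_to_ply(vertices, faces):
    """Serialize one mesh (xyz vertices, index tuples for faces) to ASCII PLY."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        f"element face {len(faces)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    lines += [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in vertices]
    # Each face line starts with its vertex count, followed by the indices.
    lines += [f"{len(face)} " + " ".join(str(i) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


def write_sequence(frames, faces, out_dir):
    """Write one PLY file per frame; every frame reuses the same faces."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for t, vertices in enumerate(frames):
        path = out / f"frame_{t:04d}.ply"
        path.write_text(mesh_to_ply(vertices, faces))
        paths.append(path)
    return paths
```

ASCII PLY is easy to inspect but large; a binary PLY or GLB export would be the more compact choice for the final demo assets.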

Performance check

Keep demo assets small enough for GitHub Pages; use decimated meshes and compressed textures if needed.

Fallback

Use an MP4/GIF preview first, then upgrade to interactive playback after asset size and controls are stable.

Quantitative Results

Results table placeholder. Replace this block with an image of the final quantitative results table.

BibTeX

Coming soon!

Acknowledgements

This work is supported by NRF grants (No. RS-2025-00521013, 20%; No. RS-2025-02216916, 10%) and IITP grants (No. RS-2020-II201336, Artificial Intelligence Graduate School Program (UNIST), 10%; No. RS-2025-25442824, AI Star Fellowship Program (UNIST), 10%), funded by the Korean government (MSIT). This work is also supported by CJ Corporation (50%).