[Title] Shape of Motion: 4D Reconstruction from a Single Video
[Keyword] Human tracking, novel view synthesis, 3D Gaussian Splatting
[Journal] arXiv preprint arXiv:2407.13764
[arXiv] https://arxiv.org/abs/2407.13764
[Summary]
The model takes an RGB video, monocular depth maps, and per-point 2D tracks as input. The 2D tracks are lifted to 3D tracks using the depth maps, and the 3D tracks are used to initialize the 3D Gaussians. The Gaussians are clustered by velocity, and the Gaussians in each cluster share the same rotation and translation. The 3D Gaussians represent the scene and are rasterized to synthesize depth maps, 3D tracks, and RGB images. The parameters of the 3D Gaussians are optimized with a loss between the 2D inputs and the corresponding projected 2D outputs.
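The lifting step can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and one depth value sampled at the tracked pixel in each frame. Neither assumption is stated in the summary above, and the function names are hypothetical rather than taken from the paper's code. Points are expressed in each frame's camera coordinates; camera poses are ignored here.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_track_to_3d(
    track_2d: Sequence[Point2D],
    depths: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Unproject one 2D track (one pixel per frame) into camera-space 3D points.

    Assumes a pinhole camera model; depths[i] is the depth sampled at
    track_2d[i] in frame i (an assumption made for this sketch).
    """
    points: List[Point3D] = []
    for (u, v), z in zip(track_2d, depths):
        x = (u - cx) * z / fx  # horizontal offset from the principal point, scaled by depth
        y = (v - cy) * z / fy  # vertical offset from the principal point, scaled by depth
        points.append((x, y, z))
    return points


def project_to_2d(
    points_3d: Sequence[Point3D],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point2D]:
    """Project camera-space 3D points back to pixel coordinates (inverse of lifting)."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in points_3d]


if __name__ == "__main__":
    # Hypothetical values: a point tracked over two frames at a constant depth of 2.0.
    track = [(320.0, 240.0), (330.0, 240.0)]
    depth_samples = [2.0, 2.0]
    lifted = lift_track_to_3d(track, depth_samples, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(lifted)
    print(project_to_2d(lifted, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Projecting the lifted points back with the same intrinsics recovers the original pixel coordinates, which is the kind of 2D consistency the final loss in the summary compares against.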