LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation

Abstract

Neural Radiance Fields (NeRFs) have revolutionized the reconstruction of static scenes and objects in 3D, offering unprecedented quality. However, extending NeRFs to model dynamic objects or object articulations remains a challenging problem. Previous works have tackled this issue by focusing on part-level reconstruction and motion estimation for objects, but they often rely on heuristics regarding the number of moving parts or object categories, which can limit their practical use. In this work, we introduce LEIA, a novel approach for representing dynamic 3D objects. Our method involves observing the object at distinct time steps or "states" and conditioning a hypernetwork on the current state, using this to parameterize our NeRF. This approach allows us to learn a view-invariant latent representation for each state. We further demonstrate that by interpolating between these states, we can generate novel articulation configurations in 3D space that were previously unseen. Our experimental results highlight the effectiveness of our method in articulating objects in a manner that is independent of the viewing angle and joint configuration. Notably, our approach outperforms previous methods that rely on motion information for articulation registration.

How it works

Overview of our method. We take multi-view images of an object in different states as input. A learnable latent dictionary, based on an autoencoder, learns one embedding per state ID. Each latent embedding is fed to a hypernetwork, which modulates and generates the weights of the NeRF so that it reconstructs the corresponding state. At inference time, we take a weighted interpolation of the learned latents to generate a new intermediate state.
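The pipeline described above can be illustrated with a toy sketch. This is not the authors' implementation: it only mirrors the data flow (per-state latent, hypernetwork, generated field weights, latent interpolation) under several assumptions. The latents are random vectors rather than learned through an autoencoder, the hypernetwork is a single linear map, the "NeRF" is a two-layer MLP on 3D points with no view direction or volume rendering, and training is omitted entirely. All names and sizes (`LATENT_DIM`, `HIDDEN`, `hypernet`, `field`, `interpolate_latents`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # size of each per-state embedding (assumed)
HIDDEN = 16      # width of the toy field MLP (assumed)
IN_DIM = 3       # a 3D point; a real NeRF would also take a view direction
OUT_DIM = 4      # RGB + density

# Latent dictionary: one embedding per observed state id (start = 0, end = 1).
# Random here; in the described method these embeddings are learned.
latents = {0: rng.normal(size=LATENT_DIM), 1: rng.normal(size=LATENT_DIM)}

# Toy hypernetwork: one linear map from a latent to all weights of the field MLP.
n_w1, n_b1, n_w2, n_b2 = IN_DIM * HIDDEN, HIDDEN, HIDDEN * OUT_DIM, OUT_DIM
n_params = n_w1 + n_b1 + n_w2 + n_b2
hyper_W = rng.normal(scale=0.1, size=(n_params, LATENT_DIM))
hyper_b = np.zeros(n_params)


def hypernet(z):
    """Map a state latent z to the weights of the small field MLP."""
    theta = hyper_W @ z + hyper_b
    w1, b1, w2, b2 = np.split(theta, [n_w1, n_w1 + n_b1, n_w1 + n_b1 + n_w2])
    return w1.reshape(IN_DIM, HIDDEN), b1, w2.reshape(HIDDEN, OUT_DIM), b2


def field(points, params):
    """Evaluate the generated MLP: points (N, 3) -> rgb (N, 3), density (N,)."""
    w1, b1, w2, b2 = params
    h = np.maximum(points @ w1 + b1, 0.0)        # ReLU hidden layer
    out = h @ w2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))      # sigmoid keeps colors in (0, 1)
    density = np.maximum(out[:, 3], 0.0)         # non-negative density
    return rgb, density


def interpolate_latents(z_start, z_end, alpha):
    """Weighted interpolation between two state latents, alpha in [0, 1]."""
    return (1.0 - alpha) * z_start + alpha * z_end


# Query an intermediate state halfway between the two observed states.
points = rng.uniform(-1.0, 1.0, size=(5, 3))
z_mid = interpolate_latents(latents[0], latents[1], 0.5)
rgb, density = field(points, hypernet(z_mid))
```

In this sketch the interpolation happens in latent space, and the hypernetwork turns the blended latent into a fresh set of field weights, so no per-part motion parameters appear anywhere. Because the toy hypernetwork is linear, blending latents here is equivalent to blending weights; a nonlinear hypernetwork would not have that property.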

Results

We evaluate our method on both synthetic and real-world data. We show that our method can generate novel states for articulated objects that were not seen during training. We demonstrate the effectiveness of our method in capturing complex articulations and show that it outperforms previous methods that rely on motion information for articulation registration. We also show that our method is robust to single and multiple articulations, as well as combinations of motions.

BibTeX

@misc{swaminathan2024leialatentviewinvariantembeddings,
  title={LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation},
  author={Archana Swaminathan and Anubhav Gupta and Kamal Gupta and Shishira R. Maiya and Vatsal Agarwal and Abhinav Shrivastava},
  year={2024},
  eprint={2409.06703},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.06703}
}

LEIA can generate unseen articulated states of objects with multiple moving parts, using only start and end state as input!

Real World Data Results

Synthetic Data Results

washing machine, motion: revolute

dishwasher, motion: prismatic

storage, motion: prismatic

storage, motion: revolute, multi-part

storage, motion: prismatic, multi-part

storage, motion: prismatic and revolute, multi-part

sunglasses, motion: revolute, multi-part

box, motion: prismatic and revolute, multi-part