Making artificial intelligence systems robustly perceive humans remains one of the most persistent challenges in computer vision. Among the hardest problems is reconstructing 3D models of human hands, a task with wide-ranging applications in robotics, animation, human-computer interaction, and augmented and virtual reality. The difficulty lies in the nature of hands themselves: they are frequently occluded by the objects they hold or contorted into challenging poses during tasks such as grasping.
At Carnegie Mellon University's Robotics Institute, we designed a new model, Hamba, which was presented at the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) in Vancouver. Hamba takes a particularly interesting approach to reconstructing 3D hands: it works from a single image and requires no prior knowledge of the camera parameters or of the rest of the person's body.
What sets Hamba apart is its departure from conventional transformer-based architectures. Instead, it leverages Mamba-based state space modeling, marking the first time such an approach has been applied to articulated 3D shape reconstruction. The model also refines Mamba's original scanning process by introducing a graph-guided bidirectional scan, which uses the relational learning capabilities of graph neural networks to capture the spatial relationships between hand joints.
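To make the idea concrete, here is a minimal PyTorch sketch of a graph-guided bidirectional scan over hand-joint tokens. It is an illustration under simplifying assumptions, not Hamba's actual implementation: the joint graph is a MANO-style 21-joint kinematic tree, a GRU stands in for the Mamba state-space block, and the names (GraphGuidedBiScan, HAND_PARENTS) are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical 21-joint hand skeleton in MANO-style ordering: entry i is
# the parent of joint i (wrist = joint 0, which is its own parent).
HAND_PARENTS = [0, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11,
                0, 13, 14, 15, 0, 17, 18, 19]

def joint_adjacency(parents):
    """Row-normalized adjacency matrix (with self-loops) of the joint graph."""
    n = len(parents)
    A = torch.eye(n)
    for child, parent in enumerate(parents):
        A[child, parent] = A[parent, child] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class GraphGuidedBiScan(nn.Module):
    """Toy graph-guided bidirectional scan over hand-joint tokens.

    A GCN-style layer first mixes each joint's features with its skeletal
    neighbors; a sequence model then scans the joint tokens forward and
    backward. A GRU stands in here for the Mamba state-space block.
    """
    def __init__(self, dim=128):
        super().__init__()
        self.register_buffer("A", joint_adjacency(HAND_PARENTS))
        self.graph = nn.Linear(dim, dim)           # per-joint feature transform
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, tokens):                     # tokens: (B, 21, dim)
        # Graph guidance: aggregate each joint with its skeletal neighbors.
        g = torch.relu(self.A @ self.graph(tokens))
        # Bidirectional scan along the joint ordering.
        f, _ = self.fwd(g)
        b, _ = self.bwd(torch.flip(g, dims=[1]))
        return self.out(torch.cat([f, torch.flip(b, dims=[1])], dim=-1))

x = torch.randn(2, 21, 128)                        # a batch of joint tokens
print(GraphGuidedBiScan()(x).shape)                # torch.Size([2, 21, 128])
```

The design intuition is that the graph layer injects skeletal structure (which joints are physically connected) before the sequential scan, so the bidirectional pass propagates context in both directions along the joint ordering rather than treating joints as an unstructured sequence.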
Hamba achieves state-of-the-art performance on benchmarks such as FreiHAND, with a mean per-vertex positional error of just 5.3 millimeters, a level of precision that underscores its potential for real-world applications. Furthermore, at the time of the study's acceptance, Hamba held the top position, Rank 1, on two competition leaderboards for 3D hand reconstruction.
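For reference, the headline number is a mean per-vertex positional error: the average Euclidean distance between predicted and ground-truth mesh vertices, which on FreiHAND is typically reported after Procrustes alignment. Below is a minimal sketch of the unaligned version of this metric; the function name mean_pve_mm and the toy data are illustrative, not from the paper's code.

```python
import torch

def mean_pve_mm(pred, gt):
    """Mean per-vertex positional error in millimeters.

    pred, gt: (B, V, 3) predicted and ground-truth mesh vertices in meters.
    Returns the Euclidean distance averaged over all vertices, in mm.
    """
    return 1000.0 * torch.linalg.norm(pred - gt, dim=-1).mean()

# Toy check: shifting every vertex by 5.3 mm along x yields a 5.3 mm error.
gt = torch.zeros(1, 778, 3)                 # a MANO hand mesh has 778 vertices
pred = gt.clone()
pred[..., 0] += 0.0053
print(f"{mean_pve_mm(pred, gt).item():.1f} mm")   # 5.3 mm
```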
Beyond its technical achievements, Hamba has broader implications for human-computer interaction. By enabling machines to better perceive and interpret human hands, it lays the groundwork for future artificial general intelligence (AGI) systems and robots capable of understanding human emotions and intentions with greater nuance.
Looking ahead, the research team plans to address the model’s limitations while exploring its potential to reconstruct full-body 3D human models from single images—another important challenge with wide applications in industries ranging from health care to entertainment. With its unique combination of technical precision and practical utility, Hamba exemplifies how artificial intelligence continues to push the boundaries of how machines can perceive humans.
This story is part of Science X Dialog, where researchers can report findings from their published research articles.
More information:
Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, and Fernando De la Torre, "Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba," NeurIPS 2024. OpenReview: openreview.net/forum?id=pCJ0l1JVUX. arXiv DOI: 10.48550/arXiv.2407.09646
Aviral Chharia is a graduate student at Carnegie Mellon University. He has been awarded the ATK-Nick G. Vlahakis Graduate Fellowship at CMU, the Students' Undergraduate Research Graduate Excellence (SURGE) fellowship at IIT Kanpur, India, and the MITACS Globalink Research Fellowship at the University of British Columbia. Additionally, he was a two-time recipient of the Dean's List Scholarship during his undergraduate studies. His research interests include computer vision, computer graphics, and machine learning.
Citation:
Transforming how AI systems perceive human hands (2025, January 17)
retrieved 17 January 2025
from https://techxplore.com/news/2025-01-ai-human.html