Making artificial intelligence systems robustly perceive humans remains one of the most persistent challenges in computer vision. Among the hardest problems is reconstructing 3D models of human hands, a task with wide-ranging applications in robotics, animation, human-computer interaction, and augmented and virtual reality. The difficulty lies in the nature of hands themselves: they are frequently occluded by the objects they hold or contorted into challenging poses during tasks such as grasping.
At Carnegie Mellon University's Robotics Institute, we designed a new model, Hamba, which was presented at the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) in Vancouver. Hamba takes a particularly interesting approach to reconstructing 3D hands: it works from a single image and requires no prior knowledge of the camera parameters or of the rest of the person's body.
What sets Hamba apart is its departure from conventional transformer-based architectures. Instead, it leverages Mamba-based state space modeling, marking the first time such an approach has been applied to articulated 3D shape reconstruction. The model also refines Mamba's original scanning process by introducing a graph-guided bidirectional scan, which uses the relational learning capabilities of graph neural networks to capture the spatial relationships between hand joints.
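To make the idea concrete, here is a minimal PyTorch sketch of a graph-guided bidirectional scan over hand-joint tokens. It is an illustration under simplifying assumptions, not Hamba's actual implementation: the joint graph is a MANO-style 21-joint kinematic tree, a GRU stands in for the Mamba state-space block, and the names (GraphGuidedBiScan, HAND_PARENTS) are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical 21-joint hand skeleton in MANO-style ordering: entry i is
# the parent of joint i (wrist = joint 0, which is its own parent).
HAND_PARENTS = [0, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11,
                0, 13, 14, 15, 0, 17, 18, 19]

def joint_adjacency(parents):
    """Row-normalized adjacency matrix (with self-loops) of the joint graph."""
    n = len(parents)
    A = torch.eye(n)
    for child, parent in enumerate(parents):
        A[child, parent] = A[parent, child] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class GraphGuidedBiScan(nn.Module):
    """Toy graph-guided bidirectional scan over hand-joint tokens.

    A GCN-style layer first mixes each joint's features with its skeletal
    neighbors; a sequence model then scans the joint tokens forward and
    backward. A GRU stands in here for the Mamba state-space block.
    """
    def __init__(self, dim=128):
        super().__init__()
        self.register_buffer("A", joint_adjacency(HAND_PARENTS))
        self.graph = nn.Linear(dim, dim)           # per-joint feature transform
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, tokens):                     # tokens: (B, 21, dim)
        # Graph guidance: aggregate each joint with its skeletal neighbors.
        g = torch.relu(self.A @ self.graph(tokens))
        # Bidirectional scan along the joint ordering.
        f, _ = self.fwd(g)
        b, _ = self.bwd(torch.flip(g, dims=[1]))
        return self.out(torch.cat([f, torch.flip(b, dims=[1])], dim=-1))

x = torch.randn(2, 21, 128)                        # a batch of joint tokens
print(GraphGuidedBiScan()(x).shape)                # torch.Size([2, 21, 128])
```

The design intuition is that the graph layer injects skeletal structure (which joints are physically connected) before the sequential scan, so the bidirectional pass propagates context in both directions along the joint ordering rather than treating joints as an unstructured sequence.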
Hamba achieves state-of-the-art performance on benchmarks such as FreiHAND, with a mean per-vertex positional error of just 5.3 millimeters, a level of precision that underscores its potential for real-world applications. Furthermore, at the time of the study's acceptance, Hamba held the top position, Rank 1, on two competition leaderboards for 3D hand reconstruction.
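For reference, the headline number is a mean per-vertex positional error: the average Euclidean distance between predicted and ground-truth mesh vertices, which on FreiHAND is typically reported after Procrustes alignment. Below is a minimal sketch of the unaligned version of this metric; the function name mean_pve_mm and the toy data are illustrative, not from the paper's code.

```python
import torch

def mean_pve_mm(pred, gt):
    """Mean per-vertex positional error in millimeters.

    pred, gt: (B, V, 3) predicted and ground-truth mesh vertices in meters.
    Returns the Euclidean distance averaged over all vertices, in mm.
    """
    return 1000.0 * torch.linalg.norm(pred - gt, dim=-1).mean()

# Toy check: shifting every vertex by 5.3 mm along x yields a 5.3 mm error.
gt = torch.zeros(1, 778, 3)                 # a MANO hand mesh has 778 vertices
pred = gt.clone()
pred[..., 0] += 0.0053
print(f"{mean_pve_mm(pred, gt).item():.1f} mm")   # 5.3 mm
```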
Beyond its technical achievements, Hamba has broader implications for human-computer interaction. By enabling machines to better perceive and interpret human hands, it lays the groundwork for future artificial general intelligence (AGI) systems and robots capable of understanding human emotions and intentions with greater nuance.
Looking ahead, the research team plans to address the model’s limitations while exploring its potential to reconstruct full-body 3D human models from single images—another important challenge with wide applications in industries ranging from health care to entertainment. With its unique combination of technical precision and practical utility, Hamba exemplifies how artificial intelligence continues to push the boundaries of how machines can perceive humans.
This story is part of Science X Dialog, where researchers can report findings from their published research articles.
More information:
Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, and Fernando De la Torre, "Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba," NeurIPS 2024. OpenReview: openreview.net/forum?id=pCJ0l1JVUX. arXiv DOI: 10.48550/arXiv.2407.09646
Aviral Chharia is a graduate student at Carnegie Mellon University. He has been awarded the ATK-Nick G. Vlahakis Graduate Fellowship at CMU, the Students' Undergraduate Research Graduate Excellence (SURGE) fellowship at IIT Kanpur, India, and the MITACS Globalink Research Fellowship at the University of British Columbia. Additionally, he was a two-time recipient of the Dean's List Scholarship during his undergraduate studies. His research interests include computer vision, computer graphics, and machine learning.
Citation:
Transforming how AI systems perceive human hands (2025, January 17)
retrieved 17 January 2025
from https://techxplore.com/news/2025-01-ai-human.html