Knowledge-Infused Retrieval Boosts Few-Shot Hand Gesture Recognition on HaGRID with Vision-Language Model

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Hand gesture recognition remains challenging, primarily because of its reliance on large-scale annotated datasets and the limited adaptability of existing models to novel gesture classes. In this work, we propose the Adaptive Vision-Language Model (Adaptive-VLM), a lightweight, training-free framework that uses only one image per class to recognize gestures on the HaGRID benchmark. Built on the CLIP backbone, our approach incorporates symbolic knowledge-infused prompts, multi-prompt contextualization, and semantic exemplar ranking to improve few-shot generalization. Without any parameter fine-tuning, and using only 18 example images, Adaptive-VLM achieves a macro F1-score of 65.75% on the HaGRID test set (540 images). It significantly outperforms the Random-VLM baseline (59.95%) and a ResNet-18 model fine-tuned for 10 epochs (4.09%) under the same data constraints. These findings highlight the effectiveness of combining structured domain knowledge with guided exemplar selection to overcome data scarcity in low-resource gesture recognition. Adaptive-VLM offers a promising direction for building adaptive and efficient HGR systems, especially in real-world human-computer interaction scenarios that require rapid deployment with minimal data.
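The abstract's pipeline can be illustrated with a toy sketch: multi-prompt contextualization averages the embeddings of several prompt templates per class, and classification ranks classes by cosine similarity to the image embedding. This is not the paper's implementation; the vectors below are made-up stand-ins for CLIP encoder outputs, and all names (class labels, helper functions) are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_vector(vectors):
    # Element-wise average of a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Toy "text embeddings" for two knowledge-infused prompt templates per class;
# in the actual system these would come from CLIP's text encoder.
prompts_per_class = {
    "thumbs_up": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "palm":      [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}

# Multi-prompt contextualization: one averaged embedding per class.
class_embeddings = {c: mean_vector(vs) for c, vs in prompts_per_class.items()}

# A toy "image embedding"; predict the class with the highest cosine similarity.
image = [0.85, 0.15, 0.05]
pred = max(class_embeddings, key=lambda c: cosine(image, class_embeddings[c]))
print(pred)  # → thumbs_up
```

The same cosine ranking could also score candidate exemplar images against a class embedding, which is the intuition behind the semantic exemplar ranking step.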

Original language: English
Title of host publication: New Trends in Intelligent Software Methodologies, Tools and Techniques - Proceedings of the 24th International Conference on New Trends in Intelligent Software Methodologies, Tools and Techniques, SoMeT 2025
Editors: Hamido Fujita, Andres Hernandez-Matamoros, Yutaka Watanobe
Publisher: IOS Press BV
Pages: 396-413
Number of pages: 18
ISBN (Electronic): 9781643686196
DOIs
Publication status: Published - 16 Sept 2025
Event: 24th International Conference on New Trends in Intelligent Software Methodologies, Tools and Techniques, SoMeT 2025 - Kitakyushu, Japan
Duration: 23 Sept 2025 - 26 Sept 2025

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 411
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 24th International Conference on New Trends in Intelligent Software Methodologies, Tools and Techniques, SoMeT 2025
Country/Territory: Japan
City: Kitakyushu
Period: 23/09/25 - 26/09/25

Keywords

  • Adaptive Learning
  • Few-Shot Learning
  • HaGRID Dataset
  • Hand Gesture Recognition
  • Human-Computer Interaction
  • Knowledge Injection
  • Prompt Engineering
  • Vision-Language Models

