Kun Yan

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze, a modality deeply linked to user …

zeyu-wang

Voila-A: Aligning Vision-Language Models with User's Gaze Attention

In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models …

kun-yan