Voila-A: Aligning Vision-Language Models with User's Gaze Attention

Publication
Advances in Neural Information Processing Systems