
2025

AutoPBL: An LLM-powered Platform to Guide and Support Individual Learners Through Self Project-based Learning
(CHI'25) Yihao Zhu, Zhoutong Ye, Yichen Yuan, Wenxuan Tang, Chun Yu, and Yuanchun Shi.
Abstract
Self project-based learning (SPBL) is a popular learning style where learners follow tutorials and build projects by themselves. SPBL combines project-based learning’s benefit of being engaging and effective with the flexibility of self-learning. However, insufficient guidance and support during SPBL may lead to unsatisfactory learning experiences and outcomes. While LLM chatbots (e.g., ChatGPT) could potentially serve as SPBL tutors, we have yet to see an SPBL platform with responsible and systematic LLM integration. To address this gap, we present AutoPBL, an interactive learning platform for SPBL learners. We examined human PBL tutors’ roles through formative interviews to inform our design. AutoPBL features an LLM-guided learning process with checkpoint questions and in-context Q&A. In a user study where 29 beginners learned machine learning through entry-level projects, we found that AutoPBL effectively improves learning outcomes and elicits better learning behavior and metacognition by clarifying current priorities and providing timely assistance.
Unknown Word Detection for English as a Second Language (ESL) Learners using Gaze and Pre-trained Language Models
(CHI'25) Jiexin Ding, Bowen Zhao, Yuntao Wang, Xinyun Liu, Rui Hao, Ishan Chatterjee, and Yuanchun Shi.
Abstract
English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method can achieve an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study showed improvements in willingness to use and usefulness compared to baseline methods.
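As a toy illustration of the kind of feature fusion such a predictor performs, the sketch below combines a word-rarity feature with gaze dwell time and regression count in a hand-weighted logistic score. The weights, feature set, and function name are invented for illustration; this is not the trained EyeLingo model.

```python
import math

def unknown_word_probability(word_freq_rank, dwell_ms, regressions):
    """Toy logistic score: rarer words, longer dwell time, and more
    regressive saccades raise the predicted 'unknown' probability.
    All weights are illustrative placeholders."""
    z = (0.08 * math.log1p(word_freq_rank)   # rarity (log frequency rank)
         + 1.5 * (dwell_ms / 1000.0)         # dwell time in seconds
         + 0.6 * regressions                 # re-reading behavior
         - 2.0)                              # bias
    return 1.0 / (1.0 + math.exp(-z))

# A rare word fixated long should score higher than a common word skimmed.
p_rare = unknown_word_probability(word_freq_rank=45000, dwell_ms=600, regressions=2)
p_common = unknown_word_probability(word_freq_rank=50, dwell_ms=120, regressions=0)
```

In practice the paper replaces such hand-set weights with a trained transformer over text and gaze features.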
Palmpad: Enabling Real-Time Index-to-Palm Touch Interaction with a Single RGB Camera
(CHI'25) Zhe He, Xiangyang Wang, Yuanchun Shi, Chi Hsia, Chen Liang, and Chun Yu.
Abstract
Index-to-palm interaction plays a crucial role in Mixed Reality (MR) interactions. However, achieving a satisfactory inter-hand interaction experience is challenging with existing vision-based hand tracking technologies, especially in scenarios where only a single camera is available. Therefore, we introduce Palmpad, a novel sensing method utilizing a single RGB camera to detect the touch of an index finger on the opposite palm. Our exploration reveals that incorporating optical flow techniques to extract motion information between consecutive frames for the index finger and palm leads to a significant improvement in touch status determination. By doing so, our CNN model achieves 97.0% recognition accuracy and a 96.1% F1 score. In a usability evaluation, we compared Palmpad with Quest’s inherent hand gesture algorithms. Palmpad not only delivers superior accuracy (95.3%) but also reduces operational demands and significantly improves users’ willingness and confidence. Palmpad aims to enhance accurate touch detection for lightweight MR devices.
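To make the motion-cue idea concrete, here is a minimal stand-in that measures inter-frame motion by simple frame differencing; the actual system uses dense optical flow feeding a CNN. The threshold, and the simplifying assumption that touch onset coincides with low fingertip motion, are illustrative only.

```python
def motion_magnitude(prev, curr):
    """Mean absolute per-pixel change between consecutive grayscale
    frames (given as nested lists): a crude stand-in for the optical-flow
    motion cue described in the abstract."""
    total = sum(abs(c - p)
                for row_p, row_c in zip(prev, curr)
                for p, c in zip(row_p, row_c))
    return total / (len(prev) * len(prev[0]))

def touch_candidate(prev, curr, threshold=5.0):
    """Flag a possible touch when inter-frame motion collapses below a
    threshold (fingertip at rest on the palm). Hypothetical heuristic."""
    return motion_magnitude(prev, curr) < threshold

still = [[10, 10], [10, 10]]
moved = [[40, 10], [10, 80]]
```

A real pipeline would pass per-pixel flow fields, not a scalar, into the classifier.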
WritingRing: Enabling Natural Handwriting Input with a Single IMU Ring
(CHI'25) Zhe He, Zixuan Wang, Chun Yu, Chengwen Zhang, Xiyuan Shen, and Yuanchun Shi.
Abstract
Tracking continuous 2D sequential handwriting trajectories accurately using a single IMU ring is extremely challenging due to the significant displacement between the IMU’s wearing position and the location of the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency, and achieving an average trajectory accuracy of 1.63mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which reached 84.36% when restricting the output to words within a vocabulary of size 3000. WritingRing can also be embedded into existing ring systems, providing a natural and real-time solution for various applications.
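The causal-convolution building block of a TCN can be sketched in a few lines. This toy version shows why such a network suits streaming IMU input: each output sample depends only on the current and past samples, never on future ones. The kernel and signal values are made up.

```python
def causal_conv1d(signal, kernel):
    """One causal 1-D convolution, the basic building block of a TCN
    like the one WritingRing pairs with its LSTM. Output at time t
    uses only samples at times <= t, so it works on streaming input."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j          # look backwards only
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out

# Two-tap smoothing kernel over a toy step signal.
smoothed = causal_conv1d([0, 0, 3, 3, 3], [0.5, 0.5])
```

Stacking such layers with increasing dilation gives the long receptive field needed for trajectory regression while keeping latency low.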
Investigating Context-Aware Collaborative Text Entry on Smartphones using Large Language Models
(CHI'25) Weihao Chen, Yuanchun Shi, Yukun Wang, Weinan Shi, Meizhu Chen, Cheng Gao, Yu Mei, Yeshuang Zhu, Jinchao Zhang, and Chun Yu.
Abstract
Text entry is a fundamental and ubiquitous task, but users often face challenges such as situational impairments or difficulties in sentence formulation. Motivated by this, we explore the potential of large language models (LLMs) to assist with text entry in real-world contexts. We propose a collaborative smartphone-based text entry system, CATIA, that leverages LLMs to provide text suggestions based on contextual factors, including screen content, time, location, activity, and more. In a 7-day in-the-wild study with 36 participants, the system offered appropriate text suggestions in over 80% of cases. Users exhibited different collaborative behaviors depending on whether they were composing text for interpersonal communication or information services. Additionally, the relevance of contextual factors beyond screen content varied across scenarios. We identified two distinct mental models: AI as a supportive facilitator or as a more equal collaborator. These findings outline the design space for human-AI collaborative text entry on smartphones.
Enhancing Smartphone Eye Tracking with Cursor-Based Interactive Implicit Calibration
(CHI'25) Chang Liu, Xiangyang Wang, Chun Yu, Yingtian Shi, Chongyang Wang, Ziqi Liu, Chen Liang, and Yuanchun Shi.
Abstract
The limited accuracy of eye-tracking on smartphones restricts its use. Existing RGB-camera-based eye-tracking relies on extensive datasets, which could be enhanced by continuous fine-tuning using calibration data implicitly collected from the interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and utilizes the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground truth of gaze and fine-tuning the eye-tracking model with corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and could be facilitated using personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), representing a 27.2% improvement compared to that without fine-tuning. We found that filtering cursor points whose actual distance to gaze is 150.0 px (0.86 cm) yields the best eye-tracking results.
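The filtering step might be sketched as follows, under the assumption that a cursor sample is kept when the current model's gaze prediction already falls near it, so the cursor can stand in as a pseudo ground-truth label for fine-tuning. The 150 px threshold echoes the value the abstract reports as best; the sample format and selection rule are assumptions, not the paper's exact procedure.

```python
def select_calibration_pairs(samples, max_dist_px=150.0):
    """Keep (frame, cursor) samples whose cursor position lies within
    max_dist_px of the pre-trained model's gaze prediction; the kept
    cursor positions then serve as pseudo-labels for fine-tuning.
    Hypothetical sketch of the filtering idea."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return [s for s in samples
            if dist(s["cursor"], s["predicted_gaze"]) <= max_dist_px]

samples = [
    {"cursor": (100, 100), "predicted_gaze": (140, 130)},  # 50 px away: keep
    {"cursor": (100, 100), "predicted_gaze": (400, 500)},  # 500 px away: drop
]
kept = select_calibration_pairs(samples)
```

The kept pairs would then be fed to a fine-tuning loop over the personalized, dynamically updated data the abstract describes.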
Modeling the Impact of Visual Stimuli on Redirection Noticeability with Gaze Behavior in Virtual Reality
(CHI'25) Zhipeng Li, Yishu Ji, Ruijia Chen, Tianqi Liu, Yuntao Wang, Yuanchun Shi, and Yukang Yan.
Abstract
While users can embody virtual avatars that mirror their physical movements in Virtual Reality, these avatars' motions can be redirected to enable novel interactions. Excessive redirection, however, could break the user's sense of embodiment due to perceptual conflicts between vision and proprioception. While prior work focused on avatar-related factors influencing the noticeability of redirection, we investigate how the visual stimuli in the surrounding virtual environment affect user behavior and, in turn, the noticeability of redirection. Given the wide variety of different types of visual stimuli and their tendency to elicit varying individual reactions, we propose to use users' gaze behavior as an indicator of their response to the stimuli and model the noticeability of redirection. We conducted two user studies to collect users' gaze behavior and noticeability, investigating the relationship between them and identifying the most effective gaze behavior features for predicting noticeability. Based on the data, we developed a regression model that takes users' gaze behavior as input and outputs the noticeability of redirection. We then conducted an evaluation study to test our model on unseen visual stimuli, achieving an accuracy of 0.012 MSE. We further implemented an adaptive redirection technique and conducted a preliminary study to evaluate its effectiveness with complex visual stimuli in two applications. The results indicated that participants experienced less physical demand and a stronger sense of body ownership when using our adaptive technique, demonstrating the potential of our model to support real-world use cases.
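A minimal single-feature least-squares regressor evaluated by MSE illustrates how such a gaze-to-noticeability model is fit and scored; the feature name and data below are toy values, not the study's multi-feature model or its measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one gaze feature -> noticeability,
    a minimal stand-in for the paper's multi-feature regression model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def mse(ys, preds):
    """Mean squared error, the metric the abstract reports (0.012)."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [0.1, 0.2, 0.3, 0.4]      # e.g. fixation rate on the stimulus (toy)
ys = [0.15, 0.25, 0.35, 0.45]  # noticeability ratings (toy)
a, b = fit_line(xs, ys)
err = mse(ys, [a + b * x for x in xs])
```

Evaluating on held-out ("unseen") stimuli rather than the training data is what makes the reported MSE meaningful.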
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task
(CHI'25) Yiwen Yin, Yu Mei, Chun Yu, Toby Jia-Jun Li, Aamir Khan Jadoon, Sixiang Cheng, Weinan Shi, Mohan Chen, and Yuanchun Shi.
Abstract
Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions depending on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations, and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM which automates tasks from demonstrations and natural language commands without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperforms the baseline in both success rate and controllability.
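Executing a task graph whose edges encode dependencies between operations can be illustrated with a topological sort (Kahn's algorithm) over a toy graph echoing the search-and-summarize example; the node names and prerequisite-list encoding are assumptions, not TaskMind's actual representation.

```python
from collections import deque

def execute_order(graph):
    """Topological order over a task graph mapping each operation to
    the operations it depends on: an operation runs only after all of
    its prerequisites have produced their outputs."""
    indeg = {n: len(deps) for n, deps in graph.items()}
    dependents = {n: [] for n in graph}
    for n, deps in graph.items():
        for d in deps:
            dependents[d].append(n)
    ready = deque(sorted(n for n, d in indeg.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return order

# Toy graph: operation -> prerequisites (hypothetical names).
task = {
    "enter_keyword": [],
    "open_website": [],
    "collect_intro": ["enter_keyword", "open_website"],
    "summarize": ["collect_intro"],
}
order = execute_order(task)
```

In the described system, an LLM would fill in new parameters (e.g. a different search keyword) at each node while the graph fixes the execution order.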
The Odyssey Journey: Top-Tier Medical Resource Seeking for Specialized Disorder in China
(CHI'25) Ka I Chan, Siying Hu, Yuntao Wang, Xuhai Xu, Zhicong Lu, and Yuanchun Shi.
Abstract
It is pivotal for patients to receive accurate health information, diagnoses, and timely treatments. However, in China, the significantly imbalanced doctor-to-patient ratio intensifies the information and power asymmetries in doctor-patient relationships. Health information-seeking, which enables patients to collect information from sources beyond doctors, is a potential approach to mitigate these asymmetries. While HCI research predominantly focuses on common chronic conditions, our study focuses on specialized disorders, which are often familiar to specialists but not to general practitioners and the public. With Hemifacial Spasm (HFS) as an example, we aim to understand patients’ health information and top-tier medical resource seeking journeys in China. Through interviews with three neurosurgeons and 12 HFS patients from rural and urban areas, and applying Actor-Network Theory, we provide empirical insights into the roles, interactions, and workflows of various actors in the health information-seeking network. We also identified five strategies patients adopted to mitigate asymmetries and access top-tier medical resources, illustrating these strategies as subnetworks within the broader health information-seeking network and outlining their advantages and challenges.