Publications
2025
Computing with Smart Rings: A Systematic Literature Review
Abstract
A smart ring is a wearable electronic device in the form of a ring that incorporates diverse sensors and computing technologies to perform a variety of functions. Worn on the finger, smart rings can sense subtle and rich hand movements, making them a good platform for interaction. Fingers are also rich in blood vessels and nerve endings and are accustomed to wearing rings, so smart rings offer an ideal site for continuous health monitoring, combining all-day wearing comfort with the ability to capture vital biometric data. We collected a total of 206 smart ring-related publications and conducted a systematic literature review. We provide a taxonomy of sensing and feedback modalities, applications, and phenomena. We review and categorize this literature into four main areas: (1) interaction - input, (2) interaction - output, (3) passive sensing - in-body features, (4) passive sensing - out-of-body activities. This comprehensive review highlights current advancements in the field of smart rings and identifies potential areas for future research.
AutoPBL: An LLM-powered Platform to Guide and Support Individual Learners Through Self Project-based Learning
Abstract
Self project-based learning (SPBL) is a popular learning style in which learners follow tutorials and build projects by themselves. SPBL combines project-based learning's benefits of being engaging and effective with the flexibility of self-learning. However, insufficient guidance and support during SPBL can lead to unsatisfactory learning experiences and outcomes. While LLM chatbots (e.g., ChatGPT) could potentially serve as SPBL tutors, we have yet to see an SPBL platform with responsible and systematic LLM integration. To address this gap, we present AutoPBL, an interactive learning platform for SPBL learners. We examined human PBL tutors' roles through formative interviews to inform our design. AutoPBL features an LLM-guided learning process with checkpoint questions and in-context Q&A. In a user study where 29 beginners learned machine learning through entry-level projects, we found that AutoPBL effectively improves learning outcomes and elicits better learning behavior and metacognition by clarifying current priorities and providing timely assistance.
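As a minimal, hypothetical illustration of how an LLM could generate a checkpoint question for a tutorial step (the prompt wording, model name, and API choice are our assumptions, not AutoPBL's implementation):

```python
# Minimal sketch (not AutoPBL's actual implementation) of generating a
# checkpoint question for the current tutorial step with an LLM, assuming the
# OpenAI chat completions API; the step text and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def checkpoint_question(step_text: str) -> str:
    """Ask the LLM for one short question that checks understanding of a step."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any chat model would do
        messages=[
            {"role": "system",
             "content": "You are a project-based-learning tutor. "
                        "Write one concise checkpoint question for the given tutorial step."},
            {"role": "user", "content": step_text},
        ],
    )
    return response.choices[0].message.content

print(checkpoint_question("This step trains a logistic regression classifier on the Iris dataset."))
```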
Unknown Word Detection for English as a Second Language (ESL) Learners using Gaze and Pre-trained Language Models
Abstract
English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words from text content and eye-gaze trajectory in real time with high accuracy. A 20-participant user study showed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading-assistance prototype to demonstrate the effectiveness of EyeLingo. The user study showed improvements in willingness to use and perceived usefulness compared to baseline methods.
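As a rough sketch of the idea (the model choice, gaze features, and dimensions below are our assumptions, not EyeLingo's published architecture), one could fuse pre-trained language-model token embeddings with per-word gaze features and score each word:

```python
# Illustrative sketch only: fuse pre-trained language-model token embeddings
# with per-word gaze features to score how likely a word is unknown to the reader.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

class UnknownWordScorer(nn.Module):
    def __init__(self, text_dim=768, gaze_dim=4):
        super().__init__()
        # Hypothetical gaze features per word: total fixation time, fixation
        # count, regression count, mean pupil size.
        self.head = nn.Sequential(
            nn.Linear(text_dim + gaze_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, token_embeddings, gaze_features):
        fused = torch.cat([token_embeddings, gaze_features], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # P(word is unknown)

sentence = "The sommelier recommended an obscure vintage."
tokens = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    text_emb = encoder(**tokens).last_hidden_state   # (1, seq_len, 768)
gaze = torch.rand(1, text_emb.shape[1], 4)            # stand-in gaze features
probs = UnknownWordScorer()(text_emb, gaze)
print(probs.shape)  # one unknown-word probability per token
```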
Palmpad: Enabling Real-Time Index-to-Palm Touch Interaction with a Single RGB Camera
Abstract
Index-to-palm interaction plays a crucial role in Mixed Reality (MR) interactions. However, achieving a satisfactory inter-hand interaction experience is challenging with existing vision-based hand-tracking technologies, especially in scenarios where only a single camera is available. We therefore introduce Palmpad, a novel sensing method that uses a single RGB camera to detect the touch of an index finger on the opposite palm. Our exploration reveals that incorporating optical-flow techniques to extract motion information between consecutive frames for the index finger and palm significantly improves touch-status determination. With this approach, our CNN model achieves 97.0% recognition accuracy and a 96.1% F1 score. In a usability evaluation, we compared Palmpad with the Quest's built-in hand-gesture algorithms. Palmpad not only delivers superior accuracy (95.3%) but also reduces operational demands and significantly improves users' willingness and confidence. Palmpad aims to enable accurate touch detection for lightweight MR devices.
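As a rough illustration of this kind of pipeline (the crop size, flow parameters, and network shape are our assumptions, not Palmpad's published design), the sketch below computes dense optical flow between consecutive frames and classifies a flow patch around the fingertip with a small CNN:

```python
# Illustrative sketch: dense optical flow between consecutive grayscale frames,
# cropped around the index-fingertip location and fed to a tiny touch classifier.
import cv2
import numpy as np
import torch
import torch.nn as nn

def flow_patch(prev_gray, curr_gray, center, size=64):
    """Farneback optical flow cropped around the fingertip (assumed known)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y = center
    half = size // 2
    patch = flow[y - half:y + half, x - half:x + half]   # (size, size, 2)
    return torch.from_numpy(patch).permute(2, 0, 1).float()

class TouchNet(nn.Module):
    """Tiny CNN mapping a 2-channel flow patch to touch / no-touch logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

    def forward(self, x):
        return self.net(x)

prev_gray = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # stand-in frames
curr_gray = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
patch = flow_patch(prev_gray, curr_gray, center=(320, 240))
logits = TouchNet()(patch.unsqueeze(0))
print(logits.softmax(-1))  # [P(no touch), P(touch)]
```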
WritingRing: Enabling Natural Handwriting Input with a Single IMU Ring
Abstract
Tracking continuous 2D sequential handwriting trajectories accurately using a single IMU ring is extremely challenging due to the significant displacement between the IMU's wearing position and the location of the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency, and achieving an average trajectory accuracy of 1.63 mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which reached 84.36% when restricting the output to words within a vocabulary of 3,000 words. WritingRing can also be embedded into existing ring systems, providing a natural and real-time solution for various applications.
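The sketch below illustrates, under our own assumptions about channel counts, dilations, and a displacement-based output (not the paper's exact architecture), how a causal temporal convolutional network could map a streaming window of 6-axis IMU samples to per-step 2D displacements that are integrated into a trajectory:

```python
# Illustrative causal TCN sketch: streaming IMU window -> per-step (dx, dy).
import torch
import torch.nn as nn

class CausalConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, dilation=1):
        super().__init__()
        self.pad = (k - 1) * dilation          # left-pad so the model stays causal
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)

    def forward(self, x):                       # x: (batch, channels, time)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class WritingTCN(nn.Module):
    def __init__(self, imu_channels=6, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            CausalConv(imu_channels, hidden, dilation=1), nn.ReLU(),
            CausalConv(hidden, hidden, dilation=2), nn.ReLU(),
            CausalConv(hidden, hidden, dilation=4), nn.ReLU())
        self.out = nn.Conv1d(hidden, 2, kernel_size=1)  # (dx, dy) per time step

    def forward(self, imu):                     # imu: (batch, 6, time)
        return self.out(self.body(imu))

imu_window = torch.randn(1, 6, 200)              # ~2 s of accel+gyro at 100 Hz (stand-in)
deltas = WritingTCN()(imu_window)                # (1, 2, 200) displacement stream
trajectory = deltas.cumsum(dim=-1)               # integrate displacements into a 2D path
print(trajectory.shape)
```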
Investigating Context-Aware Collaborative Text Entry on Smartphones using Large Language Models
Abstract
Text entry is a fundamental and ubiquitous task, but users often face challenges such as situational impairments or difficulties in sentence formulation. Motivated by this, we explore the potential of large language models (LLMs) to assist with text entry in real-world contexts. We propose a collaborative smartphone-based text entry system, CATIA, that leverages LLMs to provide text suggestions based on contextual factors, including screen content, time, location, activity, and more. In a 7-day in-the-wild study with 36 participants, the system offered appropriate text suggestions in over 80% of cases. Users exhibited different collaborative behaviors depending on whether they were composing text for interpersonal communication or information services. Additionally, the relevance of contextual factors beyond screen content varied across scenarios. We identified two distinct mental models: AI as a supportive facilitator or as a more equal collaborator. These findings outline the design space for human-AI collaborative text entry on smartphones.
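As a hedged illustration of how such contextual factors could be folded into an LLM prompt (the factor names and prompt wording here are ours, not CATIA's actual prompt design):

```python
# Illustrative sketch: build a context-aware prompt for a text-entry suggestion.
from dataclasses import dataclass

@dataclass
class Context:
    screen_content: str   # text visible on screen, e.g. the conversation so far
    time: str             # local time
    location: str         # coarse place label
    activity: str         # inferred user activity

def build_prompt(ctx: Context, draft: str) -> str:
    """Combine contextual factors and the user's partial draft into one prompt."""
    return (
        "You are helping a smartphone user finish a text entry.\n"
        f"Visible screen content: {ctx.screen_content}\n"
        f"Time: {ctx.time}; Location: {ctx.location}; Activity: {ctx.activity}\n"
        f"The user has typed so far: \"{draft}\"\n"
        "Suggest one short completion that fits the situation."
    )

prompt = build_prompt(
    Context(screen_content="Chat with Alex: 'Dinner tonight?'",
            time="18:45", location="office", activity="about to commute"),
    draft="Sure, how about",
)
print(prompt)  # this string would then be sent to an LLM for a suggestion
```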
Enhancing Smartphone Eye Tracking with Cursor-Based Interactive Implicit Calibration
Abstract
The limited accuracy of eye tracking on smartphones restricts its use. Existing RGB-camera-based eye tracking relies on extensive datasets, which could be enhanced by continuous fine-tuning on calibration data implicitly collected during interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and exploits the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the gaze ground truth and fine-tuning the eye-tracking model with the corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and can be further improved with personalized, dynamically updated data. Results show that COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), a 27.2% improvement over the model without fine-tuning. We found that filtering cursor points whose actual distance to the gaze is within 150.0 px (0.86 cm) yields the best eye-tracking results.
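A minimal sketch of the implicit-calibration loop, simplified under our own assumptions (the paper uses pre-trained models for filtering, whereas this sketch substitutes a fixed distance test, and the gaze model and data here are stand-ins):

```python
# Illustrative sketch: treat cursor positions consistent with the current gaze
# estimate as pseudo ground truth and fine-tune the gaze model on them.
import torch
import torch.nn as nn

def select_calibration_pairs(gaze_model, frames, cursor_xy, max_px=150.0):
    """Keep frames where the predicted gaze falls near the cursor position."""
    with torch.no_grad():
        pred = gaze_model(frames)                       # (N, 2) predicted gaze in px
    keep = (pred - cursor_xy).norm(dim=1) < max_px      # plausible cursor-gaze pairs
    return frames[keep], cursor_xy[keep]

def fine_tune(gaze_model, frames, targets, steps=50, lr=1e-4):
    """Fine-tune the pre-trained gaze model on the filtered pseudo labels."""
    opt = torch.optim.Adam(gaze_model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(gaze_model(frames), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gaze_model

# Stand-in gaze model and data; a real system would use face/eye crops instead.
gaze_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
frames = torch.randn(32, 3, 64, 64)
cursor_xy = torch.rand(32, 2) * 1000
frames_kept, targets = select_calibration_pairs(gaze_model, frames, cursor_xy)
if len(frames_kept) > 0:
    fine_tune(gaze_model, frames_kept, targets)
```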
Modeling the Impact of Visual Stimuli on Redirection Noticeability with Gaze Behavior in Virtual Reality
Abstract
While users can embody virtual avatars that mirror their physical movements in Virtual Reality, these avatars' motions can be redirected to enable novel interactions. Excessive redirection, however, could break the user's sense of embodiment due to perceptual conflicts between vision and proprioception. While prior work focused on avatar-related factors influencing the noticeability of redirection, we investigate how the visual stimuli in the surrounding virtual environment affect user behavior and, in turn, the noticeability of redirection. Given the wide variety of different types of visual stimuli and their tendency to elicit varying individual reactions, we propose to use users' gaze behavior as an indicator of their response to the stimuli and model the noticeability of redirection. We conducted two user studies to collect users' gaze behavior and noticeability, investigating the relationship between them and identifying the most effective gaze behavior features for predicting noticeability. Based on the data, we developed a regression model that takes users' gaze behavior as input and outputs the noticeability of redirection. We then conducted an evaluation study to test our model on unseen visual stimuli, achieving an accuracy of 0.012 MSE. We further implemented an adaptive redirection technique and conducted a preliminary study to evaluate its effectiveness with complex visual stimuli in two applications. The results indicated that participants experienced lower physical demand and a stronger sense of body ownership when using our adaptive technique, demonstrating the potential of our model to support real-world use cases.
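As an illustration only, the sketch below fits a regression model from hypothetical per-trial gaze features to noticeability scores; the feature set, model choice, and synthetic data are our assumptions, not the paper's:

```python
# Illustrative sketch: regression from gaze-behavior features to noticeability.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical per-trial gaze features: fixation duration, fixation count,
# saccade amplitude, dwell time on the stimulus, pupil-size change.
X = rng.random((200, 5))
y = rng.random(200)          # stand-in noticeability ratings in [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MSE on held-out trials:", mean_squared_error(y_test, model.predict(X_test)))
```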
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task
Abstract
Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions depending on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM that automates tasks from demonstrations and natural language commands without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperforms the baseline in both success rate and controllability.
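A small sketch of the task-graph idea under our own assumptions (the node fields, dependency semantics, and execution stub are illustrative; TaskMind's actual graph construction and LLM integration are richer):

```python
# Illustrative sketch: operations with cognitive dependencies, executed in order.
from dataclasses import dataclass, field

@dataclass
class Operation:
    op_id: str
    semantics: str                                     # e.g. "type search keyword"
    depends_on: list = field(default_factory=list)     # ids of prerequisite operations
    parameter: str | None = None                       # value an LLM could re-generate

def execute(graph: dict):
    """Run operations in dependency order; real execution would drive the GUI."""
    done, order = set(), []
    def visit(op_id):
        if op_id in done:
            return
        for dep in graph[op_id].depends_on:
            visit(dep)
        done.add(op_id)
        order.append(op_id)
    for op_id in graph:
        visit(op_id)
    for op_id in order:
        op = graph[op_id]
        print(f"{op.op_id}: {op.semantics} (parameter={op.parameter})")

graph = {
    "search": Operation("search", "type search keyword", parameter="smart rings"),
    "open":   Operation("open", "open first result", depends_on=["search"]),
    "copy":   Operation("copy", "copy the introduction paragraph", depends_on=["open"]),
}
execute(graph)
```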