Publications
2025
The Odyssey Journey: Top-Tier Medical Resource Seeking for Specialized Disorder in China
Abstract
It is pivotal for patients to receive accurate health information, diagnoses, and timely treatments. However, in China, the significantly imbalanced doctor-to-patient ratio intensifies the information and power asymmetries in doctor-patient relationships. Health information-seeking, which enables patients to collect information from sources beyond doctors, is a potential approach to mitigate these asymmetries. While HCI research predominantly focuses on common chronic conditions, our study focuses on specialized disorders, which are often familiar to specialists but not to general practitioners and the public. With Hemifacial Spasm (HFS) as an example, we aim to understand patients' health information and top-tier medical resource seeking journeys in China. Through interviews with three neurosurgeons and 12 HFS patients from rural and urban areas, and applying Actor-Network Theory, we provide empirical insights into the roles, interactions, and workflows of various actors in the health information-seeking network. We also identified five strategies patients adopted to mitigate asymmetries and access top-tier medical resources, illustrating these strategies as subnetworks within the broader health information-seeking network and outlining their advantages and challenges.
Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts
Abstract
UI task automation enables efficient task execution by simulating human interactions with graphical user interfaces (GUIs), without modifying the existing application code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present Prompt2Task, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing the corresponding automation tasks. Prompt2Task incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for task generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.24% with Prompt2Task, requiring an average of 0.69 user interventions for each new task. Prompt2Task presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer
Abstract
Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) measure cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate, and respiration rate, with better accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for effective deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) demonstrate that the proposed model achieves a 10.1% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining performance comparable to PhysFormer and other ANN-based models.
A Comparison Study Understanding the Impact of Mixed Reality Collaboration on Sense of Co-Presence
Abstract
Sense of co-presence refers to the perceived closeness and interaction between participants in a collaborative context, which critically impacts the collaboration experience and task performance. With the emergence of Mixed Reality (MR) technologies, we investigate the effect of an immersive MR collaboration environment on promoting co-presence in a remote setting by comparing it with non-MR methods, such as video conferencing. We conducted a comparison study in which we invited 14 dyads of participants to collaborate on block assembly tasks using video conferencing, an MR system, and a physically co-located scenario. Each participant in a dyad was assigned the role of either a local worker, who assembled the blocks, or a remote helper, who gave the instructions. Results show that the MR system can create a sense of co-presence comparable to that of the co-located situation and allows users to interact more naturally with both the environment and each other. The adoption of mixed reality enhances collaboration and task performance by reducing reliance on verbal communication and favoring action-based interactions through gestures and direct manipulation of virtual objects.
Predicting Ray Pointer Landing Poses in VR Using Multimodal LSTM-Based Neural Networks
Abstract
Target selection is one of the most fundamental tasks in VR interaction systems. Prediction heuristics can provide users with a smoother interaction experience in this process. Our work aims to predict the ray landing pose for hand-based raycasting selection in Virtual Reality (VR) using a Long Short-Term Memory (LSTM)-based neural network with time-series data input of speed and distance over time from three different pose channels: hand, Head Mounted Display (HMD), and eye. We first conducted a study to collect motion data from these three input channels and analyzed these movement behaviors. Additionally, we evaluated which combination of input modalities yields the optimal result. A second study validates raycasting across a continuous range of distances, angles, and target sizes. On average, our technique's predictions were within 4.6° of the true landing pose when 50% of the way through the movement. We compared our LSTM neural network model to a kinematic information model and further validated its generalizability in two ways: by training the model on one user's data and testing on other users (cross-user) and by training on a group of users and testing on entirely new users (unseen users). Compared to the baseline and a previous kinematic method, our model increased prediction accuracy by a factor of 3.5 and 1.9, respectively, when 40% of the way through the movement.
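To illustrate the kind of error figure quoted in this abstract (degrees between the predicted and the true landing pose), the angle between two ray directions could be computed as below. This is only a minimal sketch, not the authors' evaluation code: the function name `angular_error_deg` and the choice to represent a landing pose as a 3D direction vector are assumptions made for the example.

```python
import math


def angular_error_deg(predicted, actual):
    """Angle in degrees between two 3D direction vectors.

    Hypothetical helper: representing a landing pose as a plain
    direction vector is an assumption made for illustration only.
    """
    dot = sum(p * a for p, a in zip(predicted, actual))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_a = math.sqrt(sum(a * a for a in actual))
    if norm_p == 0 or norm_a == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_a)))
    return math.degrees(math.acos(cos_angle))
```

Averaging this value over all trials at a fixed fraction of the movement would give a mean angular error of the kind reported.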
Communications of CCF | Interactive Learning in Human-Machine Collaboration
Abstract
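If we liken a machine to an apprentice in training, its main routes to acquiring common sense or domain expertise are self-study and asking a teacher. The pre-training of large language models (LLMs) on massive datasets, which has drawn wide attention recently, resembles the apprentice "reading ten thousand books", while reinforcement learning from human feedback (RLHF) is akin to the apprentice consulting a teacher in the classroom. Eventually the apprentice graduates and enters the workplace, only to discover when dealing with clients that the knowledge learned from books and teachers is either insufficient or inapplicable, so that the clients' specific needs and viewpoints cannot be understood. This example reveals a practical bottleneck that machine intelligence, as represented by LLMs, faces in vertical domains: in real application scenarios, machines still lack an effective learning path for acquiring vertical-domain or personalized knowledge. Can this bottleneck be overcome from a human-centered human-computer interaction (HCI) perspective? This article discusses a new machine learning paradigm, interactive learning, whose core idea is to transfer knowledge efficiently from users to machines through natural human-machine interaction, thereby addressing the problem above. From the standpoint of human-machine collaboration, we argue why interactive learning is an important route to overcoming the limitations of current model training and moving toward general intelligence.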
2024
Automated Grading Hemifacial Spasm Using Smartphone Cameras
Abstract
Hemifacial spasm is a chronic neurological condition characterized by involuntary facial muscle contractions caused by nerve compression. While familiar to specialists, it is less known to the public and general practitioners, which can lead to difficulties in diagnosis and severity assessment, and even misdiagnosis. Consequently, patients commonly have a long medical history. However, long-term patients tend to have poorer outcomes following surgery, and one-third of patients experience a delayed cure during postoperative rehabilitation. Moreover, 4% of patients experience recurrence, highlighting the importance of early and accurate diagnosis as well as postoperative monitoring. In this paper, we collected a video dataset of 50 hemifacial spasm patients and 9 healthy adults. We identified three facial features from the videos to establish a novel grading system closely aligned with the medical standards, specifically the Cohen-Albert Grading System. We also developed algorithms capable of automatically grading hemifacial spasm using smartphone cameras based on facial keypoint detection. These algorithms were evaluated on the dataset, achieving an accuracy of 88% for detection and a mean absolute error of 0.42 for grading.
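For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows how detection accuracy and mean absolute error are conventionally computed. It is not the authors' code; the function names and the toy inputs in the usage note are hypothetical and chosen only for illustration.

```python
def detection_accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the reference label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_absolute_error(predicted_grades, actual_grades):
    """Average absolute difference between predicted and reference grades."""
    if len(predicted_grades) != len(actual_grades) or not actual_grades:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted_grades, actual_grades))
    return total / len(actual_grades)
```

With toy inputs, `detection_accuracy([True, False, True, True], [True, False, False, True])` gives 0.75, and `mean_absolute_error([1, 2, 4], [1, 3, 3])` gives roughly 0.67, meaning the predicted grade is off by about two-thirds of a grade level on average.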
PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-Based Motion Capture
Abstract
The data scarcity problem is a crucial factor that hampers the model performance of IMU-based human motion capture. However, effective data augmentation for IMU-based motion capture is challenging, since it has to capture the physical relations and constraints of the human body, while maintaining the data distribution and quality. We propose PoseAugment, a novel pipeline incorporating VAE-based pose generation and physical optimization. Given a pose sequence, the VAE module generates infinite poses with both high fidelity and diversity, while keeping the data distribution. The physical module optimizes poses to satisfy physical constraints with minimal motion restrictions. High-quality IMU data are then synthesized from the augmented poses for training motion capture models. Experiments show that PoseAugment outperforms previous data augmentation and pose generation methods in terms of motion capture accuracy, revealing a strong potential of our method to alleviate the data collection burden for IMU-based motion capture and related tasks driven by human poses.
DreamCatcher: A Wearer-aware Sleep Event Dataset Based on Earables in Non-restrictive Environments
Abstract
Poor quality sleep can be characterized by the occurrence of events ranging from body movement to breathing impairment. Widely available earbuds equipped with sensors (also known as earables) can be combined with a sleep event detection algorithm to offer a convenient alternative to laborious clinical tests for individuals suffering from sleep disorders. Although various solutions utilizing such devices have been proposed to detect sleep events, they ignore the fact that individuals often share sleeping spaces with roommates or partners. To address this issue, we introduce DreamCatcher, the first publicly available dataset for wearer-aware sleep event algorithm development on earables. DreamCatcher encompasses eight distinct sleep events, including synchronous dual-channel audio and motion data collected from 12 pairs (24 participants) totaling 210 hours (420 person-hours) with fine-grained labels. We tested multiple benchmark models on three tasks related to sleep event detection, demonstrating the usability and unique challenge of DreamCatcher. We hope that the proposed DreamCatcher can inspire other researchers to further explore efficient wearer-aware human vocal activity sensing on earables.