论文成果 / Publications
2022
GazeDock: Gaze-Only Menu Selection in Virtual Reality using Auto-Triggering Peripheral Menu
Abstract
Gaze-only input techniques in VR face the challenge of avoiding false triggering due to continuous eye tracking while maintaining interaction performance. In this paper, we propose GazeDock, a technique for fast and robust gaze-based menu selection in VR. GazeDock features a view-fixed peripheral menu layout that automatically triggers menu appearance and selection when the user's gaze approaches and leaves the menu zone, thus improving interaction speed and minimizing the false triggering rate. We built a dataset of 12 participants' natural gaze movements in typical VR applications. By analyzing their gaze movement patterns, we designed GazeDock's personalized menu UI and optimized its selection detection algorithm. We also examined users' gaze selection precision for targets on the peripheral menu and found that 4-8 menu items yield the highest throughput when considering both speed and accuracy. Finally, we validated the usability of GazeDock in a VR navigation game that contains both scene exploration and menu selection. Results showed that GazeDock achieved an average selection time of 471 ms and a false triggering rate of 3.6%, and it received higher user preference ratings compared with dwell-based and pursuit-based techniques.
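As a reading aid only, the sketch below illustrates the kind of auto-triggering logic the abstract describes: the menu appears once gaze eccentricity exceeds a threshold, and the last-hovered item is committed when the gaze returns toward the view center. The thresholds, ring layout, and hysteresis values here are assumptions for illustration, not the parameters or algorithm reported in the paper.

```python
# Minimal sketch of a GazeDock-style auto-triggering peripheral menu
# (illustrative only; thresholds, item layout, and hysteresis are assumptions,
# not the parameters reported in the paper).
import math

class PeripheralMenu:
    def __init__(self, n_items=6, trigger_deg=12.0, cancel_deg=10.0):
        self.n_items = n_items            # number of menu items on the ring
        self.trigger_deg = trigger_deg    # gaze eccentricity that opens the menu
        self.cancel_deg = cancel_deg      # hysteresis: eccentricity that closes it
        self.visible = False
        self.hovered = None               # index of the last item the gaze entered

    def _item_at(self, yaw, pitch):
        """Map a gaze direction (deg, relative to view center) to a ring item."""
        angle = math.degrees(math.atan2(pitch, yaw)) % 360.0
        return int(angle / (360.0 / self.n_items))

    def update(self, yaw, pitch):
        """Feed one gaze sample; returns the selected item index or None."""
        ecc = math.hypot(yaw, pitch)      # angular distance from view center
        if not self.visible:
            if ecc >= self.trigger_deg:   # gaze approaches periphery -> show menu
                self.visible = True
                self.hovered = self._item_at(yaw, pitch)
            return None
        if ecc >= self.cancel_deg:        # still in the menu zone -> track hover
            self.hovered = self._item_at(yaw, pitch)
            return None
        # gaze left the menu zone back toward the center -> commit selection
        self.visible = False
        selected, self.hovered = self.hovered, None
        return selected

menu = PeripheralMenu()
for yaw, pitch in [(2, 1), (9, 8), (13, 2), (3, 1)]:   # toy gaze trace in degrees
    choice = menu.update(yaw, pitch)
    if choice is not None:
        print("selected item", choice)
```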
Journal of Image and Graphics (中国图象图形学报) | A Survey on Multimodal Human-Computer Interaction
Abstract
Multimodal human-computer interaction aims to exchange information between humans and computers through multiple modalities such as speech, images, text, eye gaze, and touch. It has broad application prospects in areas such as physiological and psychological assessment, office and education, military simulation, and medical rehabilitation. This paper systematically reviews the current state and emerging directions of multimodal human-computer interaction, covering research progress in interactive big-data visualization, interaction based on acoustic-field sensing, tangible interaction in mixed reality, wearable interaction, and human-machine dialogue interaction, together with a comparison of domestic and international research. We argue that future research trends in multimodal human-computer interaction include exploring new interaction modalities, designing efficient combinations of modalities, building miniaturized interaction devices, supporting cross-device distributed interaction, and improving the robustness of interaction algorithms in open environments.
MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing
Abstract
Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold-standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold-standard data. To help alleviate this dependency, in this paper we present MobilePhys, a novel mobile sensing system and the first mobile personalized remote physiological sensing system, which leverages both the front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door to numerous future applications such as smart mirrors, fitness, and mobile health applications.
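For illustration only, a minimal sketch of the dual-camera labeling idea: the rear camera (covered by a fingertip) yields a contact pulse waveform that can serve as a self-supervised label for the face video from the front camera. The green-channel averaging, band-pass settings, and pairing scheme are assumptions, not the authors' pipeline.

```python
# Minimal sketch of dual-camera self-supervised labeling (illustrative only:
# frame shapes, band-pass settings, and pairing logic are assumptions).
import numpy as np
from scipy.signal import butter, filtfilt

def fingertip_ppg(rear_frames, fps=30.0):
    """Turn rear-camera fingertip frames (N, H, W, 3) into a pseudo-label pulse signal."""
    green = rear_frames[..., 1].reshape(len(rear_frames), -1).mean(axis=1)
    green = (green - green.mean()) / (green.std() + 1e-8)
    # keep the typical heart-rate band (~0.7-3 Hz)
    b, a = butter(2, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    return filtfilt(b, a, green)

def make_personalization_pairs(front_frames, rear_frames, fps=30.0):
    """Pair face frames with the self-supervised label waveform for fine-tuning."""
    label = fingertip_ppg(rear_frames, fps)
    return list(zip(front_frames, label))   # (face frame, target PPG value) pairs

# usage with synthetic data
rear = np.random.rand(300, 64, 64, 3)
front = np.random.rand(300, 128, 128, 3)
pairs = make_personalization_pairs(front, rear)
print(len(pairs), "training pairs")
```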
Automatically Generating and Improving Voice Command Interface from Operation Sequences on Smartphones
Abstract
Using voice commands to automate smartphone tasks (e.g., making a video call) can effectively augment the interactivity of numerous mobile apps. However, creating voice command interfaces requires a tremendous amount of effort in labeling and compiling the graphical user interface (GUI) and utterance data. In this paper, we propose AutoVCI, a novel approach to automatically generate a voice command interface (VCI) from smartphone operation sequences. The generated voice command interface has two distinct features. First, it automatically maps a voice command to GUI operations and fills in parameters accordingly, leveraging GUI data instead of a corpus or hand-written rules. Second, it launches a complementary Q&A dialogue to confirm the intention in case of ambiguity. In addition, the generated voice command interface can learn and evolve from user interactions: it accumulates historical command understanding results to annotate the user's input and improve its semantic understanding ability. We implemented this approach on Android devices and conducted a two-phase user study with 16 and 67 participants, respectively. Experimental results demonstrated the practical feasibility of AutoVCI.
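For illustration only, a toy sketch of the two features described above: matching an utterance against recorded operation-sequence templates, filling a parameter slot from the leftover words, and falling back to a clarification question when two candidates score similarly. The templates, scoring, and slot filling here are assumptions, not AutoVCI's actual models.

```python
# Minimal sketch of mapping an utterance to a recorded GUI operation sequence
# and asking a clarification question when ambiguous (illustrative only; the
# templates, scoring, and slot filling are assumptions, not AutoVCI's models).
TEMPLATES = [
    {"task": "video_call", "keywords": {"video", "call"}, "slot": "contact"},
    {"task": "voice_call", "keywords": {"voice", "call"}, "slot": "contact"},
    {"task": "send_message", "keywords": {"send", "message"}, "slot": "contact"},
]

def understand(utterance):
    words = set(utterance.lower().split())
    scored = sorted(
        ((len(words & t["keywords"]) / len(t["keywords"]), t) for t in TEMPLATES),
        key=lambda x: x[0], reverse=True)
    best, runner_up = scored[0], scored[1]
    if best[0] == 0:
        return {"action": "reject"}
    if best[0] - runner_up[0] < 0.5:         # ambiguous -> complementary Q&A
        return {"action": "ask",
                "question": f"Did you mean {best[1]['task']} or {runner_up[1]['task']}?"}
    # fill the parameter slot with the leftover words (e.g., the contact name)
    params = " ".join(words - best[1]["keywords"])
    return {"action": "execute", "task": best[1]["task"], best[1]["slot"]: params}

print(understand("video call alice"))   # unambiguous -> execute with contact filled
print(understand("call bob"))           # ambiguous -> clarification question
```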
FaceOri: Tracking Head Position and Orientation Using Ultrasonic Ranging on Earphones
Abstract
Face orientation can often indicate users' intended interaction target. In this paper, we propose FaceOri, a novel face tracking technique based on acoustic ranging using earphones. FaceOri leverages the speaker on a commodity device to emit an ultrasonic chirp, which is picked up by the microphones on the user's earphones and then processed to calculate the distance from each microphone to the device. These measurements are used to derive the user's face orientation and distance with respect to the device. We conduct a ground-truth comparison and a user study to evaluate FaceOri's performance. The results show that the system can determine whether the user is oriented toward the device with 93.5% accuracy within a 1.5 m range. Furthermore, FaceOri can continuously track the user's head orientation with a median absolute error of 10.9 mm in distance, 3.7° in yaw, and 5.8° in pitch. FaceOri allows for convenient hands-free control of devices and enables more intelligent context-aware interactions.
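For illustration only, a minimal geometric sketch of how per-ear ranges can yield head distance and yaw: under a far-field approximation, the range difference between the two earphone microphones is proportional to the sine of the yaw angle. The ear span and the approximation itself are assumptions, not FaceOri's actual estimation pipeline.

```python
# Minimal geometric sketch of recovering head distance and yaw from per-ear
# ranges (illustrative; a far-field approximation with an assumed ear span,
# not FaceOri's estimation pipeline).
import math

EAR_SPAN_M = 0.18   # assumed distance between the two earphone microphones

def head_pose(d_left, d_right, ear_span=EAR_SPAN_M):
    """Estimate distance (m) and yaw (deg) of the head relative to the device."""
    distance = 0.5 * (d_left + d_right)            # range to the head mid-point
    # Far-field approximation: the path difference between the two ears is
    # ear_span * sin(yaw), so yaw follows from the measured range difference.
    ratio = max(-1.0, min(1.0, (d_left - d_right) / ear_span))
    yaw = math.degrees(math.asin(ratio))
    return distance, yaw

print(head_pose(1.02, 0.98))   # head turned so that the right ear is closer
```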
From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on Virtual Keyboards with Probabilistic Touch Modeling
Abstract
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, bringing challenges to both tap detection and input prediction. In this poster, we demonstrate the feasibility of efficient single-finger typing in mid-air through probabilistic touch modeling. We first collected users' typing data on virtual keyboards of different sizes. Based on an analysis of the data, we derived an input prediction algorithm that incorporates probabilistic touch detection and elastic probabilistic decoding. In an evaluation study where participants performed real text entry tasks with this technique, they reached a pick-up single-finger typing speed of 24.0 WPM with a 2.8% word-level error rate.
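For illustration only, a toy sketch of probabilistic decoding: each tap endpoint is scored against key centers with a Gaussian touch model and combined with a word prior. The keyboard geometry, noise parameter, and lexicon are assumptions, and this decoder is far simpler than the elastic probabilistic decoding described above.

```python
# Minimal sketch of probabilistic word decoding for mid-air taps (illustrative;
# keyboard coordinates, sigma, and the lexicon are assumed toy values).
import math

KEYS = {"h": (5.5, 1.0), "i": (7.5, 0.0), "u": (6.5, 0.0),
        "t": (4.5, 0.0), "y": (5.5, 0.0)}              # toy QWERTY coordinates
LEXICON = {"hi": 0.6, "hu": 0.1, "ti": 0.2, "yi": 0.1}  # toy word priors
SIGMA = 0.6                                             # assumed touch noise (in key widths)

def log_gauss(tap, key):
    """Log-likelihood of a tap endpoint under a Gaussian centered on a key."""
    dx, dy = tap[0] - key[0], tap[1] - key[1]
    return -(dx * dx + dy * dy) / (2 * SIGMA * SIGMA)

def decode(taps):
    """Return candidate words ranked by touch likelihood x language prior."""
    scores = {}
    for word, prior in LEXICON.items():
        if len(word) != len(taps):
            continue
        score = math.log(prior)
        score += sum(log_gauss(t, KEYS[c]) for t, c in zip(taps, word))
        scores[word] = score
    return sorted(scores, key=scores.get, reverse=True)

print(decode([(5.3, 0.9), (7.2, 0.2)]))   # noisy taps near "h" then "i"
```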
Communications of CCF | Doing Human-Computer Interaction Research Well
Abstract
Human-computer interaction (HCI) studies the principles and technologies of natural and efficient information exchange between humans and computer systems. It is realized as user-facing interfaces composed of multimodal input and output hardware and software, forming specific interaction paradigms. As shown in Figure 1, these interfaces are divided into input interfaces that process user input data and output interfaces that present the machine's processing results as feedback. A person's interaction intention arises in the brain; since today's life sciences and EEG technologies cannot yet read from or write to the brain directly (indicated by the dashed line in Figure 1), the intention must be expressed through actions driven by the peripheral nervous system, either by operating tools or through natural expressions such as speech and gesture. The main task of the input interface is to capture and process these external human behaviors, while the presentation of the machine's results must match human perceptual and cognitive characteristics.
CAAI Communications | The Metaverse Needs Breakthroughs in Human-Computer Interaction
Abstract
The metaverse aims to digitize and add intelligence to everything, creating a virtual-real fused space in which information fully surrounds people and evolving into a new form of society unbounded by time and space. Human-computer interaction is a core enabling technology of the metaverse; extending and virtualizing human-machine interfaces to achieve efficient exchange of semantic information between humans and machines poses major technical challenges. Mastering the scientific and technological advantages of human-computer interaction is crucial for driving the development of related industries. This article analyzes the human-computer interaction challenges of the metaverse, focusing on breakthrough ideas and recent progress in interaction intention inference.
Easily-add battery-free wireless sensors to everyday objects: system implementation and usability study
Abstract
The trend of IoT brings more and more connected smart devices into our daily lives, enabling ubiquitous sensing and interaction experiences. However, augmenting everyday objects with sensing abilities is not easy. BitID is an unobtrusive, low-cost, training-free, and easy-to-use technique that enables users to add sensing abilities to everyday objects in a DIY manner. A BitID sensor can be easily made from a UHF RFID tag and deployed on an object so that the tag's readability (whether the tag is identified by RFID readers) is mapped to binary states of the object (e.g., whether a door is open or closed). To further validate BitID's sensing performance, we used a robotic arm to press BitID buttons repetitively and swipe on BitID sliders. The average press recognition F1-score is 98.9% and the swipe recognition F1-score is 96.7%. To evaluate BitID's usability, we implemented a prototype system that supports BitID sensor registration, semantic definition, status display, and real-time state and event detection. Using the system, users configured and deployed a BitID sensor in an average of 4.9 minutes, and 23 of the 24 user-deployed BitID sensors worked accurately and robustly. In addition to the previously proposed 'short' BitID sensor, we propose new 'open' BitID sensors, which show performance similar to that of 'short' sensors.
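For illustration only, a minimal sketch of the readability-to-state mapping: a tag's presence in recent reader inventory rounds is smoothed over a sliding window and mapped to a user-defined binary object state. The window size, threshold, and polling model are assumptions, not BitID's detection algorithm.

```python
# Minimal sketch of BitID-style sensing: a tag's readability over a sliding
# window of inventory rounds is mapped to a binary object state (illustrative;
# window size, threshold, and reader polling are assumptions).
from collections import deque

class BitIDSensor:
    def __init__(self, tag_id, on_label="door open", off_label="door closed",
                 window=5, threshold=0.6):
        self.tag_id = tag_id
        self.labels = (off_label, on_label)   # state semantics defined by the user
        self.reads = deque(maxlen=window)     # recent readability observations
        self.threshold = threshold

    def update(self, visible_tags):
        """Feed one inventory round (the set of tag IDs the reader saw)."""
        self.reads.append(self.tag_id in visible_tags)
        readable_ratio = sum(self.reads) / len(self.reads)
        return self.labels[readable_ratio >= self.threshold]

sensor = BitIDSensor("E200-1234")   # hypothetical tag ID
for inventory in [set(), {"E200-1234"}, {"E200-1234"}, {"E200-1234"}, set()]:
    print(sensor.update(inventory))
```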