
2022

中国图象图形学报 (Journal of Image and Graphics) | A Survey of Multimodal Human-Computer Interaction
陶建华,巫英才,喻纯,翁冬冬,李冠君,韩腾,王运涛,刘斌
Abstract
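Multimodal human-computer interaction aims to exchange information between humans and computers using multimodal signals such as speech, images, text, eye movement, and touch. It has broad application prospects in physiological and psychological assessment, office work and education, military simulation, and medical rehabilitation. This paper systematically surveys the current state and emerging directions of multimodal human-computer interaction. It reviews in depth the research progress in big-data visualization interaction, interaction based on sound-field sensing, mixed-reality interaction with physical objects, wearable interaction, and human-computer dialogue interaction, and compares research progress in China and abroad. We argue that the future research trends of multimodal human-computer interaction include exploring new interaction modalities, designing efficient combinations of modalities, building miniaturized interaction devices, enabling cross-device distributed interaction, and improving the robustness of interaction algorithms in open environments.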
MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing
(IMWUT ’22) Xin Liu, Yuntao Wang, Sinan Xie, Xiaoyu Zhang, Zixian Ma, Daniel McDuff, Shwetak Patel
Abstract
Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold standard data. To help alleviate this dependency, in this paper, we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, that leverages both front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms the state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door for numerous future applications such as smart mirrors, fitness, and mobile health applications.
Automatically Generating and Improving Voice Command Interface from Operation Sequences on Smartphones
(CHI '22) Lihang Pan, Chun Yu*, JiaHui Li, Tian Huang, Xiaojun Bi, Yuanchun Shi
Abstract
Using voice commands to automate smartphone tasks (e.g., making a video call) can effectively augment the interactivity of numerous mobile apps. However, creating voice command interfaces requires a tremendous amount of effort in labeling and compiling the graphical user interface (GUI) and the utterance data. In this paper, we propose AutoVCI, a novel approach to automatically generate a voice command interface (VCI) from smartphone operation sequences. The generated voice command interface has two distinct features. First, it automatically maps a voice command to GUI operations and fills in parameters accordingly, leveraging the GUI data instead of a corpus or hand-written rules. Second, it launches a complementary Q&A dialogue to confirm the intention in case of ambiguity. In addition, the generated voice command interface can learn and evolve from user interactions. It accumulates historical command-understanding results to annotate the user's input and improve its semantic understanding ability. We implemented this approach on Android devices and conducted a two-phase user study with 16 and 67 participants in the two phases, respectively. Experimental results of the study demonstrated the practical feasibility of AutoVCI.
FaceOri: Tracking Head Position and Orientation Using Ultrasonic Ranging on Earphones
(CHI '22) Yuntao Wang¹, Jiexin Ding¹, Ishan Chatterjee, Farshid Salemi Parizi, Yuzhou Zhuang, Yukang Yan*, Shwetak Patel, Yuanchun Shi
Abstract
Face orientation can often indicate users' intended interaction target. In this paper, we propose FaceOri, a novel face tracking technique based on acoustic ranging using earphones. FaceOri can leverage the speaker on a commodity device to emit an ultrasonic chirp, which is picked up by the set of microphones on the user's earphone, and then processed to calculate the distance from each microphone to the device. These measurements are used to derive the user's face orientation and distance with respect to the device. We conduct a ground truth comparison and user study to evaluate FaceOri's performance. The results show that the system can determine whether the user orients to the device with 93.5% accuracy within a 1.5-meter range. Furthermore, FaceOri can continuously track the user's head orientation with a median absolute error of 10.9 mm in distance, 3.7° in yaw, and 5.8° in pitch. FaceOri can allow for convenient hands-free control of devices and produce more intelligent context-aware interactions.
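To make the ranging-to-orientation step concrete, the following is a minimal geometric sketch, not the FaceOri pipeline itself. It assumes a simplified planar model with one microphone per ear, a known ear separation, and one-way delays measured between synchronized clocks. The function names, the 0.15 m ear separation, and the 343 m/s speed of sound are illustrative assumptions that do not come from the paper.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def range_from_delay(delay_s, speed=SPEED_OF_SOUND_M_S):
    """Convert a one-way chirp propagation delay (seconds) into a distance (meters)."""
    return delay_s * speed


def yaw_and_distance(d_left, d_right, ear_sep=0.15):
    """Estimate yaw (degrees) and head-center distance (meters) from two ranges.

    Planar model (an assumption): ears sit at (-ear_sep/2, 0) and (+ear_sep/2, 0),
    the head faces +y, and the device lies in front of the user. Positive yaw
    means the device is toward the user's right. A device behind the head
    cannot be distinguished from one in front with only two ranges.
    """
    center_sq = (d_left ** 2 + d_right ** 2) / 2.0 - (ear_sep / 2.0) ** 2
    if center_sq <= 0.0:
        raise ValueError("ranges are inconsistent with the ear separation")
    distance = math.sqrt(center_sq)
    # From the geometry: d_left^2 - d_right^2 = 2 * distance * ear_sep * sin(yaw)
    sin_yaw = (d_left ** 2 - d_right ** 2) / (2.0 * distance * ear_sep)
    sin_yaw = max(-1.0, min(1.0, sin_yaw))  # clamp against measurement noise
    return math.degrees(math.asin(sin_yaw)), distance
```

With noise-free ranges the two identities used above are exact for this planar model; estimating pitch would need at least one additional microphone outside the ear-to-ear axis.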
From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on Virtual Keyboards with Probabilistic Touch Modeling
(IEEE VR WORKSHOPS 2022) Xin Yi, Chen Liang, Haozhan Chen, Jiuxu Song, Chun Yu, Yuanchun Shi
Abstract
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, bringing challenges to both tap detection and input prediction. In this poster, we demonstrated the feasibility of efficient single-finger typing in mid-air through probabilistic touch modeling. We first collected users' typing data on different sizes of virtual keyboards. Based on an analysis of the data, we derived an input prediction algorithm that incorporated probabilistic touch detection and elastic probabilistic decoding. In the evaluation study, where the participants performed real text entry tasks with this technique, they reached a pick-up single-finger typing speed of 24.0 WPM with a 2.8% word-level error rate.
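As a rough illustration of probabilistic touch decoding in general, the sketch below scores each candidate word by a word prior multiplied by an isotropic Gaussian likelihood of every touch point around the intended key. This is a plain Bayesian decoder, not the elastic probabilistic decoding described in the poster. The keyboard layout in key-width units, the `sigma` value, and the toy lexicon are all hypothetical.

```python
import math

# Hypothetical QWERTY layout in key-width units: (row letters, horizontal offset).
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
KEY_CENTERS = {
    ch: (offset + col, float(row))
    for row, (letters, offset) in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def log_score(touches, word, prior, sigma=0.5):
    """Log prior plus the Gaussian log-likelihood of each touch around its key center.

    The Gaussian normalization constant is dropped because it is identical
    for all candidates of the same length.
    """
    score = math.log(prior)
    for (x, y), ch in zip(touches, word):
        kx, ky = KEY_CENTERS[ch]
        score -= ((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2)
    return score


def decode(touches, lexicon, sigma=0.5):
    """Return the most probable word, or None if no word matches the touch count.

    Only words with exactly one letter per touch are considered, so this
    sketch does not handle inserted or omitted taps.
    """
    candidates = {w: p for w, p in lexicon.items() if len(w) == len(touches)}
    if not candidates:
        return None
    return max(candidates, key=lambda w: log_score(touches, w, candidates[w], sigma))
```

For example, `decode([(5.3, 1.1), (7.1, 0.1)], {"hi": 0.5, "ho": 0.3, "hu": 0.2})` selects "hi" because both touches land closest to the centers of "h" and "i".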
Communications of CCF | Doing Human-Computer Interaction Research Well
史元春
Abstract
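Human-computer interaction (HCI) studies the principles and techniques for natural and efficient information exchange between humans and computer systems. It is realized as user-facing terminal interfaces composed of multimodal input and output software and hardware interfaces, which form specific interaction patterns. As shown in Figure 1, these interfaces divide into input interfaces, which process user input data, and output interfaces, which feed back the machine's processing results. A person's interaction intent arises in the brain. Today's life science and EEG technology cannot yet read from or write to the brain directly (shown as dashed lines in Figure 1), so interaction intent must be expressed through behaviors and actions governed by the peripheral nervous system, either by operating tools or through natural expression in speech and movement. The main task of the input interface is to capture and process a person's external behavior; the presentation of machine processing results must match the characteristics of human perception and cognition.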
CAAI Communications | The Metaverse Needs Breakthroughs in Human-Computer Interaction
史元春
Abstract
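The metaverse aims to make everything informatized and intelligent, creating a blended virtual-physical space in which information fully surrounds people, and evolving into a new social form unbounded by time and space. Human-computer interaction is a core technology of the metaverse. Extending and virtualizing human-machine interfaces and achieving efficient exchange of semantic information between humans and machines pose major technical challenges. Holding a scientific and technological advantage in human-computer interaction is crucial to advancing related industries. This article analyzes the challenges of human-computer interaction in the metaverse and focuses on approaches to, and the latest progress in, interaction intent inference.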
Easily-add battery-free wireless sensors to everyday objects: system implementation and usability study
(CCF TPCI 2022) Tengxiang Zhang, Zi Qian, Hsuan Wei Fan, Jie Ren, Yuntao Wang, Yuanchun Shi
Abstract
The trend of IoT brings more and more connected smart devices into our daily lives, which can enable a ubiquitous sensing and interaction experience. However, augmenting many everyday objects with sensing abilities is not easy. BitID is an unobtrusive, low-cost, training-free, and easy-to-use technique that enables users to add sensing abilities to everyday objects in a DIY manner. A BitID sensor can be easily made from a UHF RFID tag and deployed on an object so that the tag's readability (whether the tag is identified by RFID readers) is mapped to binary states of the object (e.g., whether a door is open or closed). To further validate BitID's sensing performance, we use a robotic arm to press BitID buttons repetitively and swipe on BitID sliders. The average press recognition F1-score is 98.9% and the swipe recognition F1-score is 96.7%. To evaluate BitID's usability, we implement a prototype system that supports BitID sensor registration, semantic definition, status display, and real-time state and event detection. Using the system, users configured and deployed a BitID sensor with an average time duration of 4.9 min. 23 of the 24 user-deployed BitID sensors worked accurately and robustly. In addition to the previously proposed 'short' BitID sensor, we propose new 'open' BitID sensors which show similar performance to 'short' sensors.
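The mapping from tag readability to a binary object state can be sketched as follows. This is an illustrative model only, not the BitID prototype system: the `BitSensor` class, its field names, and the one-second timeout are hypothetical, and a real deployment would receive read events from an RFID reader rather than from manual `observe` calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BitSensor:
    """One tag whose readability encodes a user-defined binary state."""
    tag_id: str
    readable_label: str      # state name when the reader can see the tag
    unreadable_label: str    # state name when the tag has gone silent
    timeout_s: float = 1.0   # assumed: silence longer than this means "unreadable"
    last_seen: Optional[float] = None

    def observe(self, timestamp):
        """Record that the reader identified this tag at `timestamp` (seconds)."""
        self.last_seen = timestamp

    def state(self, now):
        """Return the semantic label for the tag's readability at time `now`."""
        readable = (
            self.last_seen is not None
            and now - self.last_seen <= self.timeout_s
        )
        return self.readable_label if readable else self.unreadable_label


def detect_event(previous_state, current_state):
    """Return a transition description when the state changes, else None."""
    if previous_state == current_state:
        return None
    return f"{previous_state} -> {current_state}"
```

Which label belongs to the readable case depends on how the tag is mounted on the object, which is why both labels are left for the user to define.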

2021

The practice of applying AI to benefit visually impaired people in China
(COMMUNICATIONS OF THE ACM 2021) Chun Yu, Jiajun Bu
Abstract
According to the China Disabled Persons' Federation (CDPF), there are now 17 million visually impaired people in China, of whom three million are totally blind, while the others have low vision. In the past two decades, China has experienced tremendous development of information technology. Traditional industries are incorporating information technology, with services delivered to users through websites and mobile applications. It is positive technical progress that visually impaired people can access various services without leaving home; for example, they can order food delivery online or schedule a taxi from an app-based transportation service.