Publications
2023
From Gap to Synergy: Enhancing Contextual Understanding through Human-Machine Collaboration in Personalized Systems
Abstract
This paper presents LangAware, a collaborative approach to constructing personalized context for context-aware applications. The need for personalization arises from significant variation in context across individuals, depending on scenarios, devices, and preferences. However, there is often a notable gap between how humans and machines understand the construction of contexts, as observed in trigger-action programming studies such as those on IFTTT. LangAware enables end-users to participate in establishing contextual rules in-situ using natural language. The system leverages large language models (LLMs) to semantically connect low-level sensor detectors to high-level contexts and to provide understandable natural language feedback for effective user involvement. We conducted a user study with 16 participants in real-life settings, which revealed an average success rate of 87.50% for defining contextual rules across 12 campus scenarios, typically accomplished within just two modifications. Furthermore, users reported a better understanding of the machine’s capabilities after interacting with LangAware.
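The trigger-action rules that LangAware helps users establish can be pictured as conjunctions of low-level detector predicates mapped to a high-level context. Below is a minimal sketch of such a rule; the detector names, thresholds, and action are illustrative assumptions, not LangAware's actual detector set, and the real system derives the mapping with an LLM rather than hand-written predicates.

```python
from dataclasses import dataclass

# Hypothetical low-level detectors a phone might expose (illustrative only).
@dataclass
class SensorReading:
    location: str     # e.g. "library"
    noise_db: float   # ambient noise level in dB
    motion: str       # "still", "walking", ...

# A high-level context ("studying in the library") expressed as a
# conjunction of low-level predicates -- the kind of rule LangAware
# builds from a user's natural-language description.
def in_study_context(r: SensorReading) -> bool:
    return r.location == "library" and r.noise_db < 40 and r.motion == "still"

def apply_rule(r: SensorReading) -> str:
    # Trigger-action: when the context holds, fire the action.
    return "mute_phone" if in_study_context(r) else "no_action"
```

The value of the collaborative loop is that when the machine's predicate set mismatches the user's intent, the user can repair the rule in natural language in-situ.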
Interaction Proxy Manager: Semantic Model Generation and Run-time Support for Reconstructing Ubiquitous User Interfaces of Mobile Services
Abstract
Emerging terminals, such as smartwatches, true wireless earphones, and in-vehicle computers, are complementing our portals to ubiquitous information services. However, the current ecology of information services, encapsulated in millions of mobile apps, is largely restricted to smartphones; accommodating these services to new devices requires tremendous, almost unbearable engineering effort. Interaction Proxy, first proposed as an accessibility technique, is a potential solution to this problem. Rather than rebuilding an entire application, an Interaction Proxy constructs an alternative user interface that intercepts and translates interaction events and states between users and the original app’s interface. In such a system, however, one key challenge is how to robustly and efficiently “communicate” with the original interface given the instability and dynamicity of mobile apps (e.g., dynamic application status and unstable layouts). To handle this, we first define the UI-Independent Application Description (UIAD), a reverse-engineered semantic model of mobile services, and then propose the Interaction Proxy Manager (IPManager), which synchronizes and manages the original apps’ interfaces and provides a concise programming interface that exposes information and method entries of the concerned mobile services. In this way, developers can build alternative interfaces without dealing with the complexity of communicating with the original app’s interfaces. In this paper, we elaborate on the design and implementation of IPManager and demonstrate its effectiveness by developing three typical proxies: mobile-smartwatch, mobile-vehicle, and mobile-voice. We conclude by discussing the value of our approach in promoting ubiquitous computing, as well as its limitations.
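The "concise programming interface" idea can be sketched as a proxy class that exposes semantic information and method entries while hiding GUI automation. The class, method names, and local state below are illustrative assumptions and not the actual UIAD or IPManager API.

```python
# Sketch of the kind of interface IPManager might expose to proxy
# developers for one mobile service. All names are hypothetical.
class MusicServiceProxy:
    """Wraps the original app's UI behind semantic entries."""

    def __init__(self):
        # In the real system this state is synchronized from the original
        # app's interface; here a local stand-in keeps the sketch runnable.
        self._state = {"track": "Song A", "playing": False}

    # Information entry: read app state without parsing its layout.
    def current_track(self) -> str:
        return self._state["track"]

    # Method entry: the manager would translate this call into interaction
    # events (taps, swipes) on the original interface.
    def toggle_play(self) -> bool:
        self._state["playing"] = not self._state["playing"]
        return self._state["playing"]
```

A smartwatch or voice proxy can then present a minimal alternative UI on top of such entries, untouched by the original app's unstable layout.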
MMPD: Multi-Domain Mobile Video Physiology Dataset
Abstract
Remote photoplethysmography (rPPG) is an attractive method for noninvasive, convenient, and concomitant measurement of physiological vital signals. Public benchmark datasets have served a valuable role in the development of this technology and in the accuracy improvements of recent years. However, gaps remain in the public datasets. First, despite the ubiquity of cameras on mobile devices, few datasets were recorded specifically with mobile phone cameras. Second, most datasets are relatively small and therefore limited in diversity of appearance (e.g., skin tone), behavior (e.g., motion), and environment (e.g., lighting conditions). To help the field advance, we present the Multi-domain Mobile Video Physiology Dataset (MMPD), comprising 11 hours of mobile phone recordings of 33 subjects. The dataset is designed to capture videos with greater representation across skin tone, body motion, and lighting conditions. MMPD is comprehensive, with eight descriptive labels, and can be used in conjunction with the rPPG-toolbox. The reliability of the dataset has been verified with mainstream unsupervised methods and neural methods. The GitHub repository of our dataset: https://github.com/THU-CS-PI/MMPD_rPPG_dataset.
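The unsupervised baselines used to verify such datasets share a common core: average a color channel over the face region per frame, then locate the dominant frequency in the human pulse band. Below is a minimal sketch of that idea on a green-channel trace; real toolbox methods (e.g., CHROM, POS) combine all color channels with more careful filtering, so this is only an illustrative simplification.

```python
import numpy as np

def estimate_heart_rate(green_trace: np.ndarray, fps: float) -> float:
    """Estimate pulse rate (BPM) from a spatially averaged green-channel
    trace, by picking the spectral peak in the plausible pulse band."""
    x = green_trace - green_trace.mean()          # remove DC component
    spectrum = np.abs(np.fft.rfft(x))             # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)  # bin frequencies in Hz
    # Restrict to 0.7-4 Hz (42-240 BPM), the plausible human pulse range.
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0
```

Datasets like MMPD matter precisely because this simple pipeline degrades under motion, dim light, and darker skin tones, where the pulsatile signal-to-noise ratio drops.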
Understanding In-Situ Programming for Smart Home Automation
Abstract
Programming a smart home is an iterative process in which users configure and test automations during their in-situ experience with the IoT space. However, current end-user programming mechanisms are primarily preset GUI configurations and fail to leverage in-situ behaviors and context. This paper proposes in-situ programming (ISP), a novel programming paradigm for AIoT automation that extensively leverages users’ natural in-situ interaction with the smart environment. We built a Wizard-of-Oz system and conducted a user-enactment study to explore users’ behavior models under this paradigm. We identified a dynamic programming flow in which participants iteratively configure and confirm through query, control, edit, and test. In particular, we identified two novel methods in which participants leverage ambient responses and in-situ interaction: “snapshot” for automation configuration and “simulation” for automation testing. Based on our findings, we propose design spaces on dynamic programming flow, coherency and clarity of the interface, and state and scene management toward an ideal in-situ programming experience.
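The "snapshot" configuration method can be sketched as freezing the current ambient state into an automation's actions. The device names, states, and trigger syntax below are hypothetical, chosen only to make the flow concrete.

```python
# Illustrative sketch of "snapshot" configuration: the user first puts the
# environment into the desired state in-situ, then captures it.
def take_snapshot(devices: dict) -> dict:
    """Freeze the current state of every device as the automation's actions."""
    return dict(devices)  # copy, so later manual changes don't alter the rule

def build_rule(trigger: str, snapshot: dict) -> dict:
    return {"trigger": trigger, "actions": snapshot}

# In-situ flow: the user dims the light and turns on the speaker, then says
# "do this every evening"; the system snapshots the ambient state.
current = {"light": "dim", "speaker": "on", "curtain": "closed"}
rule = build_rule("time == 19:00", take_snapshot(current))
```

"Simulation" then plays such a rule back in the real space so the user can confirm or edit it before committing, closing the configure-and-confirm loop the study observed.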
Communications of CCF | From Ubiquitous Computing to Human-Machine-Environment Integrated Computing
Abstract
In the early 1990s, Mark Weiser proposed that future ubiquitous computing (Ubiquitous Computing; Pervasive Computing is a synonymous term) would move computing off the desktop and, building on handheld human-machine (mobile) computing, establish what we now call a human-machine-environment ternary-fusion interaction environment: ever more computing and sensing capability will recede into the physical world, forming distributed systems that continuously serve users. In recent years, information technology has kept developing along this trend; mobile computing, cloud computing, the Internet of Things, big data, artificial intelligence, and other technologies support a growing number of maturing ubiquitous computing application scenarios, such as smart cities, intelligent transportation, the industrial Internet of Things, and smart homes, and human-machine-environment computing platforms will become indispensable infrastructure for society's economic activity. Ubiquitous computing scenarios are constituted by data-fusion relationships among heterogeneous human, machine, and environment resources, so application development faces challenges in tasks, resources, composition relationships, and the domain expertise that programming requires, and it must embody human-centered service capabilities. Of the three elements, the human is the one that is hard to define and can only be adapted to, which makes things still more complex; this complexity needs to be managed and hidden by the operating system.
Squeez’In: Private Authentication on Smartphones based on Squeezing Gestures
Abstract
In this paper, we proposed Squeez’In, a technique on smartphones that enables private authentication by holding and squeezing the phone with a unique pattern. We first explored the design space of practical squeezing gestures for authentication by analyzing participants’ self-designed gestures and squeezing behavior. Results showed that varying-length gestures with two levels of touch pressure and duration were the most natural and unambiguous. We then implemented Squeez’In on an off-the-shelf capacitive sensing smartphone, employing an SVM-GBDT model to recognize gestures and user-specific behavioral patterns, achieving 99.3% accuracy and a 0.93 F1-score when tested on 21 users. A subsequent 14-day study validated the memorability and long-term stability of Squeez’In. During the usability evaluation, compared with gesture and PIN code authentication, Squeez’In achieved significantly faster authentication speed and higher user preference in terms of privacy and security.
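The two-level pressure/duration vocabulary found in the design study can be sketched as a simple quantization of each squeeze into a symbol. The thresholds and matching scheme below are illustrative only; the actual system recognizes gestures and user-specific behavior with an SVM-GBDT model over capacitive sensing data, not a hard-threshold match.

```python
# Sketch: quantize a varying-length squeeze sequence into symbols built
# from two pressure levels and two duration levels (illustrative thresholds).
def encode_squeezes(squeezes, pressure_thresh=0.5, duration_thresh=0.4):
    """Map (peak_pressure, duration_s) squeezes to symbols like 'Hl':
    H/L = hard/light press, l/s = long/short hold."""
    symbols = []
    for pressure, duration in squeezes:
        p = "H" if pressure >= pressure_thresh else "L"
        d = "l" if duration >= duration_thresh else "s"
        symbols.append(p + d)
    return "".join(symbols)

def authenticate(squeezes, enrolled_pattern: str) -> bool:
    # Naive exact match on the symbol string; the real recognizer also
    # verifies the user's behavioral pattern, not just the gesture.
    return encode_squeezes(squeezes) == enrolled_pattern
```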
SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks
Abstract
This work focuses on an active topic in the HCI community: tutorial creation by demonstration. We present a novel tool named SmartRecorder that helps people without video editing skills create video tutorials for smartphone interaction tasks. As automatic interaction trace extraction is a key component of tutorial generation, we seek to tackle the challenges of automatically extracting user interaction traces on smartphones from screencasts. Uniquely with respect to prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and the technical evaluation results show the importance of smartphone IMU data in improving system performance. With the key information extracted for each step, SmartRecorder generates initial instructional content and provides tutorial creators with a refinement editor, designed around a high recall (99.38%) of key steps, to revise that content. Finally, SmartRecorder generates video tutorials based on the refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials in less time and with higher satisfaction from recipients.
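One way IMU data can complement vision is by flagging the moments a finger strikes the screen as sharp peaks in accelerometer magnitude. The sketch below is a simplified stand-in for SmartRecorder's IMU sensing, with an illustrative threshold and refractory period rather than the paper's actual algorithm.

```python
import numpy as np

def detect_taps(accel_mag: np.ndarray, fps: float, thresh: float = 2.0,
                refractory_s: float = 0.2):
    """Return candidate tap timestamps (seconds) as threshold crossings in
    accelerometer magnitude, suppressing repeats within a refractory period."""
    taps, last = [], -refractory_s
    for i, a in enumerate(accel_mag):
        t = i / fps
        if a > thresh and (t - last) >= refractory_s:
            taps.append(t)
            last = t
    return taps
```

Timestamps like these can then be aligned with the screencast so vision only has to answer *where* on the screen each step happened, not *when*.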
A Human-Computer Collaborative Editing Tool for Conceptual Diagrams
Abstract
Editing (e.g., editing conceptual diagrams) is a typical office task that requires numerous tedious GUI operations, resulting in poor interaction efficiency and user experience, especially on mobile devices. In this paper, we present a new type of human-computer collaborative editing tool (CET) that enables accurate and efficient editing with little interaction effort. CET divides the task into two parts, and the human and the computer focus on their respective specialties: the human describes high-level editing goals with multimodal commands, while the computer calculates, recommends, and performs detailed operations. We conducted a formative study (N = 16) to determine the concrete task division and implemented the tool on Android devices for the specific tasks of editing conceptual diagrams. The user study (N = 24 + 20) showed that it increased diagram editing speed by 32.75% compared with existing state-of-the-art commercial tools and led to better editing results and user experience.
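The task division can be made concrete with one tiny example: the human issues a high-level goal such as "align these boxes," and the computer derives the precise coordinate edits. The box representation and alignment rule below are illustrative assumptions, not CET's actual command set.

```python
# Sketch of the computer's side of the division of labor: given a
# high-level goal, compute the detailed geometry edits the user would
# otherwise perform by hand. Each box is (x, y, w, h).
def align_left(boxes):
    """Snap every box's left edge to the leftmost box's left edge."""
    min_x = min(b[0] for b in boxes)
    return [(min_x, y, w, h) for (_, y, w, h) in boxes]
```

The multimodal command ("align these" plus a selection gesture) replaces a sequence of drag-and-nudge operations, which is where the interaction-effort savings come from.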
ResType: Invisible and Adaptive Tablet Keyboard Leveraging Resting Fingers
Abstract
Text entry on tablet touchscreens is a basic need nowadays. Tablet keyboards require visual attention for users to locate keys, thus not supporting efficient touch typing. They also take up a large proportion of screen space, which affects the access to information. To solve these problems, we propose ResType, an adaptive and invisible keyboard on three-state touch surfaces (e.g., tablets with unintentional touch prevention). ResType allows users to rest their hands on it and automatically adapts the keyboard to the resting fingers. Thus, users do not need visual attention to locate keys, which supports touch typing. We quantitatively explored users’ resting finger patterns on ResType, based on which we proposed an augmented Bayesian decoding algorithm for ResType, with 96.3% top-1 and 99.0% top-3 accuracies. After a 5-day evaluation, ResType achieved 41.26 WPM, outperforming normal tablet keyboards by 13.5% and reaching 86.7% of physical keyboards. It solves the occlusion problem while maintaining comparable typing speed with current methods on visible tablet keyboards.
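The core of spatial Bayesian keyboard decoding is choosing the key that best explains a touch point, P(key | touch) ∝ P(touch | key) · P(key), with key centers calibrated from the resting fingers. The sketch below shows that scoring with a Gaussian touch model; the noise parameter and key layout are illustrative, and ResType's actual augmented decoder is more elaborate (e.g., incorporating the resting-finger adaptation and language priors it describes).

```python
import math

def decode_touch(touch, key_centers, sigma=8.0, priors=None):
    """Pick argmax_key log P(touch | key) + log P(key).

    touch: (x, y) point; key_centers: {char: (x, y)} adapted to the user's
    resting fingers; sigma: touch-noise std (illustrative units)."""
    best, best_score = None, -math.inf
    for key, (cx, cy) in key_centers.items():
        # Log of an isotropic Gaussian likelihood centered on the key.
        ll = -((touch[0] - cx) ** 2 + (touch[1] - cy) ** 2) / (2 * sigma ** 2)
        lp = math.log(priors.get(key, 1e-6)) if priors else 0.0
        score = ll + lp
        if score > best_score:
            best, best_score = key, score
    return best
```

Because the centers move with the resting hands, the keyboard can stay invisible: the decoder, not the user's eyes, resolves where each key is.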