Beyond the Coordinate: Reconfiguring Affective Sensation via LLM-Based Keypoint Localization
Xiaobin Liu
■ Abstract
The intersection of art and AI has led to a shared focus on how machines perceive the nuances of human emotion. Emotion recognition depends on the flexible localization of keypoints associated with facial expressions, hand gestures, and body postures, yet conventional computer vision models are often restricted to predefined keypoint types. These models typically fail to generalize to "unseen" keypoints or subtle movements absent from their prior knowledge, resulting in a closed emotion-perception system that lacks the flexibility needed for open-world environments. To address this limitation, we propose a new framework, the Large Language-guided Localization Model (L3M), which leverages the logical reasoning capabilities of Large Language Models (LLMs) for human keypoint localization. Instead of treating localization as a traditional numerical regression task, we discretize human joints into novel Spatial Coordinate Tokens (SCTs) within the LLM's vocabulary. This transformation lets the system treat physical positions as semantic symbols, enabling the LLM to perform semantic inference rather than simple pixel matching. By bridging the gap between raw physical kinetics and the LLM's inherent knowledge, the model can localize both standard and previously unseen affective markers without task-specific architectural changes. Experimental results on real-world datasets demonstrate that L3M provides superior generalizability in unconstrained environments and a more adaptive foundation for affective computing, enabling the recognition of diverse and fine-grained emotional cues. This research charts a feasible trajectory for AI to evolve from a passive sensor into an intelligent interpreter of human physical information. By redefining how AI perceives the human body through discrete logic rather than mechanical tracking, we offer a robust technical foundation for future human-machine interaction and real-world behavioral analysis.
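The abstract does not detail how SCTs are constructed, but the core idea of replacing numerical regression with discrete vocabulary tokens can be illustrated with a minimal sketch. The Python snippet below assumes a uniform grid over the normalized image plane; the bin count, the `<sct_row_col>` token format, and the helper names `coord_to_sct` / `sct_to_coord` are hypothetical choices for illustration, not the actual L3M design.

```python
# Minimal sketch of Spatial Coordinate Token (SCT) discretization.
# Assumptions (not specified in the abstract): a uniform NUM_BINS x NUM_BINS
# grid over normalized image coordinates and a "<sct_{row}_{col}>" token
# format are hypothetical choices made for this illustration.

NUM_BINS = 64  # hypothetical grid resolution


def coord_to_sct(x: float, y: float, num_bins: int = NUM_BINS) -> str:
    """Map a normalized joint coordinate (x, y) in [0, 1] to a discrete SCT."""
    col = min(int(x * num_bins), num_bins - 1)
    row = min(int(y * num_bins), num_bins - 1)
    return f"<sct_{row}_{col}>"


def sct_to_coord(token: str, num_bins: int = NUM_BINS) -> tuple[float, float]:
    """Invert an SCT back to the center of its grid cell."""
    row, col = (int(v) for v in token.strip("<>").split("_")[1:])
    return (col + 0.5) / num_bins, (row + 0.5) / num_bins


# The SCT strings can then be registered as new vocabulary entries so the
# LLM predicts positions as symbols, e.g. with Hugging Face transformers:
#   tokenizer.add_tokens([f"<sct_{r}_{c}>" for r in range(NUM_BINS)
#                         for c in range(NUM_BINS)])
#   model.resize_token_embeddings(len(tokenizer))

if __name__ == "__main__":
    tok = coord_to_sct(0.37, 0.82)   # e.g. a right-wrist keypoint
    print(tok, sct_to_coord(tok))    # "<sct_52_23>" and its cell center
```

Under this reading, localization becomes next-token prediction over a spatial vocabulary, which is what would allow the model to describe keypoint positions with the same machinery it uses for semantic reasoning.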
■ Bio
Xiaobin Liu received the B.E. degree from Nankai University in 2016 and the Ph.D. degree from Peking University in January 2022. From February 2022 to August 2024, he was a senior researcher focusing on multimodal content understanding at PCG, Tencent. He is now an assistant professor in the Laboratory for Advanced Perception and Control (LAPC), College of AI, Nankai University. His work focuses on embodied AI and multimodal content understanding. He has authored or co-authored 19 papers in leading journals and conferences, including TIP, PR, NeurIPS, IJCAI, and ACM MM. He is a recipient of the Second Prize of Technical Invention of Tianjin, the National Outstanding Doctoral Dissertation Award in Transportation Engineering, the Excellent Presentation Award at ICIVC 2025, and the third-place award at the 5th Tianjin "Haihe Talents" Postdoctoral Innovation and Entrepreneurship Contest. He serves as a Track Chair for ICIVC 2025 and as a reviewer for more than 10 journals and conferences, including TIP, IJCV, CVPR, and ICCV. His research is funded by the National Natural Science Foundation of China (NSFC), the National Key R&D Program, the Postdoctoral Fellowship Program of CPSF, the China Postdoctoral Science Foundation, and the China Postdoctoral Science Foundation - Tianjin Joint Support Program.