Fengshuo Bai

Ph.D. Student

Department of Computer Science and Engineering
Shanghai Jiao Tong University

My research spans dexterous manipulation, preference-based RL, and AI alignment, aiming to build reliable systems that learn from complex human objectives.

Dexterous Manipulation, Preference-based RL, AI Alignment

Email: fengshuobai [@] sjtu [DOT] edu [DOT] cn

Biography

I am a Ph.D. student in the Department of Computer Science and Engineering at Shanghai Jiao Tong University and a member of PAIR Lab. I am co-advised by Prof. Yaodong Yang and Prof. Ying Wen, and I am also part of the Wen-Tsun Wu AI Honorary Doctoral Class advised by Prof. Cewu Lu.

My research focuses on dexterous manipulation, preference-based reinforcement learning, and AI alignment, with the goal of building effective learning systems that align with complex human preferences. I am always open to collaboration and discussion on shared interests.

Education

Ph.D. in Computer Science and Engineering

Shanghai Jiao Tong University, 2023 – Present

Advised by Prof. Yaodong Yang & Prof. Ying Wen

M.S. in Control Theory and Control Engineering

University of Chinese Academy of Sciences, 2020 – 2023

B.Eng. in Computer Science and Engineering

Southeast University, 2016 – 2020

Experience

Institute for AI, Peking University

Research Intern, Jan 2022 – Oct 2022

Worked on RLHF with Dr. Yali Du and Dr. Yaodong Yang.

PKU-PsiBot Joint Lab

Research Intern, Aug 2024 – Feb 2026

Focused on Sim2Real transfer and Vision-Language-Action models.

Zhongguancun Academy

Student Project Leader, Sep 2024 – Jun 2027

Leading the AI Chemist project.

Honors & Awards

Innovation & Excellence Scholarship

Zhongguancun Academy, 2025

China National Scholarship

Ministry of Education, 2017

President Scholarship

Southeast University, 2018

Outstanding Student Leader

Jiangsu Province, 2019

Pacemaker to Merit Student

Southeast University, 2018

Competitions

Silver Prize (25 / 1178)

Kaggle Lux AI Challenge, 2021

3rd Place (Intro Track) & 7th Place (Research Track)

NeurIPS MineRL Diamond Competition, 2021

14th Place

IJCAI Mahjong AI Competition, 2021

News

2026.01 PoliCon was accepted to ICLR 2026.

2025.09 ValueDCG was accepted to the NeurIPS 2025 Workshop on Regulatable ML.

2025.09 Two papers were accepted to NeurIPS 2025, with one selected as Spotlight.

2025.08 AdaptFlow was accepted to EMNLP 2025.

2025.07 We released our VLA survey, A Survey on Vision-Language-Action Models.

2025.06 One paper was accepted for oral presentation at AGI 2025.

Selected Publications

* denotes equal contribution. This list shows first-, co-first-, second-, and third-author papers.

2025 – 2026

STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization

Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Shao Zhang, Bo Xu, Lei Han, Ying Wen, Yaodong Yang

Conference on Neural Information Processing Systems (NeurIPS), 2025

Paper Video

DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation (Spotlight)

Kefei Zhu, Fengshuo Bai, Yuanhao Xiang, Yishuai Cai, Xinglin Chen, Ruochong Li, Xingtao Wang, Yaodong Yang, Hao Dong, Xiaopeng Fan, Yuanpei Chen

Conference on Neural Information Processing Systems (NeurIPS), 2025

ArXiv Project

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang

NeurIPS Workshop on Regulatable ML, 2025

Paper ArXiv

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Yifan Zhong*, Fengshuo Bai*, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, Zhiquan Qi, Yitao Liang, Yuanpei Chen, Yaodong Yang

Preprint, 2025

Paper HF GitHub

Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems (Oral)

Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang

Artificial General Intelligence Conference (AGI), 2025

Paper

Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand

Fengshuo Bai, Yu Li, Jie Chu, Tawei Chou, Runchuan Zhu, Ying Wen, Yaodong Yang, Yuanpei Chen

Preprint, 2025

Paper Project

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang*, Fengshuo Bai*, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang