Zechen Bai 白泽琛

Ph.D. Candidate

Show Lab, National University of Singapore

At Kerry Park, Seattle, USA

Biography

I'm a Ph.D. candidate at Show Lab, National University of Singapore, where I am fortunate to be advised by Prof. Mike Shou.

Previously, I spent a wonderful time at AWS AI Labs, ByteDance Intelligent Creation Lab, and Baidu VIS.

My research pursues video-centric multimodal intelligence. The ultimate goal is to build intelligent agents that learn from large-scale video.

In my early years, I also enjoyed doing research on virtual reality and human-computer interaction.

I'm open to collaborations and discussions. Feel free to contact me via email!

News

Selected Publications (*co-first author)

Impossible Videos.
Zechen Bai*, Hai Ci*, Mike Zheng Shou
ICML, 2025.

[Paper] [Homepage] [GitHub] [HuggingFace]

Factorized Visual Tokenization and Generation.
Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou
arXiv preprint, 2024.

[Paper] [Code]

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach.
Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou
ICLR, 2025.

[Paper]

Show-o: One Single Transformer To Unify Multimodal Understanding and Generation.
Jinheng Xie*, Weijia Mao*, Zechen Bai*, David Junhao Zhang*, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou
ICLR, 2025.

[Paper] [ProjectPage] [Code]

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos.
Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou
NeurIPS, 2024.

[Paper] [Code]

Hallucination of Multimodal Large Language Models: A Survey.
Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
arXiv preprint, 2024.

[Paper] [ProjectPage]

Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models.
Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou
ICLR R2-FM Workshop, 2024.

[Paper]

AssistGUI: Task-Oriented Desktop Graphical User Interface Automation.
Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[Paper] [Code] [ProjectPage]

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number.
Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[ProjectPage]

Unsupervised Open-Vocabulary Object Localization in Videos.
Ke Fan*, Zechen Bai*, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
IEEE International Conference on Computer Vision (ICCV), 2023.
* Equal contribution. Ke is the first intern author, Zechen is the first FTE author.

[Paper] [Code]

Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation.
Zechen Bai, Yuta Nakashima, and Noa Garcia.
IEEE International Conference on Computer Vision (ICCV), 2021.

[Paper] [Code] [ProjectPage]

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification.
Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. (Oral)

[Paper] [Code]

Show, Recall, and Tell: Image Captioning with Recall Mechanism.
Li Wang*, Zechen Bai*, Yonghua Zhang, Hongtao Lu.
AAAI Conference on Artificial Intelligence (AAAI), 2020.

[Paper]

Virtual Reality

Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters.
Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian.
IEEE Virtual Reality Conference (VR), 2024.

[Paper] [Code]

A Simple Approach to Animating Virtual Characters by Facial Expressions Reenactment.
Zechen Bai, Naiming Yao, Lu Liu, Hui Chen, Hongan Wang.
IEEE Virtual Reality Conference (VR), 2023.

[Paper]

Enhancing Emotional Experience by Building Emotional Virtual Characters in VR Volleyball Games.
Zechen Bai, Naiming Yao, Nidhi Mishra, Hui Chen, Hongan Wang, Nadia Magnenat Thalmann.
International Conference on Computer Animation and Social Agents (CASA), 2021.

[Paper]

Play with Emotional Characters: Improving User Emotional Experience by A Data-driven Approach in VR Volleyball Games.
Zechen Bai, Naiming Yao, Nidhi Mishra, Hui Chen, Hongan Wang, Nadia Magnenat Thalmann.
IEEE Virtual Reality Conference (VR), 2021.
(Best Poster Award!)

[Paper]

Academic Service

  • I serve as a reviewer for conferences including CVPR, ICCV, NeurIPS, ICLR, ICML, and ACM MM, and for journals including ACM Computing Surveys, TCSVT, and Neurocomputing.

Research Experience

  • Applied Scientist
    February 2022 - August 2023
    Amazon Shanghai Lab, Shanghai, China
    Advisors: Tianjun Xiao, Tong He, and Zheng Zhang

  • Research Intern
    April 2021 - September 2021
    Intelligent Creation Lab, ByteDance, Beijing, China
    Advisors: Panpan Xu and Qian He

  • Visiting Student (remote)
    September 2020 - March 2021
    ISLab, Osaka University, Osaka, Japan
    Advisors: Prof. Noa Garcia and Prof. Yuta Nakashima

  • Research Intern
    February 2020 - September 2020
    Department of Computer Vision Technology (VIS), Baidu, Beijing, China
    Advisors: Zhigang Wang and Jian Wang

  • Visiting Student
    November 2019 - February 2020
    Institute for Media Innovation, Nanyang Technological University, Singapore
    Advisor: Prof. Nadia Magnenat Thalmann

Selected Awards

  • NeurIPS Scholar Award, 2024
  • China National Scholarship, 2021
  • Best Poster Award at IEEE VR 2021
  • Beijing Distinguished Graduate Award, 2019


    © Zechen Bai | Last updated: Sep. 2025