From Scale to Interaction - The LLM Journey

Published: 2024-03-21

Contact person: Prof. Jin-Jian Zhou, jjzhou@bit.edu.cn

Speaker: Dr. Dong Yan, Head of Reinforcement Learning at BaiChuan Intelligence

Time: 2024-03-21

Place: Room 229, Physics Experiment Center, Liangxiang Campus

Abstract:

Large language models (LLMs), represented by the GPT series, are profoundly changing the way human society operates. This talk discusses the two training phases of LLMs, pretraining and alignment, from the perspectives of scale and interaction. The pretraining part starts from next-token prediction, which the speaker describes as the only scalable first-principles approach at the current stage of AGI, and traces how the technology has developed. The alignment part takes the perspective of exploration and exploitation and introduces how human feedback is used to align a model with human preferences.

Profile:

Dong Yan received his Ph.D. from the Department of Computer Science at Tsinghua University. He has worked as a researcher at Intel China, as a postdoctoral fellow in the Department of Computer Science at Tsinghua University, and as head of the Advanced Decision-Making group at Qi Yuan Laboratory, where he focused on machine intelligence. His research centers on decision-making algorithms and systems. On the algorithm side, he proposed a solution framework that connects model-free and model-based reinforcement learning algorithms through a reward distribution mechanism. On the systems side, he designed the reinforcement learning programming framework "Tianshou", which has received over 6.6k stars and 1k forks on GitHub, with related work published in JMLR. His awards include runner-up in the 2017 ViZDoom challenge and first place in 2018 (as team leader), first place in Tencent's "Enlightenment" Honor of Kings challenge in 2022/2023 (as team mentor), and 9th place out of 306 teams (as team leader) in the beyond-visual-range intelligent aerial combat category of the 2023 "Tian Xing Cup". He is currently the head of Reinforcement Learning at BaiChuan Intelligence.