LLM

Why is Reinforcement Learning (RL) hot again?

Just finished listening to an incredible podcast featuring an interview with Wu Yi — a Tsinghua alum and former OpenAI researcher — and his take on Reinforcement Learning (RL) was one of the clearest I’ve seen!

🔍 1. What is RL really about?

Wu Yi explains that RL is very different from traditional supervised learning (like image classification). In supervised learning, we train models on a fixed set of labeled data: each input has one correct answer, and the model is judged on a single prediction.

RL, on the other hand, is more like playing a game: you need to make a sequence of decisions (serve, move, react), and there's no single “correct” path. The quality of your decisions is judged by the final outcome (win or lose). It’s about multi-step decision-making — much closer to how the real world works.
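
To make the "judged by the final outcome" idea concrete, here is a minimal sketch in Python. It is my own toy illustration, not something from the podcast: the three-step game, the win rule, and names like `play_episode` and `estimate_action_values` are all made up for this example. The agent never gets a label for any individual decision; it only sees whether the whole episode was won, and it averages those outcomes to work out which decisions tend to pay off (a simple Monte Carlo estimate).

```python
import random

STEPS = 3  # decisions per episode (think: serve, move, react)


def play_episode(policy, rng):
    """Run one episode; return the actions taken and the final outcome."""
    actions = [policy(step, rng) for step in range(STEPS)]
    # Hypothetical win rule: several different action sequences win,
    # so there is no single "correct" path that could be used as a label.
    reward = 1.0 if sum(actions) >= 2 else 0.0
    return actions, reward


def random_policy(step, rng):
    """Explore by picking action 0 or 1 uniformly at random."""
    return rng.randint(0, 1)


def estimate_action_values(episodes=2000, seed=0):
    """Average the final reward over every episode in which (step, action) occurred."""
    rng = random.Random(seed)
    keys = [(s, a) for s in range(STEPS) for a in (0, 1)]
    totals = {k: 0.0 for k in keys}
    counts = {k: 0 for k in keys}
    for _ in range(episodes):
        actions, reward = play_episode(random_policy, rng)
        # The only feedback is the end-of-episode reward, shared by every step.
        for step, action in enumerate(actions):
            totals[(step, action)] += reward
            counts[(step, action)] += 1
    return {k: totals[k] / max(counts[k], 1) for k in keys}


values = estimate_action_values()
# Greedy policy: at each step, pick the action with the higher estimated value.
greedy = [max((0, 1), key=lambda a: values[(s, a)]) for s in range(STEPS)]
print(greedy)
```

Compare this with image classification, where every training example comes with its own answer. Here the signal arrives once per episode, and the learner has to work out which of its earlier decisions deserve the credit.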

🤖 2. Why is RL hot again? What’s its connection to LLMs?