How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

Welcome to part 2 of my LLM deep dive. If you haven’t read Part…