Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode III
We continue our deep dive into Sutton’s great book on RL [1], focusing here on Monte Carlo (MC) methods. These can learn from experience alone, i.e. they do not require any kind of model of the environment, as is needed e.g. by the Dynamic Programming (DP) methods we introduced in the previous post.
This is extremely appealing: often the model is simply not known, or the transition probabilities are hard to specify. Consider the game of Blackjack: even though we fully understand the game and its rules, solving it via DP methods would be very tedious. We would have to compute all kinds of probabilities, e.g. given the cards currently in play, how likely is a “blackjack”, or how likely is it that another seven is dealt? With MC methods, we don’t have to deal with any of this; we simply play and learn from experience.
Since they do not use a model, MC estimates are unbiased. They are conceptually simple and easy to understand, but they exhibit high variance and do not bootstrap, i.e. unlike DP they do not build value estimates on top of other estimates; every estimate comes from complete sampled returns.
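To make the “learn from complete sampled returns” idea concrete before we dive into the chapter, here is a minimal sketch of first-visit MC prediction for a fixed policy. The `generate_episode` callable and the `(state, reward)` episode format are assumptions of this sketch, not Sutton’s notation; the full implementations follow later in the post.

```python
from collections import defaultdict

def first_visit_mc_prediction(generate_episode, num_episodes=10_000, gamma=1.0):
    """Estimate V(s) for a fixed policy by averaging sampled returns.

    `generate_episode` (an assumption of this sketch) returns one complete
    episode as a list of (state, reward) tuples; states must be hashable.
    """
    returns_sum = defaultdict(float)   # sum of returns observed per state
    returns_count = defaultdict(int)   # number of first visits per state
    V = defaultdict(float)             # current value estimate

    for _ in range(num_episodes):
        episode = generate_episode()
        states = [s for s, _ in episode]
        G = 0.0
        # Walk the episode backwards, accumulating the return G.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: only update on the earliest occurrence
            # of this state within the episode.
            if state not in states[:t]:
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

Note how nothing here touches transition probabilities: the value of a state is just the average of the returns observed after visiting it, which is exactly why the estimates are unbiased but noisy.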
As mentioned, we will introduce these methods here following Chapter 5 of Sutton’s book…