Posted on March 17, 2019 by Nitin Prakash

Dynamic programming is a technique for solving a complex problem by breaking it down into a collection of simpler subproblems. Smaller subproblems are solved only once and stored in a data structure like array, map etc. The fundamental idea being to optimize a given recursive problem and improve upon the complexity.

Stating the same as an RL paradigm:

Dymanic programming refers to solving a given problem using a collection
of algorithms that computes optimal policies given a perfect model of the
environment as a finite Markov decision process.

From the above reference, it should be clear that when we talk about DP we are just formulating a structure for the search of an optimal policy. In any paradigm of solving problems, DP requires two conditions to be satisfied:

  • Optimal substructure
  • Ovarlapping subproblems

Fortunately, MDPs satisfy both the properties.