State Value Function and the Law of Iterated Expectation in Reinforcement Learning
In reinforcement learning, the state value function $V^\pi(s)$ represents the expected return when starting in state $s$ and following a policy $\pi$. Deriving the Bellman equation for the state value function relies on the law of iterated expectation (also known as the tower property).
Let’s break down this process into detailed steps.
1. The Objective: State Value Function
We want to calculate the value of a state $s$ under a policy $\pi$, which is defined as the expected return starting from that state:

$$V^\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]$$

Here, $G_t$ is the total return from time step $t$, which includes all future rewards. It can be defined recursively as:

$$G_t = R_{t+1} + \gamma G_{t+1}$$
Where:
- $R_{t+1}$ is the immediate reward received at time $t+1$,
- $\gamma \in [0, 1]$ is the discount factor,
- $G_{t+1}$ is the return from the next time step onwards.
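To make the recursion concrete, here is a minimal Python sketch that computes $G_0$ for a finite episode by applying $G_t = R_{t+1} + \gamma G_{t+1}$ backwards from the last reward; the reward sequence and discount factor are made-up illustrative values:

```python
def discounted_return(rewards, gamma):
    """Compute G_0 for a finite reward sequence R_1, ..., R_T."""
    g = 0.0
    # Work backwards: G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]  # hypothetical R_1, R_2, R_3
gamma = 0.9
print(discounted_return(rewards, gamma))  # 1 + 0.9*0 + 0.9**2 * 2 = 2.62
```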
Thus, the state value function can be rewritten as:

$$V^\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma G_{t+1} \mid S_t = s \right]$$
2. Law of Iterated Expectation: Breaking Down the Expectation
We can now break the expectation into two parts: the immediate reward and the future return, using the law of iterated expectation. The expectation of the immediate reward conditioned on the current state is straightforward. The future return $G_{t+1}$, however, depends on the next state $S_{t+1}$. So we apply the law of iterated expectation:

$$V^\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} \mid S_t = s \right] + \gamma\, \mathbb{E}_\pi\left[ \mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} \right] \mid S_t = s \right]$$

This equation expresses the total expected return as the sum of the immediate reward and the discounted future return. The law of iterated expectation allows us to condition on the next state $S_{t+1}$ and then take the expectation over it.
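As a quick sanity check, the sketch below verifies the tower property numerically on a made-up two-state example: averaging the inner expectations $\mathbb{E}[G_{t+1} \mid S_{t+1} = s']$ over the next-state distribution matches a direct Monte Carlo estimate of $\mathbb{E}[G_{t+1} \mid S_t = s]$. All probabilities and returns here are hypothetical:

```python
import random

random.seed(0)

# Hypothetical next-state distribution P(s' | s) under the policy, and a
# deterministic return from each s', so E[G | S_{t+1} = s'] = return_from[s'].
transitions = {"s1": 0.3, "s2": 0.7}
return_from = {"s1": 4.0, "s2": 1.0}

states = list(transitions)
probs = [transitions[s] for s in states]

# Right-hand side: take the inner expectation first, then average over s'.
nested = sum(transitions[s] * return_from[s] for s in states)

# Left-hand side: direct Monte Carlo estimate of E[G_{t+1} | S_t = s].
n = 100_000
total = 0.0
for _ in range(n):
    s_next = random.choices(states, weights=probs)[0]
    total += return_from[s_next]

print(nested)     # 0.3 * 4.0 + 0.7 * 1.0 = 1.9
print(total / n)  # ~1.9, equal up to sampling noise
```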
3. Replacing $\mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} \right]$ with $V^\pi(S_{t+1})$
Now, $G_{t+1}$ is the future total return from the next state $S_{t+1}$. By definition, the expected return starting from $S_{t+1}$ is the value function of $S_{t+1}$, which is $V^\pi(S_{t+1}) = \mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} \right]$. Thus, we replace the inner expectation term $\mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} \right]$ with $V^\pi(S_{t+1})$:

$$V^\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma V^\pi(S_{t+1}) \mid S_t = s \right]$$

This is the application of the law of iterated expectation: first, we condition on the next state and calculate the expected value of $G_{t+1}$; then, we take the expectation over the transition to $S_{t+1}$.
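To see this recursive form operationally, the following sketch estimates $V^\pi(s)$ for a single state by averaging sampled one-step targets $R_{t+1} + \gamma V^\pi(S_{t+1})$, assuming the values of the next states are already known; every number here is hypothetical:

```python
import random

random.seed(1)

gamma = 0.9
# Hypothetical one-step model from state s: (probability, reward, next state).
outcomes = [(0.4, 1.0, "s1"), (0.6, 0.0, "s2")]
weights = [p for p, _, _ in outcomes]
V_next = {"s1": 3.0, "s2": 5.0}  # assumed-known values of the next states

# Monte Carlo average of the one-step target R + gamma * V(S').
n = 100_000
total = 0.0
for _ in range(n):
    _, r, s_next = random.choices(outcomes, weights=weights)[0]
    total += r + gamma * V_next[s_next]

print(total / n)  # ~ 0.4*(1 + 0.9*3) + 0.6*(0 + 0.9*5) = 4.18
```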
4. Final State Value Function (Bellman Equation)
Finally, expanding the expectation over the policy and the transition dynamics, the state value function becomes:

$$V^\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^\pi(s') \right]$$

Where:
- $\pi(a \mid s)$ is the probability of taking action $a$ in state $s$ under the policy,
- $P(s' \mid s, a)$ is the probability of transitioning to state $s'$ from $s$ given action $a$,
- $R(s, a, s')$ is the immediate reward for that transition,
- $V^\pi(s')$ is the expected value of the next state $s'$.
This is the Bellman equation, which expresses the value of state $s$ in terms of the immediate reward and the expected value of the next state.
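As an illustration, here is a short policy-evaluation sketch that solves this Bellman equation by fixed-point iteration on a small, entirely made-up two-state, two-action MDP (the arrays `P`, `R`, and `pi` below are hypothetical):

```python
import numpy as np

gamma = 0.9
n_states, n_actions = 2, 2

# Hypothetical dynamics: P[s, a, s'] is the transition probability,
# R[s, a, s'] the immediate reward, and pi[s, a] the policy pi(a | s).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
pi = np.array([[0.6, 0.4],
               [0.5, 0.5]])

# Repeatedly apply the Bellman backup:
# V(s) <- sum_a pi(a|s) sum_s' P(s'|s,a) [R(s,a,s') + gamma V(s')].
V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.zeros_like(V)
    for s in range(n_states):
        for a in range(n_actions):
            for s2 in range(n_states):
                V_new[s] += pi[s, a] * P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print(V)  # approximate V^pi(s) for each state
```

Because the backup is a $\gamma$-contraction, the iteration converges to the unique fixed point $V^\pi$.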
Summary:
In summary, the law of iterated expectation is used in the derivation of the state value function to handle the complexity of future returns. Specifically, it allows us to compute the total expected return by first conditioning on the next state $S_{t+1}$, and then taking the expectation over the possible next states, which yields the recursive form of the value function.