How to use Reinforcement Learning for Portfolio Optimization

Oct 30, 2023

What problem are we trying to solve?
Framing the problem as a Machine Learning solution
The model
The loss function
The training process
Improvements

What problem are we trying to solve?

Deep Learning has greatly simplified the implementation of Reinforcement Learning solutions. Let’s consider the portfolio optimization problem as an example. I am going to simplify the problem as much as I can to focus on the optimization aspect and how Reinforcement Learning can help with that.

Let’s consider the Nasdaq Stock Market, for example. There are about 3,300 listed companies in that market. Let’s say we want to build a stock portfolio from the stocks available in that market. In the beginning, we have an initial cash amount (e.g., $1M), and we want to allocate a percentage of that money into different stocks.

Each day, we may want to reallocate the money to update our portfolio.

Each day, the price of the individual stocks fluctuates, and the value of the portfolio evolves over time. Typically, we want the value of the portfolio to increase over time.

We also want to minimize the risk in building that portfolio. “Risk” is usually defined as the fluctuation of the portfolio value over time, and its variance over time is a good measure of risk. For now, I am going to ignore risk and focus on maximizing the value of the portfolio over time. This is a bit unrealistic, but it would not be difficult to adjust the optimization problem to factor in the risk component.

The value of a portfolio is just the sum of all the money allocated to the different stocks of the portfolio. For example, if I have $10 in cash and $20 in Google stock, my portfolio value is $10 + $20 = $30. If V₀ is the initial value of the portfolio and V_T its value after a time T, we want to maximize the following quantity:

$\rho=\frac{V_T}{V_0}$

If we initially have $1M in cash, then we are going to allocate a percentage of that money to each of the stocks available in the market. If we don’t own some specific stock, we just say the percentage is 0%. Let’s call w_s,t the percentage allocated to a specific stock s at time t. The value of that stock in our portfolio at time t is

$v_{s,t} = V_tw_{s,t}$

where V_t is the value of our portfolio at time t.

Every day, the prices of the stocks fluctuate. If, on average, the prices of the stocks we own increase, we make money; otherwise, we lose some.

Let p_s,t-1 be the price of the stock s at time t-1 and p_s,t its price at time t. The value of the stock in our portfolio at time t-1 is v_s,t-1; therefore, before any reallocation, the value of that stock at time t is

$v_{s,t} = \frac{p_{s,t}}{p_{s,t-1}}v_{s,t-1}$

Its value is adjusted by the rate of change of the stock price. Let’s define the rate of change vector for the whole stock market (let’s assume we have M stocks in that market) as:

$\mathbf{y}_t=\left(\frac{p_{1, t}}{p_{1, t-1}}, \frac{p_{2, t}}{p_{2, t-1}}, \ldots, \frac{p_{M, t}}{p_{M, t-1}}\right)$
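To make this vector concrete, here is a minimal sketch in plain Python that builds it from two consecutive days of prices. The function name `price_relatives` and the sample prices are made up for the illustration; they are not part of any library or of a real data set.

```python
def price_relatives(prev_prices, prices):
    """Return y_t: the ratio p_{s,t} / p_{s,t-1} for each of the M stocks."""
    if len(prev_prices) != len(prices):
        raise ValueError("both price lists must cover the same M stocks")
    return [p / p_prev for p, p_prev in zip(prices, prev_prices)]


# Hypothetical closing prices for M = 3 stocks on two consecutive days.
prices_yesterday = [100.0, 50.0, 20.0]
prices_today = [110.0, 45.0, 20.0]

y_t = price_relatives(prices_yesterday, prices_today)
print(y_t)
```

A component above 1 means the price of that stock went up over the period, a component below 1 means it went down, and a component equal to 1 means the price did not move.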

Let’s also define the percentage vector of stock allocation in our portfolio at time t:

$\mathbf{w}_t = \left(w_{1,t}, w_{2,t}, \ldots,w_{M,t}\right)$

Because the different components of that vector are percentages (or probabilities), we have:

$\sum_{s=1}^M w_{s,t}=1$
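As a quick sanity check of that constraint, the sketch below converts dollar amounts into allocation percentages. It reuses the $10 / $20 example from earlier and adds a third asset that we do not own; the helper name `allocation_weights` is hypothetical, and treating cash as just another entry of the vector is an assumption of this sketch.

```python
def allocation_weights(dollar_values):
    """Convert the dollar amount held in each asset into weights that sum to 1."""
    total = sum(dollar_values)  # this is V_t, the portfolio value
    if total <= 0:
        raise ValueError("the portfolio value must be positive")
    return [v / total for v in dollar_values]


# $10 in cash, $20 in one stock, $0 in a stock we do not own.
holdings = [10.0, 20.0, 0.0]
w_t = allocation_weights(holdings)

print(w_t)       # one third, two thirds, and 0 for the stock we do not own
print(sum(w_t))  # equal to 1 up to floating-point rounding
```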

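Let’s say we allocate the portfolio with a vector w_t-1 at time t-1. Over the next period, the money held in each stock is multiplied by the corresponding component of y_t, so the rate of change of our portfolio value is the dot product y_t · w_t-1, and the value of our portfolio after one time period is simply: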

$V_t=V_{t-1}\mathbf{y}_t\cdot\mathbf{w}_{t-1}$
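Here is a minimal sketch of that one-period update in plain Python. The function name `step_portfolio_value` and all the numbers are invented for the illustration, and the sketch assumes no transaction costs, as in the rest of this simplified setup.

```python
def step_portfolio_value(value_prev, y_t, w_prev):
    """Apply V_t = V_{t-1} * (y_t . w_{t-1}) for one time period."""
    if len(y_t) != len(w_prev):
        raise ValueError("y_t and w_prev must have one entry per stock")
    growth = sum(y * w for y, w in zip(y_t, w_prev))  # dot product y_t . w_{t-1}
    return value_prev * growth


V_prev = 1_000_000.0      # portfolio value at time t-1
w_prev = [0.5, 0.3, 0.2]  # allocation chosen at time t-1
y_t = [1.10, 0.90, 1.00]  # price ratios observed between t-1 and t

V_t = step_portfolio_value(V_prev, y_t, w_prev)
print(round(V_t, 2))
```

With these numbers the dot product is 0.5 × 1.10 + 0.3 × 0.90 + 0.2 × 1.00 = 1.02, so the portfolio gains about 2% over the period.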

Because we also have

$V_{t-1}=V_{t-2}\mathbf{y}_{t-1}\cdot\mathbf{w}_{t-2}$

Then we have

$V_t=V_{t-2}\left(\mathbf{y}_{t-1}\cdot\mathbf{w}_{t-2}\right)\left(\mathbf{y}_t\cdot\mathbf{w}_{t-1}\right)$

Therefore, by applying this identity iteratively, we can express the value of the portfolio at time T as a function of the portfolio value at time 0:

$V_T=V_{0}\prod_{t=1}^T\left(\mathbf{y}_{t}\cdot\mathbf{w}_{t-1}\right)$

So our optimization problem becomes maximizing the following quantity:

$\rho=\frac{V_T}{V_0}=\prod_{t=1}^T\left(\mathbf{y}_{t}\cdot\mathbf{w}_{t-1}\right)$
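To see how this quantity is evaluated for a given sequence of allocations, here is a small sketch in plain Python. The function name `cumulative_return` and the two-stock, three-day data are assumptions made for the example; in the list layout chosen here, `price_ratios[k]` holds y at time k+1 and `weights[k]` holds w at time k.

```python
import math


def cumulative_return(price_ratios, weights):
    """Compute rho = prod over t of (y_t . w_{t-1}) for T time periods."""
    if len(price_ratios) != len(weights):
        raise ValueError("need exactly one allocation per time period")
    return math.prod(
        sum(y * w for y, w in zip(y_t, w_prev))  # growth over one period
        for y_t, w_prev in zip(price_ratios, weights)
    )


# Hypothetical data: M = 2 stocks over T = 3 days.
price_ratios = [[1.10, 0.90], [1.00, 1.20], [0.95, 1.05]]  # y_1, y_2, y_3
weights = [[0.5, 0.5], [0.2, 0.8], [1.0, 0.0]]             # w_0, w_1, w_2

rho = cumulative_return(price_ratios, weights)
print(round(rho, 4))
```

The three daily growth factors are 1.00, 1.16, and 0.95, so their product is about 1.102: the portfolio ends roughly 10% above its starting value. A different sequence of allocations on the same price data would give a different ρ.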

The only action we can take at every time step is to reallocate the money among the different stocks. So the problem becomes:

How can we reallocate the money every day so that we can maximize the portfolio returns? How do we choose w_t at every timestep to optimize our portfolio?

Framing the problem as a Machine Learning solution
