Alexander Van de Kleut

Beyond Vanilla Policy Gradients: Natural Policy Gradients, Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)

16 minute read

TRPO tries to find the largest step size that can still improve the policy, and it does this by adding a constraint on the KL divergence ...

Actor-Critic Methods, Advantage Actor-Critic (A2C) and Generalized Advantage Estimation (GAE)

18 minute read

An actor-critic algorithm is a policy gradient algorithm that uses function estimation in place of empirical returns $G_t$ in the policy gradient update.

Policy Gradient Theorem and REINFORCE

14 minute read

In order to use continuous action spaces and have stochastic policies, we have to model the policy $\pi$ directly. We can parametrize our policy using some p...

Deep Q-Learning with Neural Networks

12 minute read

The original formulation of $Q$-learning requires that both the state and action space be small and discrete. However, we run into problems when the action s...

OpenAI Gym and Q-Learning

8 minute read

In the general reinforcement learning paradigm, the agent only has access to the state, the corresponding reward, and the ability to choose an action.

The Mathematical Foundations of Reinforcement Learning

16 minute read

Every action of a rational agent can be thought of as seeking to maximize some cumulative scalar reward signal.

Neural Style Transfer with PyTorch and torchvision

11 minute read

Neural style transfer is a technique that combines the content of one image with the style of another image.

Variational AutoEncoders (VAE) with PyTorch

10 minute read

Autoencoders are a special kind of neural network used to perform dimensionality reduction. We can think of autoencoders as being composed of two networks, a...

Shutting the Box with Dynamic Programming

11 minute read

Something funny happens when you take computer science. You begin to think of problems like this in terms of algorithms and mathematically optimal strategies.