Policy Gradient Theorem and REINFORCE

14 minute read

In order to use continuous action spaces and have stochastic policies, we have to model the policy $\pi$ directly. We can parametrize our policy using some p...




Deep Q-Learning with Neural Networks

12 minute read

The original formulation of $Q$-learning requires that both the state and action spaces be small and discrete. However, we run into problems when the action s...




OpenAI Gym and Q-Learning

8 minute read

In the general reinforcement learning paradigm, the agent only has access to the state, the corresponding reward, and the ability to choose an action.




Variational AutoEncoders (VAE) with PyTorch

10 minute read

Autoencoders are a special kind of neural network used to perform dimensionality reduction. We can think of autoencoders as being composed of two networks, a...




Shutting the Box with Dynamic Programming

11 minute read

Something funny happens when you take computer science. You begin to think of problems like this in terms of algorithms and mathematically optimal strategies.