
Recently I received thoughtful feedback from N Rukkumani suggesting that I create a catalogue of the blogs written so far. So I’ve decided to do exactly that, so readers can find posts easily depending on their topic of interest.



In this post, we’ll discuss two main topics: (1) how to extract the optimal policy from the optimal value function and (2) policy iteration (another variant of DP). Let’s start with extracting the optimal policy from the value function. In the earlier post, we’ve seen value iteration, where we sweep over the actions in the Q function to figure out which action produces the highest value.

Once we have the optimal value function by figuring out which action in which state results in the maximum reward, the next step is to map the corresponding optimal policy. We already know that a policy is basically the agent’s behaviour…
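To make this concrete, here is a minimal sketch of how the greedy policy could be read off from a value function in a tabular setting. The transition probabilities P, rewards R, discount factor gamma and the state/action counts are placeholder names, not necessarily the ones used in the original post:

import numpy as np

def extract_policy(V, P, R, gamma, n_states, n_actions):
    """For every state, pick the action whose one-step lookahead
    (its Q value) is the largest under the value function V."""
    policy = np.zeros(n_states, dtype=int)
    for s in range(n_states):
        q_values = np.zeros(n_actions)
        for a in range(n_actions):
            # expected reward plus the discounted value of the next states
            q_values[a] = sum(
                P[s][a][s_next] * (R[s][a][s_next] + gamma * V[s_next])
                for s_next in range(n_states)
            )
        policy[s] = np.argmax(q_values)  # greedy action for state s
    return policy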



For this post, we’ll explore some of the Python packages that can be used for geocoding. Geocoding refers to retrieving geolocations (i.e.) latitude & longitude, by providing addresses. These details are quite useful when we want to download satellite images of a particular location but only have its address at hand. In order to download any remote sensing images, latitude and longitude are among the mandatory parameters.

The reverse process is called reverse geocoding, where latitude and longitude are given as inputs to retrieve the addresses. For instance, we are using an app that…
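To give a quick taste of what such packages look like, here is a minimal sketch using geopy (one common choice; the post itself may use different packages), with an illustrative address and coordinates:

from geopy.geocoders import Nominatim

# Nominatim is OpenStreetMap's free geocoding service; it expects a
# descriptive user_agent string identifying your application.
geolocator = Nominatim(user_agent="geocoding-demo")

# Geocoding: address -> latitude & longitude
location = geolocator.geocode("Eiffel Tower, Paris")
print(location.latitude, location.longitude)

# Reverse geocoding: latitude & longitude -> address
place = geolocator.reverse("48.8582, 2.2945")
print(place.address)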



Dynamic Programming is model-based learning that finds the optimal policy based on the known dynamics of the environment (i.e.) the transition probabilities. DP has two variants: (1) value iteration and (2) policy iteration. In this post, our focus will be on value iteration, where the optimal policy is derived from the maximum value function. The previous articles, Day 169 & Day 170, can be referred to for the optimal value function and the Q function (i.e.) the Bellman equation.

As we already know, the optimal value function can be found by iterating over the Q function over all…
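As a rough illustration, here is a minimal sketch of value iteration, assuming a Gym-style environment (such as FrozenLake) that exposes its dynamics as env.P[s][a] lists of (probability, next_state, reward, done) tuples; gamma and the convergence threshold are illustrative values:

import numpy as np

def value_iteration(env, gamma=0.99, theta=1e-8):
    V = np.zeros(env.observation_space.n)
    while True:
        delta = 0.0
        for s in range(env.observation_space.n):
            # Q value of every action via a one-step lookahead
            q_values = [
                sum(p * (r + gamma * V[s_next])
                    for p, s_next, r, done in env.P[s][a])
                for a in range(env.action_space.n)
            ]
            best = max(q_values)              # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                     # stop once the values stabilise
            break
    return V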


For this post, let’s discuss some interesting aspects of how to compute the distance between two geolocations (lat & lon) and also how to clip a buffer from a map. As we all know, if we want to compute the distance between two ordinary points, we can simply subtract them. But the same logic cannot be applied to geolocations represented in terms of latitude and longitude. Let’s take two places (lat = 11.4119347, lon = 76.6584018) and (lat = 11.0120145, lon = 76.8271459), and our objective is to find the distance between these two points in kilometres. …
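One common way to do this is the haversine formula, which gives the great-circle distance between two latitude/longitude pairs; here is a minimal sketch using the coordinates above (6371 km is the mean radius of the Earth):

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # convert degrees to radians before applying the trigonometry
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

print(haversine_km(11.4119347, 76.6584018, 11.0120145, 76.8271459))  # roughly 48 km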



In the previous post, we’ve seen the Bellman equation of the value function. Let’s extend the formula to the ‘Q’ function, i.e. the state-action value function. The equation for the ‘Q’ function is very similar to that of the value function. The first term in the equation represents the reward obtained for being in state ‘s’, taking an action ‘a’ and then moving to the next state s’. …
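For reference, one standard way of writing the Bellman equation for the Q function under a policy \pi (the notation may differ slightly from the original post) is

Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \bigr]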



In one of the previous posts, we’ve discussed the value function. To recap quickly, the value function represents the sum of rewards obtained from a specific state onwards. Let’s see how the value function can be denoted in terms of the Bellman equation,
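For reference, one standard way of writing the Bellman equation for the value function under a policy \pi (the notation may differ slightly from the original post) is

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr]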



In this post, we’ll discuss some of the Atari game environments along with a few others (Box2D, MuJoCo and Robotics). We have around 2600 Atari gaming environments available in the Gym toolkit. As these gaming environments are quite challenging, they are often used to benchmark the performance of RL algorithms. Let’s take a sample Atari environment, Assault. Every Atari game has 12 variants (3 groups, with 4 types under each group).

The state of the Atari games could be either the image of the screen or the RAM of the Atari machine. In the case of images, the pixels are used which…
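As a rough illustration, here is a minimal sketch of loading a couple of Assault variants with the classic Gym API (this assumes the Atari extras are installed, e.g. via pip install gym[atari]):

import gym

env = gym.make("Assault-v0")          # state = RGB image of the screen
ram_env = gym.make("Assault-ram-v0")  # state = 128-byte RAM of the Atari machine

print(env.observation_space)      # Box(210, 160, 3) -> screen pixels
print(ram_env.observation_space)  # Box(128,)        -> RAM bytes

env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # take a random action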


When we took the FrozenLake environment, both the action and the observation space were discrete. In other scenarios, such as a car race, the observation space is continuous, where the states lie within a range (min, max). One such example is the inverted pendulum swing-up problem, where both the action and observation spaces are continuous.

As we can see from the picture below, the action space is torque (i.e.) how much force we apply in order to make the pendulum swing. In other words, torque produces angular acceleration. The observation space (state) is the speed. …
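Here is a minimal sketch of inspecting those continuous spaces with the classic Gym Pendulum environment (Pendulum-v0 in the old naming scheme):

import gym

env = gym.make("Pendulum-v0")

print(env.action_space)       # Box(1,) -> torque applied to the joint
print(env.observation_space)  # Box(3,) -> cos(theta), sin(theta), angular velocity

# Continuous spaces expose their bounds instead of a discrete count
print(env.action_space.low, env.action_space.high)  # [-2.] [2.]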



In the previous post, we’ve seen the basic functions of the Gym toolkit. For this blog, we will take the next step of getting to know what changes happen when we take a particular action and how to generate an episode. We know that a single episode covers the entire cycle (i.e.) from the start till the process terminates. For this article, we’ll take the available environment FrozenLake8x8-v0, which has more states compared to its smaller version FrozenLake-v0. Let’s create the environment and use the respective commands to retrieve its details.

Step 1: Setting up the environment…
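As a rough illustration, here is a minimal sketch of setting up FrozenLake8x8-v0 and rolling out a single episode with random actions, using the classic Gym API in which step() returns (observation, reward, done, info):

import gym

env = gym.make("FrozenLake8x8-v0")
print(env.observation_space)  # Discrete(64) -> the 8x8 grid of states
print(env.action_space)       # Discrete(4)  -> left, down, right, up

state = env.reset()
done = False
while not done:                                    # one full episode
    action = env.action_space.sample()             # pick a random action
    next_state, reward, done, info = env.step(action)
    print(state, action, reward, next_state)
    state = next_state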

Nandhini N

AI Enthusiast | Blogger✍
