Week 2

Aarshdeep Singh N -

Hello everyone, and welcome to my Week 2 update for my senior research project. After spending my first week learning the fundamentals of reinforcement learning, this week I shifted my focus toward finding open courseware and resources to start training my own RL model. The transition from theory to implementation has been exciting: I explored various platforms and frameworks that provide environments for reinforcement learning experiments. Understanding how to structure an RL model, define states and actions, and set up reward mechanisms has been a key part of my learning process.
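To make the structure concrete, here is a minimal sketch of the pieces mentioned above: an environment with states, actions, and a reward mechanism, plus an agent interacting with it. The "corridor" environment and the random agent are entirely made up for illustration, not part of my actual project code:

```python
import random

# A tiny made-up environment: the agent stands in a 1-D corridor.
# State  = the agent's position (0 .. length-1)
# Action = -1 (step left) or +1 (step right)
# Reward = +1 for reaching the goal at the far end, 0 otherwise
class CorridorEnv:
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Clamp the position so the agent cannot walk off the corridor.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

# A purely random agent playing one episode.
random.seed(0)  # fixed seed so the run is reproducible
env = CorridorEnv()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.choice([-1, 1])
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)
```

Even this toy loop has the same reset/step/reward shape that real RL environment libraries expose, which is why defining states, actions, and rewards cleanly matters so much.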

Alongside this, I also delved into the mathematical foundations necessary for working with reinforcement learning algorithms, particularly Bellman’s Equation. This equation is fundamental in RL because it expresses the relationship between the value of a state and the expected rewards from future actions. By breaking down complex decision-making into recursive value functions, Bellman’s Equation helps optimize an agent’s long-term rewards. Understanding the mathematics behind it, including Markov Decision Processes (MDPs), dynamic programming, and value iteration, is crucial for building more efficient RL models.
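In the tabular case, Bellman's optimality equation, V(s) = max_a [ r(s, a) + γ·V(s') ], can be applied repeatedly as an update rule, which is exactly the value iteration mentioned above. Here is a minimal sketch on a tiny two-state MDP that I made up purely for illustration (the states, actions, and rewards are not from any real benchmark):

```python
# Value iteration on a made-up MDP with three states.
# States 0 and 1 are non-terminal; state 2 is terminal and pays +1 on entry.
GAMMA = 0.9  # discount factor

# transitions[state][action] = (next_state, reward); deterministic for simplicity
transitions = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
}

def value_iteration(theta=1e-6):
    V = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state keeps value 0
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: V(s) = max_a [ r + gamma * V(s') ]
            best = max(r + GAMMA * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once values no longer change meaningfully
            return V

V = value_iteration()
print(V[1])  # 1.0: one step from the terminal reward
print(V[0])  # 0.9: the same reward, delayed one step and discounted by gamma
```

The recursion is visible in the numbers: state 0's value is state 1's value multiplied by the discount factor, which is precisely the "value of a state in terms of expected future rewards" relationship the equation expresses.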

This week has been all about bridging the gap between theoretical knowledge and practical implementation. As I continue my research, I am excited to begin experimenting with different algorithms and testing reinforcement learning models in various environments. Stay tuned for more updates as I make progress in developing and refining my own RL system!

Comments:

    adam_b
    Hi Aarsh! Fascinating Project! What environment or algorithm are you most excited to test first?
    aarshdeep_singh_n
    Hi Adam! Thank you so much for your kind comment. The first environment I will be testing is a simple CartPole model, in which the model tries to balance a pole on a moving cart. It's very similar to trying to balance the tip of a pencil on your finger. Although it's not the flashiest first environment, it's a very simple one to understand, and it lays the groundwork for understanding future programs.
