Blog Post Week 1

Aarshdeep Singh N

Hi everyone, welcome to my Reinforcement Learning Research Project!

During Week 1, I took the Fundamentals of Reinforcement Learning course on Coursera to build a strong foundation in RL concepts. I began by understanding the key differences between supervised learning and reinforcement learning. Unlike supervised learning, where a model is trained with labeled data, reinforcement learning relies on an agent learning through trial and error, receiving rewards or penalties based on its actions. One of the core concepts I explored was policy gradients, which adjust the probabilities the model assigns to each action over time. Instead of being told the correct action directly, the policy network refines its choices based on delayed rewards, reinforcing decisions that lead to success while discouraging those that result in failure.
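To make the policy gradient idea concrete, here is a minimal sketch of a REINFORCE-style update on a toy two-armed bandit. It is not from the course; the function names (`softmax`, `train_policy`), the reward probabilities, the learning rate, and the step count are all assumptions chosen for illustration. The policy keeps one preference per action, samples an action from the resulting probabilities, and nudges the preferences in proportion to the reward received.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_probs, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style training on a hypothetical bandit.

    reward_probs[i] is the (assumed) chance that action i pays a reward of 1.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # start with no preference
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample an action from the current policy instead of picking the best.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, d log pi(action) / d prefs[i]
        # equals (1 if i == action else 0) - probs[i].
        for i in range(len(prefs)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad_log_prob
    return softmax(prefs)


final_probs = train_policy([0.2, 0.8])
print(final_probs)
```

Rewarded actions become more likely and unrewarded ones are left alone, so over many steps the policy should shift most of its probability toward the action that pays off more often.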

I also learned about gradient updates in reinforcement learning, where actions are chosen based on probabilities, and their effectiveness is evaluated only after seeing the final outcome. This delayed feedback mechanism is what makes RL distinct, as it allows the model to learn strategies over long sequences of decisions rather than immediate corrections. Another crucial concept I studied was the balance between exploration and exploitation. The agent must decide whether to try new actions (exploration) or rely on past successful strategies (exploitation), and finding the right balance is essential for optimizing learning. Additionally, I gained insight into reward-based learning, where each action influences the future state and overall trajectory of the agent, reinforcing the idea that short-term rewards may not always align with long-term success.
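The exploration versus exploitation trade-off can be sketched with an epsilon-greedy rule, one common and simple way to balance the two. This is an illustrative sketch under my own assumptions, not a method taken from the course: the names `epsilon_greedy` and `run_bandit`, the three reward probabilities, and the value `epsilon=0.1` are all hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def run_bandit(reward_probs, epsilon=0.1, steps=5000, seed=0):
    """Estimate action values on a hypothetical bandit with epsilon-greedy choices."""
    rng = random.Random(seed)
    n = len(reward_probs)
    q = [0.0] * n      # running estimate of each action's average reward
    counts = [0] * n   # how many times each action was tried
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental average update
    return q, counts


q, counts = run_bandit([0.2, 0.5, 0.8])
print(q, counts)
```

With `epsilon=0.0` the agent only exploits its current estimates and can get stuck on a mediocre action; with `epsilon=1.0` it only explores and never uses what it has learned. Small values in between let it keep checking alternatives while mostly choosing the action that currently looks best.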

This first week gave me a solid theoretical foundation in reinforcement learning, preparing me to dive deeper into practical applications and implementation in the coming weeks.

This upcoming week, I plan to continue learning the theory behind Reinforcement Learning while also diving deeper into the math, such as the Bellman equation. I am also looking for open-source software to start training an AI agent, and I will share updates on that next week!

Thanks for reading.
