SARSA
25 December

A Very Short Introduction of SARSA Algorithm

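The SARSA algorithm, introduced by Gavin Rummery and Mahesan Niranjan in 1994 and later named by Richard Sutton, is an on-policy reinforcement learning method: it updates its action-value estimates online from each transition (state, action, reward, next state, next action) that the current policy actually experiences. Because those estimates account for the agent's own exploratory behaviour, SARSA tends to learn more cautious policies, which suits dynamic and complex environments such as traffic systems and rescue operations.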
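As a rough illustration of the on-policy update, here is a minimal tabular sketch. The function name `sarsa_update`, the nested-dictionary Q-table, and the default `alpha` and `gamma` values are assumptions made for this example, not part of the linked post.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA update to a tabular Q-function stored as q[state][action].

    The target uses the action the agent actually takes next (next_action),
    which is what makes the update on-policy.
    """
    td_target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action Q-table for illustration only.
q_table = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 2.0}}
updated = sarsa_update(q_table, state=0, action=1, reward=1.0,
                       next_state=1, next_action=0, alpha=0.5, gamma=0.9)
```

Here the target is built from `q[1][0]`, the value of the action chosen next, even though `q[1][1]` is larger.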

Q Learning
25 December

A Very Short Introduction of Q-Learning

Q-Learning, introduced in 1989 by Chris Watkins, is a model-free, off-policy reinforcement learning algorithm that learns the value of each action in each state and derives an optimal decision-making strategy from those values. Its applications range from fraud detection to energy grid optimisation and public transport scheduling.
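A minimal tabular sketch of the Q-Learning update may make the idea concrete. The function name `q_learning_update`, the nested-dictionary Q-table, and the default `alpha` and `gamma` values are assumptions made for this example.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update to a tabular Q-function q[state][action].

    The target uses the best available action value in the next state,
    regardless of which action the agent goes on to take (off-policy).
    """
    best_next_value = max(q[next_state].values())
    td_target = reward + gamma * best_next_value
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action Q-table for illustration only.
q_values = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 2.0}}
new_estimate = q_learning_update(q_values, state=0, action=1, reward=1.0,
                                 next_state=1, alpha=0.5, gamma=0.9)
```

No model of the environment's transitions is needed: the update relies only on the observed reward and next state.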

Policy Iteration
25 December

A Very Short Introduction of Policy Iteration

Policy iteration, introduced by Ronald Howard in 1960 on the foundation of Richard Bellman's 1950s work on dynamic programming, and later popularised in Reinforcement Learning by Andrew Barto and Richard Sutton, is a fundamental method for computing optimal decision-making strategies. It alternates between evaluating the current policy and improving it greedily, and for a finite Markov decision process it reaches an optimal policy in a finite number of iterations.
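The evaluate-then-improve loop can be sketched on a tiny deterministic problem. Everything below is an assumption made for illustration: the function name `policy_iteration`, the list-based encoding (`transitions[s][a]` gives the next state, `rewards[s][a]` the reward), and the two-state example itself.

```python
def policy_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Policy iteration for a small deterministic MDP.

    transitions[s][a] is the next state and rewards[s][a] the immediate
    reward for taking action a in state s. Returns (policy, values).
    """
    n_states = len(transitions)
    policy = [0] * n_states
    values = [0.0] * n_states
    while True:
        # Policy evaluation: sweep until the value estimates stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                new_value = rewards[s][a] + gamma * values[transitions[s][a]]
                delta = max(delta, abs(new_value - values[s]))
                values[s] = new_value
            if delta < tol:
                break
        # Policy improvement: switch only to strictly better actions,
        # so ties cannot make the loop cycle.
        stable = True
        for s in range(n_states):
            action_values = [rewards[s][a] + gamma * values[transitions[s][a]]
                             for a in range(len(transitions[s]))]
            best_action = max(range(len(action_values)),
                              key=action_values.__getitem__)
            if action_values[best_action] > action_values[policy[s]] + 1e-9:
                policy[s] = best_action
                stable = False
        if stable:
            return policy, values


# Hypothetical MDP: state 1 pays 2.0 for staying; state 0 pays 1.0 for moving there.
example_transitions = [[0, 1], [1, 0]]
example_rewards = [[0.0, 1.0], [2.0, 0.0]]
best_policy, state_values = policy_iteration(example_transitions,
                                             example_rewards, gamma=0.5)
```

In this example the starting policy keeps the agent in state 0; one improvement step switches it to move to state 1, after which no further change helps and the loop stops.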

MRF
25 December

A Very Short Introduction of Markov Random Fields (MRF)

Markov Random Fields (MRFs), which build on Andrey Markov’s early 20th-century work and were formalised by Julian Besag in the 1970s, are undirected probabilistic graphical models for representing contextual dependencies between neighbouring variables. They are widely used in image processing, natural language processing, and environmental modelling to capture relationships within structured data.

25 December

A Very Short Introduction of Deep Belief Networks (DBNs)

Deep Belief Networks (DBNs), introduced in 2006 by Geoffrey Hinton and colleagues, are generative models built from stacked layers that are trained greedily, one layer at a time, to extract hierarchical features from unlabelled data. They have been applied in industries such as healthcare, finance, and transport to tasks including image recognition, NLP, and time-series prediction.