A Brief History of This Tool: Who Developed It?
The SARSA algorithm, an acronym for State-Action-Reward-State-Action, was introduced by Gavin Rummery and Mahesan Niranjan in a 1994 Cambridge University technical report, where it appeared under the name "Modified Connectionist Q-Learning"; Richard Sutton later suggested the name SARSA, and the algorithm was popularised through Sutton and Barto's foundational reinforcement learning textbook. Since the mid-1990s, SARSA has become a cornerstone of reinforcement learning for its ability to learn a policy while acting, adapting to changing environments.
What Is It?
Imagine teaching a child to ride a bicycle. Each wobble (state), adjustment of the handlebars (action), and the subsequent encouragement (reward) informs the child's next move. SARSA functions similarly: it is an on-policy temporal-difference control algorithm that learns through direct interaction with the environment, updating its value estimates from each experienced quintuple of state, action, reward, next state, and next action (s, a, r, s′, a′), which is what gives the algorithm its name.
Why Is It Being Used? What Challenges Are Being Addressed?
SARSA is widely used because it addresses these critical challenges:
- Real-Time Learning: SARSA learns during exploration, adapting to environmental changes on the fly.
- Safe Exploration: Because SARSA is on-policy, its value estimates account for the exploratory actions the agent actually takes, so it tends to learn more cautious policies near hazards (the classic cliff-walking example).
- Policy Control: Unlike off-policy methods such as Q-learning, SARSA evaluates and improves the same policy it follows during both learning and execution.
How Is It Being Used?
SARSA works through the following steps:
- Initialise Values: Start with arbitrary estimates for state-action values (Q-values).
- Choose Action: Select an action using a policy (e.g., ε-greedy).
- Observe Transition: Execute the action, observe the reward, and transition to the next state.
- Update Q-Values: Apply the SARSA update Q(s, a) ← Q(s, a) + α[r + γQ(s′, a′) − Q(s, a)], where α is the learning rate and γ is the discount factor.
- Iterate Until Convergence: Repeat the loop until the Q-values and policy stabilise; a minimal implementation sketch follows this list.
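The following is a minimal tabular sketch of those steps in Python. The env object, its reset()/step() methods, its actions list, and all hyperparameter values are illustrative assumptions for this sketch, not any specific library's API:

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA. Assumes a hypothetical env exposing reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions."""
    q = defaultdict(float)  # Q-values keyed by (state, action), initialised to 0

    def choose_action(state):
        # ε-greedy: explore with probability ε, otherwise act greedily
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        action = choose_action(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = choose_action(next_state)  # chosen by the same policy
            # SARSA target bootstraps from the action actually selected next
            target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q
```

Note that both acting and updating go through choose_action, which is exactly what makes the method on-policy.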
Different Types
SARSA has a couple of notable variants:
- One-Step SARSA: The standard algorithm described above, which updates a single state-action pair from each transition under the agent's current policy.
- SARSA(λ): A variant incorporating eligibility traces for better credit assignment across action sequences (see the sketch below).
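As a rough illustration of SARSA(λ), here is the earlier sketch extended with accumulating eligibility traces; the env interface, helper names, and the trace-decay value λ = 0.9 are again assumptions made for illustration:

```python
import random
from collections import defaultdict

def sarsa_lambda(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, lam=0.9):
    """SARSA(λ) with accumulating traces, same hypothetical env as above."""
    q = defaultdict(float)

    def choose_action(state):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        trace = defaultdict(float)  # eligibility traces, reset each episode
        state = env.reset()
        action = choose_action(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = choose_action(next_state)
            td_error = (reward
                        + (0.0 if done else gamma * q[(next_state, next_action)])
                        - q[(state, action)])
            trace[(state, action)] += 1.0  # mark the pair just visited
            for key in list(trace):
                q[key] += alpha * td_error * trace[key]  # spread credit backwards
                trace[key] *= gamma * lam                # decay older credit
            state, action = next_state, next_action
    return q
```

The traces let a single reward adjust every recently visited state-action pair at once, which is the "better credit assignment" the bullet above refers to.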
Different Features
Key features of SARSA include:
- Policy-Adherent Learning: SARSA evaluates actions under the same policy the agent follows, so its value estimates reflect how the agent actually behaves, exploration included.
- Exploration-Sensitive Updates: Its update target r + γQ(s′, a′) uses the next action actually selected, whereas off-policy Q-learning bootstraps from max_a Q(s′, a) regardless of what the agent does next (compared concretely below).
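To make that contrast concrete, here are the two bootstrap targets side by side; the function names and arguments are ours, for illustration only:

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the agent will actually take next.
    return reward + gamma * q[(next_state, next_action)]

def q_learning_target(q, reward, next_state, actions, gamma):
    # Off-policy: bootstrap from the greedy action, whatever the agent does next.
    return reward + gamma * max(q[(next_state, a)] for a in actions)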
Different Software and Tools for SARSA
Developers can implement SARSA using the following tools:
- OpenAI Gym (now maintained as Gymnasium): Offers standard environments for implementing SARSA in various scenarios (see the example after this list).
- PyTorch and TensorFlow: Popular frameworks for function-approximation variants of SARSA, where a neural network replaces the Q-table.
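For instance, here is a sketch of tabular SARSA on Gymnasium's CliffWalking-v0 environment, a common SARSA benchmark. It assumes the Gymnasium step API, where reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info); the hyperparameters and episode count are illustrative:

```python
import random
import numpy as np
import gymnasium as gym

env = gym.make("CliffWalking-v0")
q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state):
    # ε-greedy over the current Q-table
    if random.random() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(q[state]))

for _ in range(500):  # training episodes
    state, _ = env.reset()
    action = choose_action(state)
    done = False
    while not done:
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_action = choose_action(next_state)
        target = reward + (0.0 if terminated else gamma * q[next_state, next_action])
        q[state, action] += alpha * (target - q[state, action])
        state, action = next_state, next_action
```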
3 Industry Application Examples in Australian Government Agencies
- Australian Maritime Safety Authority (AMSA):
  - Use Case: Optimising rescue operations by learning efficient search-and-rescue strategies.
  - Impact: Reduced average response time by 15%.
- Transport for Victoria (TfV):
  - Use Case: Enhancing traffic light coordination for smoother traffic flow.
  - Impact: Lowered congestion rates by 20%.
- Department of Home Affairs:
  - Use Case: Automating border control checkpoints using SARSA to learn optimal decision-making processes.
  - Impact: Increased throughput by 12% without compromising security.