Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique in which an agent learns from the feedback it receives on its own actions, as measured by a reinforcement function. It's also known as reinforcement-based or reinforcement-type learning.

How Reinforcement Learning Works
Pros and Cons of Reinforcement Learning

How Reinforcement Learning Works

In reinforcement learning, the agent is given an objective to achieve.

Initially, the agent knows the objective but not how to achieve it: it has neither a dataset of training examples nor a prior knowledge base.

In reinforcement learning, the agent must learn from experience and build its own Knowledge Base (KB).

Note. This is a deliberate simplification to better explain RL. In reality, some prior knowledge is usually useful in RL, as it helps the agent avoid irreversible mistakes while it learns from experience.

How does it learn from experience?

First, the agent observes its surrounding environment and encodes what it observes as a feature vector X.

Each combination of the vector's elements represents a different state of the environment.

Example. The agent must decide whether to take an umbrella. The operational environment can be described by three binary characteristics: x₁=raining, x₂=cloudy, x₃=windy. Each characteristic takes a binary value (1=yes, 0=no). $$ X = ( x_1, x_2, x_3 ) $$ With three binary characteristics there are 2³ = 8 possible states. For example, the feature vector can represent the following environmental states $$ X_1 = ( 1, 1, 0 ) \\ X_2 = ( 1, 1, 1 ) \\ X_3 = ( 0, 1, 0 ) \\ \vdots $$
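The encoding in this example can be sketched in Python. This is only an illustration under my own assumptions: the names `FEATURES`, `encode_state`, and `ALL_STATES` are hypothetical and do not come from any library.

```python
from itertools import product

# Hypothetical feature names, in the order used by the vector X = (x1, x2, x3).
FEATURES = ("raining", "cloudy", "windy")

def encode_state(observation):
    """Convert a dict of yes/no observations into a binary feature vector."""
    return tuple(int(bool(observation[name])) for name in FEATURES)

# Every combination of the three binary features is a distinct state.
ALL_STATES = list(product((0, 1), repeat=len(FEATURES)))

x = encode_state({"raining": True, "cloudy": True, "windy": False})
print(x)                # (1, 1, 0)
print(len(ALL_STATES))  # 8
```

Using tuples keeps each state hashable, so it can later serve as a dictionary key.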

After the agent makes a decision, it observes how the state of the environment changes and evaluates this feedback through a reinforcement function.

What is the Reinforcement Function?

The reinforcement function measures the success level of an action or decision relative to a predetermined goal.

Reward. If the feedback is positive, the action has moved the agent closer to the goal, and the reinforcement function assigns the agent a reward. The reward is a positive real value.
Penalty. If the feedback is negative, the action has moved the agent away from the goal, and the reinforcement function assigns the agent a penalty. The penalty is a negative real value.
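A reinforcement function for the umbrella scenario could look like the sketch below. The function name `reinforcement`, the action labels, and all numeric values are assumptions chosen for illustration. The only requirement taken from the text is that rewards are positive and penalties are negative.

```python
def reinforcement(state, action):
    """Return a reward (> 0) or a penalty (< 0) for one decision.

    state  -- feature vector (raining, cloudy, windy), each 0 or 1
    action -- "umbrella" or "no_umbrella"
    The numeric values are arbitrary assumptions.
    """
    raining, _cloudy, _windy = state
    took_umbrella = action == "umbrella"
    if raining and took_umbrella:
        return 1.0   # stayed dry: reward
    if raining and not took_umbrella:
        return -1.0  # got wet: penalty
    if took_umbrella:
        return -0.2  # carried the umbrella for nothing: small penalty
    return 0.5       # no rain and nothing to carry: reward

print(reinforcement((1, 1, 1), "umbrella"))     # 1.0
print(reinforcement((1, 1, 1), "no_umbrella"))  # -1.0
```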

Note. I've simplified the reinforcement function schema for easier understanding. In reality, rewards and penalties are often evaluated over a sequence of actions rather than after each single action. Otherwise, the machine would never be able to take one step back in order to take two steps forward.

As it gains experience, the machine collects valuable feedback information on actions and records this in the Knowledge Base (KB).

The agent's goal is to maximize the reward it obtains from the reinforcement function.

In the KB, it associates each environmental state X_i with the actions that earned the highest rewards.

Example. If the goal is to stay dry and the state vector indicates a situation with rain, clouds, and wind $$ X_i=(1,1,1) $$ the decision to bring an umbrella will certainly receive a higher reward compared to the decision to leave without one.

This allows the agent to repeat the most profitable actions over time and avoid loss-making ones in every environmental state X_i.

In practice, the agent learns to win by playing the game.
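The learning loop described above can be sketched as follows. This is a toy illustration, not a standard algorithm implementation: the `KnowledgeBase` class, the `reinforcement` values, and the purely random exploration are all assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = ("umbrella", "no_umbrella")

def reinforcement(state, action):
    """Assumed toy feedback; state[0] is the 'raining' feature."""
    if state[0]:
        return 1.0 if action == "umbrella" else -1.0
    return -0.2 if action == "umbrella" else 0.5

class KnowledgeBase:
    """Stores the average reward observed for each (state, action) pair."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def record(self, state, action, reward):
        self.total[(state, action)] += reward
        self.count[(state, action)] += 1

    def average(self, state, action):
        n = self.count[(state, action)]
        # Pairs never tried default to 0.0 (an assumption of this sketch).
        return self.total[(state, action)] / n if n else 0.0

    def best_action(self, state):
        return max(ACTIONS, key=lambda a: self.average(state, a))

rng = random.Random(0)  # fixed seed so the run is repeatable
kb = KnowledgeBase()
for _ in range(200):
    state = tuple(rng.randint(0, 1) for _ in range(3))  # observe a state
    action = rng.choice(ACTIONS)                        # try an action at random
    kb.record(state, action, reinforcement(state, action))

print(kb.best_action((1, 1, 1)))  # umbrella
```

Here the agent explores purely at random and only afterwards exploits the KB. A real agent would usually mix the two, repeating profitable actions while it is still learning.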

Note. The data collected in the KB resemble a labeled dataset (a training set). As in supervised machine learning, they can be used to generalize the statistical decision model to states about which the machine still lacks sufficient information. For instance, the machine knows it's useful to carry an umbrella when it's raining, cloudy, and windy, X=(1,1,1). However, it lacks information about the state where it's raining and cloudy but not windy, X=(1,1,0). In this case, the machine may still decide to bring the umbrella by proximity, because the two states differ in only one characteristic.
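One simple way to make a decision "by proximity" is to borrow the action of the closest known state. The sketch below measures closeness as the number of differing features (the Hamming distance). That choice of distance, and the names `hamming`, `decide`, and `known_best`, are my assumptions. The text does not say how proximity is measured.

```python
def hamming(a, b):
    """Number of features in which two binary state vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decide(state, known_best):
    """Use the stored action if the state is known; otherwise borrow the
    action of the closest known state (ties go to the first one stored)."""
    if state in known_best:
        return known_best[state]
    nearest = min(known_best, key=lambda s: hamming(s, state))
    return known_best[nearest]

# Hypothetical KB contents: the best action found so far, for two states only.
known_best = {
    (1, 1, 1): "umbrella",
    (0, 0, 0): "no_umbrella",
}

print(decide((1, 1, 0), known_best))  # umbrella
```

The unseen state (1,1,0) is one feature away from (1,1,1) and two away from (0,0,0), so the agent carries the umbrella.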

Pros and Cons of Reinforcement Learning

Reinforcement learning (RL) combines the advantages of supervised and unsupervised learning.

Unsupervised Learning. As in unsupervised learning, the machine in RL isn't tied to a table of input-output examples written by a designer. It is therefore less bound by the content of a training set and can make decisions with fewer constraints. Unlike unsupervised learning, however, the agent doesn't learn without guidance: the reinforcement function lets it distinguish positive from negative actions right from the start.
Supervised Learning. As in supervised learning, the agent in RL is assisted during the learning process. The feedback, however, doesn't come from labels added by a supervisor to the training examples but from a mathematical reinforcement function. Therefore, unlike supervised learning, in RL the machine can evaluate even situations the designer did not initially anticipate.
