Test Set

A test set is a collection of cases used to verify the reliability of an expert system or artificial intelligence after training, during the inductive learning process.

Note. In machine learning, the test set is also known as the testing dataset. It is an ex post verification method and should not be confused with ex ante prediction-evaluation techniques.

How the Test Set Works

The test set comprises vectors X, each representing the state of n environmental variables {x1, x2, ..., xn}.

Each X vector is associated with a decision variable Y, which is the best possible decision according to subject matter experts.

[Figure: an example of a test set]
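As a minimal sketch, a test set can be represented in Python as a list of cases, each pairing an input vector X with the experts' best decision Y. The variable names and values below are invented for illustration:

# A hypothetical test set: each case pairs an input vector X
# (the state of n = 3 environmental variables) with the decision Y
# that subject matter experts consider best for that state.
test_set = [
    # (x1, x2, x3)     Y
    ((0.2, 0.7, 1.0), "approve"),
    ((0.9, 0.1, 0.3), "reject"),
    ((0.5, 0.5, 0.8), "approve"),
]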

Difference between Training Set and Test Set. The test set has the same structure as the training set but contains different data. In the training set, data are entered by a supervisor or gathered by the machine itself as it observes feedback from the external environment. In the test set, instead, data are entered by subject matter experts.

The X vectors are presented to the machine as inputs.

The machine generates a response R based on its knowledge base and/or decision tree.

Once obtained, the machine's response R is compared with the best decision Y indicated by the experts.

[Figure: how the test set works]

If the machine's decision matches the experts' (Y=R), it passes the test.

Otherwise, it fails.
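A minimal sketch of this testing loop, assuming a hypothetical machine function that maps an input vector to a response R:

def run_tests(machine, test_set):
    # Present each X vector to the machine, then compare its
    # response R with the experts' best decision Y.
    results = []
    for x, y in test_set:
        r = machine(x)          # the machine's response R
        results.append(r == y)  # True = test passed, False = failed
    return results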

Note. Before arriving at an overall quality assessment of the prediction, the machine should be subjected to multiple cases from different test sets. The more diverse the test sets, the more effective the evaluation.

Performance of the Decision Algorithm

The test set allows for the measurement of the algorithm's performance.

A quality indicator is the ratio of correct responses (RC) to the number of tests the machine undergoes (NT).

[Figure: measuring the machine's performance]
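In symbols, Q = RC / NT. A sketch of the computation, reusing the hypothetical run_tests above:

def quality(results):
    # Quality indicator Q: correct responses (RC) over number of tests (NT).
    rc = sum(results)  # correct responses (each True counts as 1)
    nt = len(results)  # total number of tests
    return rc / nt

For example, 8 correct responses out of 10 tests gives Q = 8 / 10 = 0.8.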

Learning Curve

The machine's performance improves over time as it learns.

When the quality indicator Q is low, the training process continues.

The training set is expanded with new examples to further train the machine.

[Figure: the learning process through feedback]

Feedback progressively improves the predictive capacity of the algorithm until satisfactory levels are reached.

The ongoing improvement process traces a concave curve known as the learning curve: quality rises quickly at first, then flattens as training continues.

[Figure: the learning curve in machine learning]
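A sketch of this feedback loop, assuming hypothetical train and new_examples helpers and an arbitrary quality threshold; the successive Q values recorded in history trace the learning curve:

Q_TARGET = 0.95  # hypothetical satisfactory quality level

def train_until_satisfactory(machine, training_set, test_set):
    history = []  # successive Q values trace the learning curve
    q = quality(run_tests(machine, test_set))
    history.append(q)
    while q < Q_TARGET:
        training_set = training_set + new_examples()  # expand the training set
        machine = train(machine, training_set)        # retrain on the larger set
        q = quality(run_tests(machine, test_set))     # re-measure quality
        history.append(q)  # Q typically rises quickly, then flattens
    return machine, history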

The Risk of Peeking

The test set can influence the machine learning process.

Designers might tailor the training to the test set's cases in order to obtain high-quality results in the testing phase right away.

This risk is known as peeking.

[Figure: peeking contaminates the machine's training dataset]

The machine is tuned to respond correctly to the anticipated test examples.

However, good performance on those examples does not necessarily imply a general predictive capacity on the subject matter.

How to Avoid the Risk of Peeking?

The test set should be original and specifically prepared for each testing phase.

This way, the training phase won't be influenced by the test set.
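One common way to keep the two phases separate is to hold out a portion of the expert-prepared cases that is never shown to the machine during training. A minimal sketch, with labeled_cases as a hypothetical pool of expert-labeled (X, Y) pairs:

import random

def split_cases(cases, test_fraction=0.2, seed=42):
    # Shuffle the expert-labeled cases and reserve a held-out portion
    # for testing; only the remainder is ever used for training.
    cases = list(cases)
    random.Random(seed).shuffle(cases)
    n_test = int(len(cases) * test_fraction)
    return cases[n_test:], cases[:n_test]  # (training_set, test_set)

training_set, test_set = split_cases(labeled_cases)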

 

 
 
