The Calibration Game

The Calibration Game is a game for getting better at identifying hallucinations in LLMs.

This is a research artifact and intended for research and educational purposes only.

How to Play

In this game, you'll assess your ability to predict the accuracy of a large language model's (LLM's) responses to various questions or prompts. Follow these steps to play:

After reading each question or prompt presented to the LLM, rate your confidence, on a scale from 0 to 1, in the LLM's ability to provide a correct response:

- A rating of 0 means you are certain the LLM will produce an incorrect response or hallucinate.
- A rating of 1 means you are certain the LLM will provide a correct and accurate response.

Calibration Score

At the end of the game, your performance will be evaluated with a Calibration Score. This score reflects how closely your confidence ratings track the LLM's actual performance.

- Being perfectly calibrated means your confidence ratings precisely match the actual accuracy of the LLM's responses across all prompts.
- Each game session includes 20 prompts. A Calibration Score of 0 is the best possible score, indicating that your predictions aligned perfectly with the LLM's performance.
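
The exact formula behind the Calibration Score is not stated here. As a rough illustration only, the sketch below computes a binned calibration error, one common measure for which 0 is the best value. The function name calibration_error, the n_bins parameter, and the choice of five equal-width bins are assumptions made for this example, not details of the game itself.

```python
from typing import Sequence


def calibration_error(confidences: Sequence[float],
                      outcomes: Sequence[bool],
                      n_bins: int = 5) -> float:
    """Binned calibration error (hypothetical scoring rule): 0.0 is best."""
    if len(confidences) != len(outcomes):
        raise ValueError("need one outcome per confidence rating")
    if not confidences:
        raise ValueError("need at least one rating")

    # Group ratings into equal-width confidence bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("ratings must lie in [0, 1]")
        index = min(int(conf * n_bins), n_bins - 1)  # a rating of 1.0 goes in the last bin
        bins[index].append((conf, correct))

    # Sum, over bins, the gap between mean confidence and observed accuracy,
    # weighted by the share of prompts that fell into each bin.
    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += (len(members) / total) * abs(mean_conf - accuracy)
    return error


# A 20-prompt session: certain and right on 10 prompts, certain of failure
# and right about that on the other 10.
ratings = [1.0] * 10 + [0.0] * 10
results = [True] * 10 + [False] * 10
print(calibration_error(ratings, results))  # 0.0
```

Under this assumed rule, a player who rates every prompt 0.5 also scores 0 if the LLM answers exactly half of them correctly, because the score measures how well confidence matches accuracy, not how decisive each rating is.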

Learn more about Calibration.