Calibration Game is a game that helps you get better at identifying hallucinations in LLMs.
This is a research artifact and is intended for research and educational purposes only.
How to Play
In this game, you'll assess your ability to predict the accuracy of a large language model's (LLM's) responses to various questions or prompts. Follow these steps to play:
After reading each question or prompt presented to the LLM, rate your confidence, on a scale from 0 to 1, that the LLM will provide a correct response.
- A rating of 0 means you are certain the LLM will produce an incorrect response or hallucinate.
- A rating of 1 means you are certain the LLM will provide a correct and accurate response.
Calibration Score
At the end of the game, your performance will be evaluated based on a Calibration Score. This score reflects the accuracy of your confidence ratings in relation to the LLM's actual performance.
- Perfect calibration means your confidence ratings precisely match the actual accuracy of the LLM's responses across all prompts.
- The game session includes a total of 20 prompts.
- A Calibration Score of 0 is the best possible score, indicating that your predictions aligned perfectly with the LLM's performance.
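The exact formula behind the Calibration Score is not stated here. As a rough illustration only, the sketch below shows two common calibration measures for which 0 is the best possible value: the Brier score and a binned expected calibration error. The function names, the choice of five bins, and the sample ratings are all assumptions made for this example, and the game itself may compute its score differently.

```python
from typing import Sequence


def _check(confidences: Sequence[float], outcomes: Sequence[bool]) -> None:
    """Validate that ratings and outcomes line up and ratings lie in [0, 1]."""
    if len(confidences) != len(outcomes):
        raise ValueError("need one outcome per confidence rating")
    if not confidences:
        raise ValueError("need at least one rating")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidence ratings must be between 0 and 1")


def brier_score(confidences: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared gap between each rating and the 0/1 outcome (0 is best)."""
    _check(confidences, outcomes)
    gaps = [(c - float(o)) ** 2 for c, o in zip(confidences, outcomes)]
    return sum(gaps) / len(gaps)


def expected_calibration_error(
    confidences: Sequence[float],
    outcomes: Sequence[bool],
    n_bins: int = 5,  # assumed bin count, chosen only for illustration
) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    _check(confidences, outcomes)
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        # A rating of exactly 1.0 belongs in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, float(o)))

    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        error += (len(members) / total) * abs(mean_conf - accuracy)
    return error


if __name__ == "__main__":
    # Hypothetical ratings for four prompts and whether the LLM was correct.
    ratings = [0.9, 0.2, 0.5, 0.5]
    correct = [True, False, True, False]
    print("Brier score:", brier_score(ratings, correct))
    print("Binned calibration error:", expected_calibration_error(ratings, correct))
```

The two measures reward slightly different things. The Brier score reaches 0 only when every rating is exactly 0 or 1 and matches the outcome, while the binned error reaches 0 whenever the average confidence in each bin equals the fraction of correct responses in that bin, which is closer to the description of perfect calibration above.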