Most games are consciously designed with a specific experience or vision in mind. This vision can differ significantly between games and genres. Games are commonly designed for entertainment and competition, but self-expression, social critique, and knowledge discovery are also valid design objectives. Determining whether an objective is fulfilled is often difficult due to the complexity of modern games and the variability of human responses. For this reason, games are commonly playtested before being published. In many cases, however, this playtesting is neither thorough nor effective.
Playtests are expensive and time-consuming, and not every aspect of a game can be evaluated before publication. This is particularly true for games that are meant to be played over long periods of time and with large groups of people. In addition, playtests need to be designed carefully with the intended game experience in mind. Even so, much of the design and fine-tuning of a game relies on the intuitive judgement and experience of the designer. Since exhaustive testing is impossible, adjustments to the game are in some cases scheduled semi-regularly (as patches), depending on observations of how the game is received. If a game does not work as intended at all (i.e. it is considered broken), patches may be required to resolve the discovered problems.
These problems speak to the need for a game evaluation task force. Researchers have proposed methods from the fields of artificial and computational intelligence (AI and CI, respectively) that are intended to assist game designers. Many publications in the area of AI-assisted game design include an automatic evaluation of a game or of specific game content. The information obtained about the game and its content is usually provided to a designer to support their design and decision-making process. Additionally, methods in the field of Artificial and Computational Intelligence in Games that automatically generate content, narrative, or rules for games rely on some form of machine-computable evaluation of their output. We will also draw on expertise from the fields of sensemaking and data visualisation to ensure the interpretability of the evaluation approaches.
Evaluation methods for games are thus both prevalent and necessary. Still, to our knowledge, these methods surprisingly lack generality and verification, even in scientific publications on game design. The evaluation methods employed are typically specific to one game. Too often, they neither build on research in player modelling nor are validated experimentally. This is understandable in publications where the evaluation is not the focus of the work, as one evaluation method can usually be exchanged for another. However, we argue that employing arbitrary evaluation methods in research publications can be misleading when analysing the actual added value and potential applications of, e.g., a content generation algorithm. Furthermore, artificial evaluation methods can seem detached from the expectations of designers and developers in the game industry.
We thus propose to organise efforts towards the development, analysis, and dissemination of gameplay evaluation methods through a task force. Such methods are regularly employed, but are neither systematically compared nor published, and no central repository for them currently exists. In the following, we outline how the new task force will accomplish this.
State-of-the-art game (content) evaluation methods often rest on assumptions that can introduce errors into the evaluation. The new task force seeks to improve this state of the art, with the eventual goal of reliable automatic evaluation of a game.
In the short term, the task force will raise awareness of the existing lack of scrutiny regarding game evaluation methods and encourage more researchers in academia and industry to participate in this discussion. This will be done via special sessions at conferences, special issues of appropriate journals such as the IEEE Transactions on Games, and workshops. In the long term, the task force will: