Table 2 Consistency scores for all models. Consistency is the percentage (%) of questions, drawn from a random subset of 100 questions, for which a model gave the same answer in at least 8 of 10 repetitions. The consistently correct and consistently incorrect scores give the proportion of these consistent responses that were correct or incorrect, respectively.
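The scoring rule in the caption can be illustrated with a minimal Python sketch. The function name `consistency_scores` and the input layout (a dict of per-question answer lists plus a dict of reference answers) are hypothetical. The sketch assumes that the consistently correct and consistently incorrect scores are percentages of all sampled questions, so that together they add up to the consistency score. If they are instead normalised by the number of consistent questions, divide by `consistent` rather than `n`. With 10 repetitions and a threshold of 8, at most one answer can reach the threshold, so the most frequent answer is unambiguous.

```python
from collections import Counter


def consistency_scores(answers_by_question, correct_answers, threshold=8):
    """Return (consistency, consistently_correct, consistently_incorrect) in %.

    answers_by_question: hypothetical layout, question id -> list of the
        model's answers across repetitions (10 per question in Table 2).
    correct_answers: question id -> reference answer.
    Assumption: all three scores are normalised by the number of sampled
    questions n, so correct + incorrect == consistency.
    """
    n = len(answers_by_question)
    consistent = correct = incorrect = 0
    for qid, answers in answers_by_question.items():
        # Most frequent answer and the number of repetitions that produced it
        modal_answer, count = Counter(answers).most_common(1)[0]
        if count >= threshold:
            consistent += 1
            if modal_answer == correct_answers[qid]:
                correct += 1
            else:
                incorrect += 1

    def to_pct(k):
        return 100.0 * k / n

    return to_pct(consistent), to_pct(correct), to_pct(incorrect)


# Toy example with 4 questions (not data from the study)
answers = {
    "q1": ["A"] * 10,             # consistent, correct
    "q2": ["B"] * 8 + ["C"] * 2,  # consistent, incorrect
    "q3": ["A"] * 7 + ["D"] * 3,  # below threshold, not consistent
    "q4": ["C"] * 9 + ["A"],      # consistent, correct
}
refs = {"q1": "A", "q2": "A", "q3": "A", "q4": "C"}
print(consistency_scores(answers, refs))  # (75.0, 50.0, 25.0)
```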