Table 1. Performance (percentage of questions answered correctly), cost (as a proxy for energy use), and efficiency scores for the evaluated LLMs on all MCQs. Efficiency is the ratio of performance (accuracy, %) to cost (€), reflecting the balance between performance and resource usage.

From: Comparative evaluation and performance of large language models on expert level critical care questions: a benchmark study

| Model | Performance (%) | Cost (€) | Efficiency score |
|---|---|---|---|
| GPT-4o | 93.3 | 3.60 | 25.9 |
| GPT-4o-mini | 83.0 | 0.14 | 592.9 |
| GPT-3.5-turbo | 72.7 | 0.96 | 75.7 |
| Llama 3.1 70B | 87.5 | 1.80 | 48.6 |
| Mistral Large 2407 | 87.9 | 2.70 | 32.6 |
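
The efficiency scores in Table 1 can be reproduced from the other two columns. The following is a minimal sketch, assuming efficiency is simply accuracy (in percent) divided by cost (in euro) and rounded to one decimal, as the caption suggests. The names `TABLE_1`, `efficiency`, `scores`, and `ranking` are illustrative and do not come from the paper.

```python
# Values transcribed from Table 1: model -> (accuracy in %, cost in euro).
TABLE_1 = {
    "GPT-4o": (93.3, 3.60),
    "GPT-4o-mini": (83.0, 0.14),
    "GPT-3.5-turbo": (72.7, 0.96),
    "Llama 3.1 70B": (87.5, 1.80),
    "Mistral Large 2407": (87.9, 2.70),
}


def efficiency(accuracy_pct: float, cost_eur: float) -> float:
    """Ratio of accuracy to cost, per the caption's definition (assumed unweighted)."""
    if cost_eur <= 0:
        raise ValueError("cost must be positive")
    return accuracy_pct / cost_eur


# Round to one decimal to match the precision reported in the table.
scores = {
    model: round(efficiency(acc, cost), 1)
    for model, (acc, cost) in TABLE_1.items()
}

# Models ordered from most to least efficient.
ranking = sorted(scores, key=scores.get, reverse=True)

for model in ranking:
    print(f"{model:<20} {scores[model]:>7.1f}")
```

Under this assumption the computed values agree with the published efficiency column, and the ordering shows that the most accurate model (GPT-4o) is the least efficient, while GPT-4o-mini is the most efficient by a wide margin.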