AI Model Trust Score Leaderboard

32 models ranked by composite Trust Score across 2,637 real-world evaluations.

| Rank | Model | Provider | Trust Score | RC | FA | SC | RF | ST | ED | HL | Evals |
|------|-------|----------|-------------|------|------|------|------|------|------|------|-------|
| 1 | Gemini 2.5 Pro | Google Gemini | 8.96 | 8.82 | 8.78 | 9.31 | 9.47 | 9.41 | 7.94 | 8.96 | 16 * |
| 2 | GPT-5 | OpenAI | 8.83 | 8.76 | 8.82 | 9.09 | 9.44 | 8.93 | 7.89 | 8.86 | 60 |
| 3 | GPT-5 Mini | OpenAI | 8.80 | 8.52 | 8.92 | 9.11 | 9.31 | 9.23 | 7.57 | 8.76 | 26 * |
| 4 | GPT-4.1 Nano | OpenAI | 8.74 | 8.71 | 7.67 | 9.10 | 9.45 | 9.07 | 7.69 | 8.82 | 21 * |
| 5 | GPT-5.2 (Thinking) | OpenAI | 8.71 | 8.55 | 8.39 | 8.99 | 9.50 | 8.90 | 7.35 | 8.80 | 185 |
| 5 | GPT-5.2 | OpenAI | 8.71 | 8.50 | 8.54 | 8.96 | 9.48 | 9.02 | 7.45 | 8.74 | 62 |
| 7 | Gemini 2.5 Flash | Google Gemini | 8.67 | 8.68 | 7.78 | 9.06 | 9.45 | 9.28 | 7.16 | 8.86 | 41 |
| 8 | Gemini 3 Flash | Google Gemini | 8.64 | 8.49 | 8.16 | 9.06 | 9.21 | 9.03 | 7.44 | 8.87 | 51 |
| 8 | GPT-4.1 | OpenAI | 8.64 | 8.56 | 8.34 | 8.85 | 8.90 | 8.96 | 7.61 | 8.84 | 50 |
| 10 | GPT-5.1 | OpenAI | 8.52 | 8.39 | 8.43 | 8.65 | 9.13 | 8.52 | 8.04 | 8.53 | 23 * |
| 11 | Claude Sonnet 4.5 | Anthropic | 8.40 | 8.16 | 7.78 | 8.69 | 9.20 | 8.60 | 7.17 | 8.55 | 651 |
| 12 | GPT-5.1 (Thinking) | OpenAI | 8.37 | 8.40 | 8.05 | 8.74 | 9.10 | 8.67 | 7.01 | 8.51 | 79 |
| 13 | GPT-4o | OpenAI | 8.29 | 8.31 | 7.26 | 8.89 | 8.46 | 8.69 | 7.05 | 8.49 | 27 * |
| 14 | Gemini 3 Pro | Google Gemini | 8.26 | 8.13 | 7.64 | 8.57 | 9.03 | 8.53 | 6.93 | 8.40 | 479 |
| 15 | Jamba Large | AI21 | 8.20 | 8.05 | 6.10 | 8.55 | 8.82 | 8.64 | 7.45 | 8.14 | 11 * |
| 16 | Claude Opus 4.6 (Adaptive) | Anthropic | 8.16 | 7.71 | 7.57 | 8.38 | 8.95 | 8.34 | 5.47 | 8.33 | 117 |
| 17 | Grok 4 (Reasoning) | xAI (Grok) | 8.08 | 8.10 | 7.51 | 8.47 | 8.83 | 8.42 | 7.34 | 8.23 | 142 |
| 18 | Grok 4 | xAI (Grok) | 8.02 | 7.86 | 7.72 | 8.36 | 8.82 | 8.49 | 7.49 | 8.09 | 39 |
| 19 | Grok 3 Mini | xAI (Grok) | 7.76 | 7.73 | 7.50 | 7.95 | 8.29 | 8.92 | 7.15 | 7.61 | 14 * |
| 20 | GPT-5 (Generic) | OpenAI | 7.72 | 7.87 | 7.21 | 8.17 | 8.37 | 7.87 | 6.37 | 8.00 | 41 |
| 21 | Sonar Pro | Perplexity | 7.70 | 7.77 | 7.01 | 8.13 | 8.37 | 8.04 | 6.69 | 7.84 | 67 |
| 22 | Grok 4.1 (Reasoning) | xAI (Grok) | 7.68 | 7.68 | 6.96 | 8.04 | 8.38 | 8.05 | 6.61 | 7.88 | 99 |
| 23 | Mistral Small 3.2 | Mistral | 7.61 | 7.40 | 6.54 | 7.91 | 8.17 | 8.05 | 6.62 | 7.50 | 21 * |
| 24 | Claude 3.7 Sonnet | Anthropic | 7.60 | 7.70 | 7.02 | 8.15 | 8.19 | 7.72 | 6.62 | 7.73 | 38 |
| 24 | Claude Sonnet 4.5 (Thinking) | Anthropic | 7.60 | 7.40 | 6.85 | 7.90 | 8.33 | 7.80 | 6.33 | 7.58 | 35 |
| 26 | Sonar | Perplexity | 7.18 | 7.18 | 3.94 | 7.58 | 7.63 | 7.68 | 6.05 | 7.26 | 19 * |
| 27 | Jamba Mini | AI21 | 7.12 | 7.00 | 5.00 | 7.38 | 7.69 | 7.47 | 6.19 | 6.99 | 16 * |
| 28 | Claude Sonnet 4 | Anthropic | 7.05 | 7.03 | 6.61 | 7.25 | 7.40 | 7.10 | 6.77 | 7.08 | 10 * |
| 29 | Sonar Reasoning Pro | Perplexity | 6.59 | 6.40 | 6.15 | 6.87 | 7.27 | 6.76 | 6.23 | 6.39 | 52 |
| 30 | Mistral Nemo | Mistral | 6.53 | 6.54 | 5.77 | 6.92 | 7.21 | 6.92 | 5.50 | 6.63 | 12 * |
| 31 | Grok Code | xAI (Grok) | 2.02 | 2.08 | 2.00 | 2.21 | 1.79 | 2.42 | 2.25 | 2.13 | 12 * |
| 32 | Magistral Small | Mistral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10 * |

* Low sample size (<30 evaluations) — ranking may shift with more data
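For readers who want to reproduce the ranking offline: the published composites are consistent with an equal-weight mean of the seven sub-scores (RC, FA, SC, RF, ST, ED, HL), and the shared ranks (5, 8, 24) match competition-style ranking, where tied models share a rank and the next rank is skipped. The sketch below assumes that equal weighting, which is inferred from the figures rather than documented; the function names are illustrative, not part of any official API.

```python
# Hedged sketch: reproduce the composite Trust Score and rank column.
# Assumptions (not confirmed by the page): equal weighting of the seven
# sub-scores, and competition ("1224") ranking for ties.

SUBSCORES = ("RC", "FA", "SC", "RF", "ST", "ED", "HL")
LOW_SAMPLE_THRESHOLD = 30  # per the leaderboard's asterisk footnote

def trust_score(scores: dict) -> float:
    """Composite Trust Score as the unweighted mean of the seven sub-scores."""
    return round(sum(scores[k] for k in SUBSCORES) / len(SUBSCORES), 2)

def rank(models: list) -> list:
    """Sort by composite score and assign competition-style ranks."""
    ordered = sorted(models, key=lambda m: m["trust"], reverse=True)
    ranked, prev_score, prev_rank = [], None, 0
    for i, m in enumerate(ordered, start=1):
        r = prev_rank if m["trust"] == prev_score else i
        ranked.append({**m, "rank": r,
                       "low_sample": m["evals"] < LOW_SAMPLE_THRESHOLD})
        prev_score, prev_rank = m["trust"], r
    return ranked
```

As a sanity check, feeding the GPT-5 row's sub-scores (8.76, 8.82, 9.09, 9.44, 8.93, 7.89, 8.86) into `trust_score` yields 8.83, matching the published composite.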

Test AI Models Yourself

Run your own queries and see how models score in real time with Trust Score evaluation.

Try Search Umbrella →