AI Model Trust Score Leaderboard
32 models ranked by composite Trust Score across 2,637 real-world evaluations. The RC–HL columns are the seven sub-dimension scores that feed each model's composite; a minimal sketch of the scoring and ranking mechanics follows the table.
| Rank | Model | Provider | Trust Score | RC | FA | SC | RF | ST | ED | HL | Evals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | Google Gemini | 8.96 | 8.82 | 8.78 | 9.31 | 9.47 | 9.41 | 7.94 | 8.96 | 16 * |
| 2 | GPT-5 | OpenAI | 8.83 | 8.76 | 8.82 | 9.09 | 9.44 | 8.93 | 7.89 | 8.86 | 60 |
| 3 | GPT-5 Mini | OpenAI | 8.80 | 8.52 | 8.92 | 9.11 | 9.31 | 9.23 | 7.57 | 8.76 | 26 * |
| 4 | GPT-4.1 Nano | OpenAI | 8.74 | 8.71 | 7.67 | 9.10 | 9.45 | 9.07 | 7.69 | 8.82 | 21 * |
| 5 | GPT-5.2 (Thinking) | OpenAI | 8.71 | 8.55 | 8.39 | 8.99 | 9.50 | 8.90 | 7.35 | 8.80 | 185 |
| 5 | GPT-5.2 | OpenAI | 8.71 | 8.50 | 8.54 | 8.96 | 9.48 | 9.02 | 7.45 | 8.74 | 62 |
| 7 | Gemini 2.5 Flash | Google Gemini | 8.67 | 8.68 | 7.78 | 9.06 | 9.45 | 9.28 | 7.16 | 8.86 | 41 |
| 8 | Gemini 3 Flash | Google Gemini | 8.64 | 8.49 | 8.16 | 9.06 | 9.21 | 9.03 | 7.44 | 8.87 | 51 |
| 8 | GPT-4.1 | OpenAI | 8.64 | 8.56 | 8.34 | 8.85 | 8.90 | 8.96 | 7.61 | 8.84 | 50 |
| 10 | GPT-5.1 | OpenAI | 8.52 | 8.39 | 8.43 | 8.65 | 9.13 | 8.52 | 8.04 | 8.53 | 23 * |
| 11 | Claude Sonnet 4.5 | Anthropic | 8.40 | 8.16 | 7.78 | 8.69 | 9.20 | 8.60 | 7.17 | 8.55 | 651 |
| 12 | GPT-5.1 (Thinking) | OpenAI | 8.37 | 8.40 | 8.05 | 8.74 | 9.10 | 8.67 | 7.01 | 8.51 | 79 |
| 13 | GPT-4o | OpenAI | 8.29 | 8.31 | 7.26 | 8.89 | 8.46 | 8.69 | 7.05 | 8.49 | 27 * |
| 14 | Gemini 3 Pro | Google Gemini | 8.26 | 8.13 | 7.64 | 8.57 | 9.03 | 8.53 | 6.93 | 8.40 | 479 |
| 15 | Jamba Large | AI21 | 8.20 | 8.05 | 6.10 | 8.55 | 8.82 | 8.64 | 7.45 | 8.14 | 11 * |
| 16 | Claude Opus 4.6 (Adaptive) | Anthropic | 8.16 | 7.71 | 7.57 | 8.38 | 8.95 | 8.34 | 5.47 | 8.33 | 117 |
| 17 | Grok 4 (Reasoning) | xAI (Grok) | 8.08 | 8.10 | 7.51 | 8.47 | 8.83 | 8.42 | 7.34 | 8.23 | 142 |
| 18 | Grok 4 | xAI (Grok) | 8.02 | 7.86 | 7.72 | 8.36 | 8.82 | 8.49 | 7.49 | 8.09 | 39 |
| 19 | Grok 3 Mini | xAI (Grok) | 7.76 | 7.73 | 7.50 | 7.95 | 8.29 | 8.92 | 7.15 | 7.61 | 14 * |
| 20 | GPT-5 (Generic) | OpenAI | 7.72 | 7.87 | 7.21 | 8.17 | 8.37 | 7.87 | 6.37 | 8.00 | 41 |
| 21 | Sonar Pro | Perplexity | 7.70 | 7.77 | 7.01 | 8.13 | 8.37 | 8.04 | 6.69 | 7.84 | 67 |
| 22 | Grok 4.1 (Reasoning) | xAI (Grok) | 7.68 | 7.68 | 6.96 | 8.04 | 8.38 | 8.05 | 6.61 | 7.88 | 99 |
| 23 | Mistral Small 3.2 | Mistral | 7.61 | 7.40 | 6.54 | 7.91 | 8.17 | 8.05 | 6.62 | 7.50 | 21 * |
| 24 | Claude 3.7 Sonnet | Anthropic | 7.60 | 7.70 | 7.02 | 8.15 | 8.19 | 7.72 | 6.62 | 7.73 | 38 |
| 24 | Claude Sonnet 4.5 (Thinking) | Anthropic | 7.60 | 7.40 | 6.85 | 7.90 | 8.33 | 7.80 | 6.33 | 7.58 | 35 |
| 26 | Sonar | Perplexity | 7.18 | 7.18 | 3.94 | 7.58 | 7.63 | 7.68 | 6.05 | 7.26 | 19 * |
| 27 | Jamba Mini | AI21 | 7.12 | 7.00 | 5.00 | 7.38 | 7.69 | 7.47 | 6.19 | 6.99 | 16 * |
| 28 | Claude Sonnet 4 | Anthropic | 7.05 | 7.03 | 6.61 | 7.25 | 7.40 | 7.10 | 6.77 | 7.08 | 10 * |
| 29 | Sonar Reasoning Pro | Perplexity | 6.59 | 6.40 | 6.15 | 6.87 | 7.27 | 6.76 | 6.23 | 6.39 | 52 |
| 30 | Mistral Nemo | Mistral | 6.53 | 6.54 | 5.77 | 6.92 | 7.21 | 6.92 | 5.50 | 6.63 | 12 * |
| 31 | Grok Code | xAI (Grok) | 2.02 | 2.08 | 2.00 | 2.21 | 1.79 | 2.42 | 2.25 | 2.13 | 12 * |
| 32 | Magistral Small | Mistral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10 * |
* Low sample size (<30 evaluations) — ranking may shift with more data
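For readers who want to reproduce the ranking mechanics, here is a minimal Python sketch, not the leaderboard's actual implementation. It assumes the composite Trust Score is a weighted mean of the seven sub-dimension columns; the real weights are not published here, so the equal weights below are placeholders (they reproduce some rows exactly, such as Gemini 2.5 Pro's 8.96, but not others). The competition-style tie handling and the under-30-evaluation asterisk rule are taken directly from the table.

```python
from dataclasses import dataclass

# Hypothetical weights for the seven sub-dimensions. The leaderboard does not
# publish how RC, FA, SC, RF, ST, ED, and HL are combined; equal weights are a
# placeholder assumption, not the site's actual formula.
WEIGHTS = {"RC": 1.0, "FA": 1.0, "SC": 1.0, "RF": 1.0, "ST": 1.0, "ED": 1.0, "HL": 1.0}

LOW_SAMPLE_THRESHOLD = 30  # rows with fewer evals than this get the asterisk


@dataclass
class ModelRow:
    name: str
    provider: str
    subscores: dict[str, float]  # keys: RC, FA, SC, RF, ST, ED, HL
    evals: int


def trust_score(row: ModelRow) -> float:
    """Composite Trust Score as a weighted mean of the sub-scores (weights assumed)."""
    total = sum(WEIGHTS[k] * v for k, v in row.subscores.items())
    return round(total / sum(WEIGHTS.values()), 2)


def rank_leaderboard(rows: list[ModelRow]) -> list[tuple[int, str, float, bool]]:
    """Sort by composite score and assign competition ('1224') ranks: tied
    scores share a rank and the next distinct score skips ahead, matching the
    table (two models at rank 5 are followed by rank 7)."""
    scored = sorted(rows, key=lambda r: -trust_score(r))
    out: list[tuple[int, str, float, bool]] = []
    prev_score: float | None = None
    prev_rank = 0
    for i, row in enumerate(scored, start=1):
        score = trust_score(row)
        rank = prev_rank if score == prev_score else i
        out.append((rank, row.name, score, row.evals < LOW_SAMPLE_THRESHOLD))
        prev_score, prev_rank = score, rank
    return out
```

Applied to the top row, the sketch returns the published composite under equal weights and sets the low-sample flag, since 16 < 30:

```python
gemini = ModelRow(
    "Gemini 2.5 Pro", "Google Gemini",
    {"RC": 8.82, "FA": 8.78, "SC": 9.31, "RF": 9.47, "ST": 9.41, "ED": 7.94, "HL": 8.96},
    evals=16,
)
print(trust_score(gemini))  # 8.96 under the assumed equal weights
```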