Trust Score Methodology
How we evaluate AI model performance with a patent-pending, multi-metric framework.
Patent Pending: The Trust Score evaluation framework is covered by a provisional patent. The metric names and conceptual descriptions below are public; the specific algorithms, weighting formulas, scoring calculations, and calibration methods are proprietary.
What Makes Trust Score Different
Unlike synthetic benchmarks that test AI models on standardized datasets, Trust Score evaluates models on real queries from real users on Search Umbrella. Every response is scored in real time by a separate evaluator model running asynchronously — the same way a human expert would review the output, but at scale.
The composite Trust Score (0-10) is derived from 7 individual metrics using a proprietary, patent-pending algorithm. Each metric captures a different dimension of response quality:
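The exact weighting formula is proprietary, but the general shape of the derivation can be sketched. The snippet below uses a plain weighted average with hypothetical equal weights purely for illustration; the metric abbreviations match the seven metrics listed below, while the weights and rounding are assumptions, not the patented algorithm.

```python
# Hypothetical metric weights for illustration only; the actual
# weighting formula and calibration methods are proprietary.
ILLUSTRATIVE_WEIGHTS = {
    "RC": 1.0,  # Readability / Clarity
    "FA": 1.0,  # Factual Accuracy
    "SC": 1.0,  # Semantic Consistency
    "RF": 1.0,  # Relevance / Focus
    "ST": 1.0,  # Style / Tone
    "ED": 1.0,  # Ensemble Disagreement
    "HL": 1.0,  # Human Likeness
}

def composite_trust_score(metrics: dict[str, float]) -> float:
    """Combine per-metric scores (each 0-10) into one 0-10 composite.

    A plain weighted average stands in for the proprietary algorithm.
    """
    total_weight = sum(ILLUSTRATIVE_WEIGHTS[m] for m in metrics)
    weighted = sum(ILLUSTRATIVE_WEIGHTS[m] * s for m, s in metrics.items())
    return round(weighted / total_weight, 2)

scores = {"RC": 8.5, "FA": 7.0, "SC": 9.0, "RF": 8.0,
          "ST": 8.5, "ED": 6.5, "HL": 7.5}
print(composite_trust_score(scores))  # with equal weights, a simple mean
```

With equal weights this reduces to the mean of the seven scores; the production system presumably weights metrics differently (e.g., Factual Accuracy more heavily) and applies calibration, neither of which is shown here.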
1 Readability / Clarity RC
How clear, well-structured, and easy to understand is the response? This metric evaluates logical organization, grammar, formatting, and whether the complexity level matches the query.
2 Factual Accuracy FA
Are the facts verifiable and correct? This metric checks claims against known information, identifies potential hallucinations, and evaluates the reliability of cited sources. This is the hardest metric for AI models — and our most discriminating score.
3 Semantic Consistency SC
Is the response internally consistent and logically coherent? This metric detects contradictions within the response and evaluates whether the reasoning follows logically from premise to conclusion.
4 Relevance / Focus RF
How closely does the response answer the actual query? This metric measures whether the AI stays on topic, addresses the core question, and avoids unnecessary tangents.
5 Style / Tone ST
Is the writing style appropriate for the context? A legal question should get a professional response; a creative writing request should get an imaginative one. This metric evaluates contextual appropriateness across domains.
6 Ensemble Disagreement ED
When multiple AI models answer the same question, do they agree? This metric — unique to multi-model platforms — measures cross-model consensus. High agreement increases confidence; significant disagreement flags potential issues or highlights where one model found something others missed.
7 Human Likeness HL
How natural, conversational, and human-like is the response? This metric evaluates whether the AI communicates in a way that feels authentic rather than robotic, formulaic, or overly corporate.
Data Collection
- Source: Real user queries on Search Umbrella — not synthetic benchmarks or curated test sets
- Evaluation: Every response is evaluated independently by a separate evaluator model running asynchronously
- Scale: 2,637 evaluations across 32 models and 8 domains
- Period: December 2025 – February 2026 (continuously updated)
- Multi-Model: 51.1% of queries were sent to multiple models simultaneously, enabling head-to-head comparison
- Domains: General, Business, Technical, Coding, Creative, Personal, Legal, Research
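To make the collection pipeline concrete, here is one plausible shape for a single evaluation record, assuming the fields implied by the bullets above. All field and class names are illustrative inventions, not Search Umbrella's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record for one scored response; names are illustrative,
# not Search Umbrella's real data model.
@dataclass
class Evaluation:
    query_id: str
    model: str                 # one of the 32 evaluated models
    domain: str                # one of the 8 domains listed above
    timestamp: datetime        # falls in the collection period
    metrics: dict[str, float] = field(default_factory=dict)  # "RC".."HL" -> 0-10
    multi_model: bool = False  # True when the query went to several models

ev = Evaluation(
    query_id="q-001",
    model="example-model",
    domain="Technical",
    timestamp=datetime(2026, 1, 15),
    metrics={"RC": 8.0, "FA": 7.5},
    multi_model=True,
)
print(ev.domain)
```

Records flagged `multi_model=True` (51.1% of queries, per the stats above) are the ones that can feed the Ensemble Disagreement metric.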
Why Ensemble Disagreement Matters
Ensemble Disagreement is our most distinctive metric — and it's only possible on a multi-model platform like Search Umbrella. When a user sends the same query to multiple AI models, we can measure whether they agree. If 4 out of 5 models give the same answer but one disagrees, that disagreement is informative. It might indicate the outlier is wrong — or it might mean the outlier found something the others missed.
This cross-model consensus signal is something no single-model benchmark can capture. It's one of the core reasons we believe the Trust Score provides a more complete picture of AI model performance than traditional benchmarks.
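The "4 out of 5" intuition above can be sketched as a simple majority-consensus check. This is an illustrative stand-in, not the patented ED scoring: it treats each model's (normalized) answer as a label and reports the share of models outside the majority, plus which models to flag.

```python
from collections import Counter

def ensemble_disagreement(answers: list[str]) -> tuple[float, list[int]]:
    """Illustrative consensus measure, not the proprietary ED algorithm.

    Returns (disagreement in [0, 1], indices of outlier answers), where
    disagreement is the fraction of models outside the majority answer.
    """
    counts = Counter(answers)
    majority, majority_count = counts.most_common(1)[0]
    outliers = [i for i, a in enumerate(answers) if a != majority]
    return (len(answers) - majority_count) / len(answers), outliers

# 4 of 5 models agree; the fifth is flagged for review.
score, flagged = ensemble_disagreement(["42", "42", "42", "42", "41"])
print(score, flagged)  # 0.2 [4]
```

A production version would compare semantically (e.g., via embeddings) rather than by exact string match, and would decide separately whether a flagged outlier is an error or a genuine insight the other models missed.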
Frequently Asked Questions
What is Trust Score?
Trust Score is a patent-pending AI evaluation framework that scores AI model responses on a 0-10 scale across 7 metrics: readability, factual accuracy, semantic consistency, relevance, style, ensemble disagreement, and human likeness. Unlike synthetic benchmarks, Trust Score evaluates models on real user queries from Search Umbrella.
How is Trust Score different from other AI benchmarks?
Trust Score evaluates AI models on real queries from real users, not standardized test sets. It measures 7 dimensions of response quality and includes ensemble disagreement, a metric only possible on multi-model platforms that measures cross-model consensus on the same query.
What does the Readability / Clarity metric measure?
How clear, well-structured, and easy to understand is the response? This metric evaluates logical organization, grammar, formatting, and whether the complexity level matches the query.
What does the Factual Accuracy metric measure?
Are the facts verifiable and correct? This metric checks claims against known information, identifies potential hallucinations, and evaluates the reliability of cited sources. This is the hardest metric for AI models — and our most discriminating score.
What does the Semantic Consistency metric measure?
Is the response internally consistent and logically coherent? This metric detects contradictions within the response and evaluates whether the reasoning follows logically from premise to conclusion.
What does the Relevance / Focus metric measure?
How closely does the response answer the actual query? This metric measures whether the AI stays on topic, addresses the core question, and avoids unnecessary tangents.
What does the Style / Tone metric measure?
Is the writing style appropriate for the context? A legal question should get a professional response; a creative writing request should get an imaginative one. This metric evaluates contextual appropriateness across domains.
What does the Ensemble Disagreement metric measure?
When multiple AI models answer the same question, do they agree? This metric — unique to multi-model platforms — measures cross-model consensus. High agreement increases confidence; significant disagreement flags potential issues or highlights where one model found something others missed.
What does the Human Likeness metric measure?
How natural, conversational, and human-like is the response? This metric evaluates whether the AI communicates in a way that feels authentic rather than robotic, formulaic, or overly corporate.
How many AI models does Trust Score evaluate?
Trust Score currently evaluates 32 AI models across 8 domains, based on 2,637 real-world evaluations. Data is collected from real users on Search Umbrella and is continuously updated.
See Trust Scores in Action
Run your own multi-model comparison and see how Trust Score evaluates responses in real time.
Try Search Umbrella →