INDEPENDENT COMPARATIVE BENCHMARK

NDOR vs GPT-5.4 vs Claude Sonnet 4.6

Decision Intelligence compared against general-purpose reasoning models

Independent side-by-side testing showed materially different reasoning behaviour under identical conditions. The same document, objective, and constraints were given to NDOR and to two general-purpose reasoning models, GPT-5.4 and Claude Sonnet 4.6. The findings below summarise what differed and why it matters when the output is used to inform a real commercial decision.

FIVE DIMENSIONS

How NDOR reasons differently

For each dimension, the benchmark recorded what was identified, how it was reasoned about, and how the recommendation was expressed.

DIMENSION 01

Clause Interaction Analysis

Identify how multiple clauses in a single agreement interact to create combined operational exposure that no individual clause expresses on its own.

NDOR

Mapped how indemnity, limitation-of-liability, and termination-for-convenience clauses combined into a single operational exposure greater than any clause in isolation.

GPT-5.4

Listed each of the three clauses individually with risk commentary, but did not trace the interaction chain that produced the combined exposure.

Claude Sonnet 4.6

Identified the strongest individual clause and discussed its implications, but did not connect it to the other two clauses that compounded the exposure.

DIMENSION 02

SLA Analysis

Evaluate whether service-level commitments are materially enforceable, or whether definitional carve-outs and maintenance-window wording weaken the headline number.

NDOR

Identified that the maintenance-window definition combined with the planned-outage exclusion silently weakened the 99.5% uptime commitment to an effective 96.8% under realistic operating assumptions.

GPT-5.4

Flagged the SLA as containing weaknesses and noted maintenance-window language, but treated it as a routine carve-out rather than a structural loophole.

Claude Sonnet 4.6

Noted the SLA could be tightened and called out the maintenance-window definition, but did not quantify the effective uptime degradation.
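
The way excluded maintenance time erodes a headline uptime figure can be illustrated with simple arithmetic. The sketch below is not the benchmark's own calculation: the 730-hour month and the excluded-hours values are assumptions chosen for illustration, and `effective_uptime` is a hypothetical helper. It assumes excluded hours are removed from the measurement period, so the headline percentage applies only to the hours that remain.

```python
HOURS_PER_MONTH = 730.0  # assumption: average length of a month in hours


def effective_uptime(headline: float, excluded_hours: float,
                     period_hours: float = HOURS_PER_MONTH) -> float:
    """Worst-case availability over the full period, as a fraction.

    headline       -- contractual uptime as a fraction, e.g. 0.995
    excluded_hours -- downtime the contract does not count (maintenance
                      windows, planned outages); an assumed input here
    """
    if not 0.0 <= headline <= 1.0:
        raise ValueError("headline must be a fraction between 0 and 1")
    if not 0.0 <= excluded_hours <= period_hours:
        raise ValueError("excluded_hours must fit inside the period")
    # The SLA is only measured over hours that are not excluded.
    measured_hours = period_hours - excluded_hours
    # Downtime the supplier may still incur without breaching the SLA.
    permitted_unplanned = (1.0 - headline) * measured_hours
    total_downtime = excluded_hours + permitted_unplanned
    return 1.0 - total_downtime / period_hours


if __name__ == "__main__":
    for hours in (0, 5, 10, 20):
        print(f"{hours:>2} excluded h/month -> {effective_uptime(0.995, hours):.2%}")
```

Under these assumed inputs, roughly 20 excluded hours a month is enough to pull a 99.5% headline below 97%. The real figure for any agreement depends on how its maintenance windows and exclusions are actually defined.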

DIMENSION 03

Recommendation Quality

Translate findings into prioritised, operationally specific mitigation guidance that the reviewing party can act on during negotiation.

NDOR

Produced sequenced mitigation guidance: redline ordering, fallback positions, and the precise textual replacements required for each weak clause.

GPT-5.4

Produced thorough recommendations addressing each finding, but without sequencing, fallback paths, or proposed redline text.

Claude Sonnet 4.6

Produced cautious recommendations that surfaced the issues but hedged on the most material structural changes.

DIMENSION 04

Evidence Grounding

Anchor every finding to a specific clause reference and quoted text so the analysis is defensible during stakeholder review.

NDOR

Every finding cited a clause number and an exact textual extract from the source document.

GPT-5.4

Findings frequently cited section numbers but paraphrased the underlying text, weakening defensibility.

Claude Sonnet 4.6

Findings cited the source carefully but referenced clauses by description rather than by structured reference number.
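
The difference between a quoted extract, a paraphrase, and a reference by description can be checked mechanically. The following is a minimal sketch, not NDOR's implementation: the `Finding` record, the `is_grounded` function, and the sample clause text are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    clause_ref: str   # structured reference, e.g. "7.2"
    quote: str        # text claimed to be an exact extract
    observation: str  # the analyst's comment on the clause


def normalise(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false mismatches."""
    return " ".join(text.split())


def is_grounded(finding: Finding, clauses: dict[str, str]) -> bool:
    """True only if the cited clause exists and contains the quote verbatim."""
    clause_text = clauses.get(finding.clause_ref)
    if clause_text is None or not finding.quote.strip():
        return False
    return normalise(finding.quote) in normalise(clause_text)


# Hypothetical clause text, invented for this example.
CLAUSES = {
    "7.2": "The Supplier's total liability shall not exceed the Fees paid\n"
           "in the preceding twelve (12) months.",
}

exact = Finding("7.2",
                "shall not exceed the Fees paid in the preceding twelve (12) months",
                "Cap is tied to trailing fees.")
paraphrased = Finding("7.2",
                      "liability is capped at one year of fees",
                      "Cap is tied to trailing fees.")
by_description = Finding("the liability cap",
                         "shall not exceed the Fees paid",
                         "Cap is tied to trailing fees.")

if __name__ == "__main__":
    for name, f in [("exact", exact), ("paraphrased", paraphrased),
                    ("by description", by_description)]:
        print(f"{name}: grounded={is_grounded(f, CLAUSES)}")
```

Only the first finding passes: the paraphrase fails the verbatim test, and the descriptive reference fails the clause lookup. A check like this is deliberately strict, which is the point when the analysis has to survive stakeholder review.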

DIMENSION 05

Reasoning Traceability

Surface the intermediate reasoning steps so the conclusion can be audited rather than accepted on trust.

NDOR

Each conclusion exposed the reasoning chain: assumption → clause reference → operational consequence → recommended mitigation.

GPT-5.4

Conclusions were presented as flat statements alongside the clauses they referenced, without traceable intermediate reasoning steps.

Claude Sonnet 4.6

Reasoning was visible at the paragraph level but was not structured into discrete, auditable steps.
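
A chain of the form assumption → clause reference → operational consequence → recommended mitigation can be represented as a small record whose completeness is checkable. This is an illustrative sketch only: the `ReasoningStep` class, its field names, and the sample content are assumptions made for this example, not NDOR's internal format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReasoningStep:
    assumption: str    # what is taken as given
    clause_ref: str    # where in the document it is anchored
    consequence: str   # the operational effect that follows
    mitigation: str    # the recommended response

    def missing_links(self) -> list[str]:
        """Names of any links in the chain that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """One-line audit trail in the order the fields are declared."""
        return " -> ".join(getattr(self, f.name) for f in fields(self))


# Hypothetical content, invented for this example.
complete = ReasoningStep(
    assumption="Maintenance is scheduled weekly",
    clause_ref="Clause 4.3",
    consequence="Scheduled downtime is excluded from the uptime measure",
    mitigation="Cap excluded maintenance hours per month",
)
incomplete = ReasoningStep(
    assumption="Maintenance is scheduled weekly",
    clause_ref="Clause 4.3",
    consequence="Scheduled downtime is excluded from the uptime measure",
    mitigation="",
)

if __name__ == "__main__":
    print(complete.render())
    print("missing:", incomplete.missing_links())
```

Storing each link as a separate field is what makes the chain auditable: a reviewer, or a validation script, can reject any conclusion whose `missing_links()` list is not empty.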

EVALUATION SUMMARY

“Closer to transaction-advisory and strategic risk-review quality than a standard AI review response.”

— Comparative evaluation summary

METHODOLOGY

How the benchmark was conducted

  • Source document

    A short-form vendor service agreement for technology consulting and software development services, governed by English law — containing SLA, IP ownership, limitation of liability, indemnification, and UK GDPR data-protection clauses.

  • Objective

    Identify risk signals, problematic clauses, and structural imbalance, and produce mitigation guidance.

  • Conditions

    Identical prompt, identical document, identical context window for all three systems.

  • Comparators

    NDOR is the system being benchmarked. The two external comparator engines evaluated alongside it were GPT-5.4 and Claude Sonnet 4.6 — each run with the same prompt and the same source material as NDOR.

  • Evaluation

    Scored across five structured dimensions of decision-relevant reasoning, not free-form quality impressions.

WHAT THIS BENCHMARK IS NOT
  • A general intelligence comparison between AI providers.
  • A claim that one model is uniformly better than another.
  • A replacement for qualified professional review on regulated matters.
  • A statement about adversarial AI competition.

NDOR is positioned alongside general-purpose AI — not against it. The benchmark exists to show how a decision-intelligence system reasons differently when the output is destined to inform a real commercial decision.

Run the same kind of analysis on a document of your choice.

The benchmark above used a vendor service agreement. NDOR applies the same structured validation workflow to contracts, models, proposals, and reports.

20 free credits included · No subscription required to begin

This benchmark reflects a controlled, single-document comparison conducted under identical conditions. It illustrates how the two approaches, decision intelligence and general-purpose reasoning, differ in reasoning structure on this category of material. Specific findings will vary by document, by objective, and by model version.

© 2026 NDOR. All rights reserved.

NDOR produces analytical observations, not legal advice. Outputs must be reviewed by a qualified professional before reliance.

NDOR is operated by IAA Energy Resources Ltd, a company registered in England and Wales. Company Number: 11583381. Registered Office: 71–75 Shelton Street, Covent Garden, London, England, WC2H 9JQ.

Contact: support@ndor.app