The third variable · Success rate · compounding

The hardest question
in CRO isn’t how to test.
It’s what to test next.

Most teams answer it with opinion, or with a simple impact-and-effort guess. We score every hypothesis across several behavioral models first — so the test at the top of the list is the one most aligned with how your buyers actually decide.

The growth protocol · formula

ΔRPU = Quality × Rate × Success rate

This page owns the third multiplier · §07 shows the full system

Ranked backlog · Top 3 ships this sprint

Fogg × Big5 × Drivers

01 · Digestive-outcome proof on PDP (Security · Progress)
02 · Founder video in hero (Trust · Belonging)
03 · Real-results carousel above fold (Security)
04 · Bundle suggestions in cart (Progress)
05 · Sticky add-to-cart on mobile (Convenience)
06 · Money-back guarantee banner (Security)
07 · Reorder benefits hierarchy (Clarity)
08 · Subscription default selected (Anchoring)
09 · Free shipping bar at top (Anchoring)
10 · Comparison table vs competitor (Status)
11 · Press logos band under hero (Authority)
12 · Reduce checkout to 3 fields (Convenience)

Below: why behavioral scoring beats business scoring

Thesis · why behavioral scoring wins

Most teams score the wrong thing.
We score what actually moves buyers.

Impact-and-effort tells you what a test is worth and what it costs. It tells you almost nothing about whether it will actually win.

Three variables drive a CRO program.

How good the test ideas are, how fast you can run them, and how often you pick the right one to run next. The first two get all the attention. The third — success rate — is the one almost nobody systematizes. And it’s the one that compounds.

Business scoring is better than gut. Still wrong.

Most teams that try to score at all use impact, confidence, and effort. It beats arguing in a meeting. But impact and effort tell you what a test is worth and what it costs. Whether it wins is a question about human behavior, not about business math.


So we score on behavior instead.

Every hypothesis runs through several psychological models before it gets a priority. Each model produces a sub-score; the sub-scores combine into one number. The test at the top of the list isn’t the one with the biggest theoretical upside — it’s the one most aligned with how your buyers actually decide.

The whole page in one motion · Many models · one score · looping

Fogg · B = MAP
Big Five · OCEAN
Drivers · 7 motives
Expert review · effort + tags
Quantum priors · 4,127 tests
Iterative loop · self-tuning

Hypothesis · #H-01

Digestive-outcome proof on PDP

M 85 · A 60 · P 70 · Fit 78 · D 87

Priority · composite

Next: how most teams decide what to test

The wrong way · four failure modes

Before a scoring system,
here’s how the queue gets built.

Each of these feels reasonable in the moment. Together, they’re how teams burn dev hours on low-impact tests, miss early wins that should have compounded, and eventually decide “CRO doesn’t really work for us.”

Loudest voice wins

The senior person, or whoever argues hardest, gets their test pushed to the front. Score by org chart, not behavior.

Easy-first

Whatever's quickest to build jumps the queue. The roadmap becomes a list of one-hour wins that don't move anything.


Backlog burial

Ideas pile into an unscored list, sit there, age out. The good ones get buried under the new ones.

Competitor copy

A pop-up on a competitor's site goes straight to the top, with no analysis of whether their buyers are anything like yours.

Even teams that upgrade to impact-and-effort are still guessing at the one thing that matters most: will this actually change behavior?

Answered in §04

Next: how a hypothesis gets scored · the centerpiece

The centerpiece · behavioral scoring card

How one hypothesis
gets a real priority.

Watch the same idea pass through four behavioral lenses. Toggle the buyer profile to see sub-scores shift, the composite move, and the whole backlog rerank. That last part is what most teams miss: priority is a function of who’s buying.

Buyer profile · choose who you’re optimizing for

Security-seeker. Wants proof before commitment. Reads everything. Hates ambiguity.

Hypothesis · H-01 · PDP · above fold

Statement

If we place digestive-outcome proof above the fold on PDPs, then add-to-cart will increase, because this audience pre-commits only when the specific outcome is evidenced.

A · Control

B · Variant

Outcome proof

Composite priority

For · Security-seeker

Of 100

Fogg Behavior · B = M × A × P
Motivation 84 · Ability 62 · Prompt 71

Big Five (OCEAN) · Trait alignment · top loads
Conscientiousness 88 · Neuroticism 74

7 Core Drivers · Motive · which engine fires
Security 92 · Belonging 58

Fit + Effort + Priors · Brand · build cost · history
Brand 88 · Effort 74 · Priors 82
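The card gives B = M × A × P but does not say how the product is brought back to a 0-100 scale. One possible reading, and it is only an assumption, is a geometric mean: it stays multiplicative (a near-zero factor drags the whole score down) while remaining on the same scale as the inputs. The function name `fogg_subscore` is hypothetical.

```python
def fogg_subscore(motivation: float, ability: float, prompt: float) -> float:
    """Geometric mean of three 0-100 inputs.

    Assumed normalization of B = M x A x P; the page does not specify one.
    """
    for value in (motivation, ability, prompt):
        if not 0 <= value <= 100:
            raise ValueError("inputs must be on a 0-100 scale")
    return (motivation * ability * prompt) ** (1 / 3)


# Values shown on the card for the Security-seeker profile.
score = fogg_subscore(84, 62, 71)
print(round(score, 1))

# A missing prompt zeroes the score, however strong the other two factors are.
print(fogg_subscore(100, 100, 0))
```

With an additive average, a hypothesis with no prompt at all could still score well on motivation and ability alone; the multiplicative form rules that out.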

Top 5 · reranked by profile

The queue changes when the buyer changes.

Same 20 hypotheses · different weights

#01 · 84 · Digestive-outcome proof on PDP (Security · Progress)
#02 · 78 · Money-back guarantee banner (Security)
#03 · 74 · Real-results carousel above fold (Security)
#04 · 71 · Founder video in hero (Trust · Belonging)
#05 · 65 · Press logos band under hero (Authority)

Composite formula · priority = w₁·Fogg + w₂·Big5 + w₃·Drivers + w₄·Fit · weights tuned per buyer profile
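The composite formula can be sketched in a few lines. Everything numeric below is hypothetical: the sub-scores, the two weight sets, and the names `Hypothesis`, `composite`, and `rerank` are illustrations, not the production model. The sketch only shows how the same backlog reorders when the weights change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    fogg: float     # 0-100 sub-score from the Fogg lens
    big5: float     # 0-100 trait-alignment sub-score
    drivers: float  # 0-100 motive sub-score
    fit: float      # 0-100 brand fit / effort / priors sub-score


def composite(h: Hypothesis, weights: dict[str, float]) -> float:
    """priority = w1*Fogg + w2*Big5 + w3*Drivers + w4*Fit, weights normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    keys = ("fogg", "big5", "drivers", "fit")
    return sum(weights[k] * getattr(h, k) for k in keys) / total


def rerank(backlog: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Highest composite priority first."""
    return sorted(backlog, key=lambda h: composite(h, weights), reverse=True)


# Hypothetical sub-scores and weights, for illustration only.
backlog = [
    Hypothesis("Outcome proof on PDP", fogg=70, big5=80, drivers=90, fit=80),
    Hypothesis("Comparison table vs competitor", fogg=85, big5=60, drivers=50, fit=95),
]
profiles = {
    "security_seeker": {"fogg": 0.2, "big5": 0.2, "drivers": 0.5, "fit": 0.1},
    "aspiration_driven": {"fogg": 0.3, "big5": 0.1, "drivers": 0.1, "fit": 0.5},
}
for profile, weights in profiles.items():
    print(profile, [h.name for h in rerank(backlog, weights)])
```

With these made-up numbers, the outcome-proof test leads for the security-seeker weights and the comparison table leads for the aspiration-driven weights. A fuller version would probably also let the sub-scores themselves vary by profile, which this sketch leaves out.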

Next: the score isn't static. The loop tunes it.

The loop · why the score gets better

The scoring system
tunes itself.

A static priority model is just a more confident guess. Every test result feeds back into the weights — over a quarter, the score stops being a hypothesis and starts being a calibrated prediction of what your specific audience responds to.

The loop · Continuous

Every win, loss, or flat test gets fed back. The weights on Fogg, Big Five, drivers, and brand-fit move toward whatever’s actually predictive for this audience — not toward a universal best-practice.
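The page does not state the update rule, so the following is only one plausible way to picture the feedback loop: treat the weights as a small logistic model of "will this test win" and take one gradient step after each result. The functions `predict_win` and `update`, the learning rate, and the sub-scores are all assumptions made for illustration.

```python
import math


def predict_win(weights: dict[str, float], subscores: dict[str, float]) -> float:
    """Logistic model; sub-scores (0-100) are rescaled to 0-1 before weighting."""
    z = sum(weights[k] * subscores[k] / 100 for k in weights)
    return 1 / (1 + math.exp(-z))


def update(weights: dict[str, float], subscores: dict[str, float],
           won: bool, lr: float = 0.5) -> dict[str, float]:
    """One gradient step on log loss; returns new weights, input is left untouched."""
    error = (1.0 if won else 0.0) - predict_win(weights, subscores)
    return {k: w + lr * error * subscores[k] / 100 for k, w in weights.items()}


# Hypothetical starting point: no lens is trusted more than another.
weights = {"fogg": 0.0, "big5": 0.0, "drivers": 0.0, "fit": 0.0}
# Hypothetical sub-scores for one shipped test that turned out to be a win.
shipped = {"fogg": 40, "big5": 60, "drivers": 90, "fit": 70}

after_win = update(weights, shipped, won=True)
print(after_win)
```

After a win, the lens that scored the test highest (here, drivers) gains the most weight; after a loss the same step runs in the opposite direction. Repeated over many results, the weights drift toward whichever lenses are predictive for this audience.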

Model calibration · 8 sprints · Predicted → actual · converging

S1 win rate

38%

S8 win rate

84%

Gap (predict vs actual)

closes 24pt

Most CRO programs spend a year arguing about what to test. Ours spends that year learning what to weight.

DRIP — operating principle 03

Next: what this looked like for one client

Proof · one case, nine months

What happens when
the queue actually reranks.

The team had been running impact-and-effort. Many tests, all reasonable, win rate near coin-flip. After rewriting the priority model around behavioral scoring, RPU compounded for nine months straight.

Revenue per user · indexed 100 at M01

Five tests, picked by the model. Four wins, one flat.

Win · Flat / loss

RPU lift

9 months · cumulative

Test win rate

Up from 41% prior program

5 / 6

Sprints with a win

One flat, zero losses

0.0×

ROI vs prior quarter

Even at slower test cadence

What changed

The backlog stayed the same. The order didn’t.

01

Reprofiled buyers

Voice-of-customer + checkout exit surveys identified a security-driven core.

02

Re-weighted the model

Conscientiousness, Security driver, and outcome-evidence priors got upweighted.

03

Reranked the existing backlog

Three tests already in the queue moved from #14, #11, #9 → #1, #2, #3.

04

Shipped in that order

Each early win compounded into the baseline before the next test ran.
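Why the order matters even when the backlog is unchanged can be shown with simple arithmetic. The sketch below assumes lifts are independent and multiplicative, that one test ships per period, and that periods are equal in length. The lift values are hypothetical, not the client's numbers, and `index_path` is an illustrative name.

```python
from itertools import accumulate
from operator import mul


def index_path(lifts, base=100.0):
    """RPU index after each test ships; each lift multiplies the baseline left by earlier tests."""
    return list(accumulate((1 + lift for lift in lifts), mul, initial=base))[1:]


# Hypothetical relative lifts: three wins and one flat test.
best_first = [0.05, 0.04, 0.03, 0.0]
best_last = list(reversed(best_first))

early = index_path(best_first)
late = index_path(best_last)

print("final index:", round(early[-1], 2), "vs", round(late[-1], 2))
print("sum over periods:", round(sum(early), 2), "vs", round(sum(late), 2))
```

Both orders end at the same index, because multiplication does not care about order. The sum over periods is higher when the strongest test ships first, since every later period is earned on the already-lifted baseline.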

From the client

“The tests weren’t new. The order was. We’d have run the comparison block in month two and called the whole program flat. Instead we ran outcome-proof first, and everything after it inherited the lift.”

Head of E-commerce

OCEANSAPART · apparel

Case open

Read the full case →

Next: how this fits the whole growth formula

The whole system · where this sits

Pick the right test next,
and the other two compound.

Prioritization isn’t a standalone product. It’s the third multiplier in the growth formula. Stronger hypotheses without good prioritization just mean a faster roadmap of mediocre tests. The three only work together.

The growth formula

ΔRPU = Quality × Rate × Success rate

You are here · success rate

Var · 01

Test quality

Strong hypotheses, well-designed

4,127 prior tests indexed

Behavioral primitives, not surface ideas

Hypothesis template enforces clarity

§01 · Quantum Database · see system →

Var · 02

Test rate

Ship more, learn faster

Up to 6 concurrent tests, multi-arm

Audience-isolated, no cross-contamination

Sequential analysis · early-stop logic

§02 · Rapid A/B Testing · see system →

Var · 03 · You are here

Success rate

Pick the right one to run next

Behavioral scoring · Fogg · Big5 · Drivers

Buyer-profile-weighted compositing

Self-tuning every sprint

§03 · Iterative Prioritization · This page

The compounding math · why all three matter

Scenario A · Strong queue, low success

0.8 × 1.0 × 0.40 = 0.32×

A well-managed team running mediocre tests fast. Output stays modest.

Scenario B · Right ideas, wrong order

1.4 × 1.2 × 0.45 = 0.76×

Behaviorally rich hypotheses, but run in a coin-flip order. Half the lift is lost to test #1 being wrong.

Scenario C · The whole formula firing

1.4 × 1.2 × 0.78 = 1.31×

Strong hypotheses, run in the order most aligned with how this audience decides.
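The three scenarios above are plain products of the formula's factors. A short check, using only the numbers already given (the function name `growth_multiplier` is illustrative):

```python
def growth_multiplier(quality: float, rate: float, success_rate: float) -> float:
    """Quality x Rate x Success rate, as in the growth formula."""
    return quality * rate * success_rate


# Factors copied from Scenarios A, B, and C.
scenarios = {
    "A": (0.8, 1.0, 0.40),
    "B": (1.4, 1.2, 0.45),
    "C": (1.4, 1.2, 0.78),
}
results = {name: growth_multiplier(*factors) for name, factors in scenarios.items()}

for name, value in results.items():
    print(name, round(value, 2))  # A 0.32, B 0.76, C 1.31

# B and C share quality and rate; only the success rate differs.
print(round(results["C"] / results["B"], 2))  # 1.73
```

Because the factors multiply, moving success rate from 0.45 to 0.78 with nothing else changed raises the output by the same ratio, about 1.73×.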

Iterative prioritization · the third multiplier

Iterative prioritization · Common questions

Frequently asked questions.

5 questions

01 · How is behavioural scoring different from ICE / RICE / impact-and-effort?

ICE, RICE, and impact-and-effort tell you what a test is worth and what it costs. They tell you almost nothing about whether it will actually win. Behavioural scoring runs every hypothesis through several psychological models (Fogg, Big Five, the seven drivers, brand-fit, historical priors) and produces a single composite priority — so the test at the top of the list is the one most aligned with how your buyers actually decide, not the one that sounds best in a planning meeting.

02 · Which behavioural models actually feed the score?

Four: Fogg Behaviour Model (Motivation × Ability × Prompt), Big Five personality traits (we mostly load on Conscientiousness and Neuroticism), our 7-driver framework (Security, Belonging, Mastery, Status, Novelty, Autonomy, Hedonism), and a fit-and-effort layer (brand alignment, build cost, historical priors from 4,000+ tests in Quantum). Each model produces a sub-score; weights combine them into one number.

03 · What do you mean by 'the model tunes itself'?

Every test result — win, loss, or flat — feeds back into the weights. Over a quarter, the model stops being a static guess and starts being a calibrated prediction of what your specific audience responds to. We've seen win rates rise from ~38% in sprint 1 to ~84% by sprint 8 on the same backlog, just by re-weighting after each outcome.

04 · Why does priority change when the buyer profile changes?

Because priority is a function of who's buying. The same hypothesis can score 84 for a Security-seeker (evidence-first, risk-averse) and 57 for an Aspiration-driven buyer (identity, status, novelty). The interactive scoring card on this page is the literal interface — toggle the profile, watch the backlog rerank. Most teams pick one test for everyone; we pick the right test for the dominant profile in each segment.

05 · Where does prioritisation sit in the wider DRIP system?

It's the third variable in the growth formula: ΔRPU = Quality × Rate × Success rate. The Quantum Database covers Quality (4,127 indexed prior tests). Rapid A/B Testing covers Rate (up to 6 concurrent, multi-arm, audience-isolated). Iterative Prioritisation covers Success rate. Strong hypotheses without good prioritisation just mean a faster roadmap of mediocre tests. The three only compound when all three are firing.

Iterative prioritization · Talk to us

Want your backlog rescored?

30-minute strategy call. We'll take your existing hypotheses, run them through the behavioural scoring model with your buyer profile, and send back the reranked queue plus the first three sprints we'd run.

€500M+ in additional revenue across 250+ brands
4,000+ A/B tests · 52.6% win rate
10% RPU uplift guaranteed in 6 months — or we work free

Book your free strategy call

We work exclusively with brands doing €300K+/month.
