
Put Your AI Spend on Trial: Evidence, Not Hype
South African e-commerce and digital service teams are spending real money on AI this December—personalisation engines for peak season, chatbots for support spikes, fraud models for festive shopping, content tools for always-on campaigns. The awkward part? A lot of those investments get approved because the story sounds good, not because the value is proven.
Most companies get this wrong: they start with the tool (“We need an AI agent”, “We need a CDP with AI”, “We need GenAI for content”), then scramble to justify value after the purchase. Dashboards glow. Customer outcomes stay flat. If you’re trying to grow revenue, reduce churn, or lower cost-to-serve, you can’t afford AI theatre.
Here’s a better way to approach this: treat AI value like a court case. Your AI initiative only “wins” when it produces decision-grade evidence that a specific business outcome moved—within a timeframe you agreed upfront. This post adapts the “value architecting” discipline from boardroom governance into a practical framework for AI in e-commerce and digital services in South Africa.
AI theatre is easy. Evidence is what scales.
Answer first: AI programmes fail when teams measure activity instead of outcomes.
AI theatre looks professional: model training updates, glossy demos, chatbot transcripts, impressive accuracy metrics. But boards don’t fund “interesting”; they fund outcomes. And operational teams don’t need another dashboard—they need fewer returns, higher conversion, lower support queues, faster fulfilment cycles.
In South African online retail and digital services, the gap between activity and outcomes gets wider because:
- Data is uneven across channels (web/app/WhatsApp/store), making attribution messy.
- Load shedding, delivery constraints, and fraud patterns create noisy environments where false positives look like “impact”.
- Adoption is the real bottleneck: an agent can exist and still be ignored by customers or staff.
If you want AI to power growth (not just generate slides), govern it like you would any investment: tie it to one business result, define proof, then fund based on evidence.
Start with one outcome (not an AI tool)
Answer first: Every AI initiative should be anchored to a single board-level value node—revenue, margin, churn, cost-to-serve, or cycle time.
AI is a means. Outcomes are the point. When you don’t force a single outcome, your project becomes a “platform story” that’s impossible to judge.
Pick one value node and make it unmissable
Examples that fit e-commerce and digital services:
- Revenue: increase checkout conversion rate by 0.6 percentage points in 90 days.
- Margin: reduce discount dependency by improving product recommendations (measured as AOV margin uplift).
- Churn: reduce subscription cancellations by 8% within two quarters via retention modelling and targeted offers.
- Cost-to-serve: cut “where is my order” contacts by 15% through proactive delivery updates.
- Cycle time: reduce time-to-publish for product pages from 5 days to 2 days with AI-assisted content workflows.
The discipline is the same: choose one outcome for the next 90 days that you’re willing to be judged on.
A practical South African scenario
If you’re a mid-size online retailer heading into January returns season, don’t start with “We need a GenAI bot.” Start with:
Outcome: Reduce contact-centre cost per order by 10% in Q1 without hurting CSAT.
Only then do you explore interventions: better self-service, proactive notifications, improved intent routing, or an AI agent. Technology comes after the outcome.
Map the value so someone can own it
Answer first: If nobody owns the KPI baseline → target → timeframe, AI value won’t show up.
Boards and execs don’t struggle with strategy; they struggle with accountability. A value map fixes that by turning a vague goal into governable commitments.
For each value node, capture:
- KPI: the measurable outcome (eg, churn rate, cost per contact, conversion).
- Owner: the business person accountable (not “IT” as a bucket).
- Baseline: current performance (eg, 2.1% weekly churn).
- Target: what “better” means (eg, 1.9% weekly churn).
- Timeframe: when evidence must land (≤90 days for key assumptions).
- Guardrails: risk appetite (eg, don’t drop CSAT below 4.2/5).
This is where e-commerce AI efforts often break: the data team owns the model, but nobody owns the business lever. The model can be “accurate” while the business outcome stays the same.
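To make the value map concrete, here is a minimal sketch of a value node as a typed record. The field names, the example owner ("Head of Retention") and the figures are illustrative, borrowed from the churn example above; adapt them to your own KPIs and accountability lines.

```python
# Hypothetical value-map entry: one record per value node, owned by a person.
from dataclasses import dataclass

@dataclass
class ValueNode:
    kpi: str             # measurable outcome, eg weekly churn rate
    owner: str           # accountable business owner, not "IT" as a bucket
    baseline: float      # current performance
    target: float        # what "better" means
    timeframe_days: int  # when evidence must land (<= 90 for key assumptions)
    guardrail: str       # risk appetite, eg "CSAT >= 4.2/5"

churn_node = ValueNode(
    kpi="weekly churn rate (%)",
    owner="Head of Retention",   # illustrative owner, not a prescription
    baseline=2.1,
    target=1.9,
    timeframe_days=90,
    guardrail="CSAT >= 4.2/5",
)
```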
Write a testable hypothesis (and a decision rule)
Answer first: If you can’t say “scale, pause, or kill” based on the result, you’re not running an AI investment—you’re running a science project.
A good AI hypothesis has causality, a number, and a clock:
If we deploy an AI-assisted product recommendation module on PDPs for repeat customers, then repeat-customer conversion will increase by 0.4 percentage points within 6 weeks, measured via A/B test, with no increase in returns rate.
Now add the part most teams skip: the decision rule.
- Scale if the lift is ≥0.4pp and guardrails hold.
- Pause if lift is 0.1–0.39pp and investigate friction (UX, latency, inventory).
- Kill or redesign if lift is <0.1pp or returns rise above threshold.
This matters because AI products can drag on for months while teams debate “potential”. A decision rule forces clarity.
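As a sketch, the decision rule can live in a few lines of code so nobody relitigates the thresholds after the test. The 0.4pp and 0.1pp cut-offs and the returns guardrail come from the example hypothesis above; treat them as placeholders, not universal constants.

```python
# Minimal decision rule for the PDP recommendation test described above.
def decide(lift_pp: float, returns_delta_pp: float, returns_threshold_pp: float = 0.0) -> str:
    """Return 'scale', 'pause', or 'kill or redesign' for the A/B result."""
    guardrails_hold = returns_delta_pp <= returns_threshold_pp  # no rise in returns
    if not guardrails_hold or lift_pp < 0.1:
        return "kill or redesign"   # guardrail breached or lift below 0.1pp
    if lift_pp >= 0.4:
        return "scale"              # target lift reached and guardrails hold
    return "pause"                  # 0.1-0.39pp: investigate UX, latency, inventory

# Example: a 0.25pp lift with flat returns lands in the pause band.
print(decide(lift_pp=0.25, returns_delta_pp=0.0))  # -> pause
```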
Expose assumptions and test them in 90 days
Answer first: The fastest way to waste AI budget is to leave adoption and data quality as “we’ll figure it out later.”
AI value depends on assumptions. Name them early and test them quickly.
Common AI assumptions in e-commerce (and how to falsify them)
- Adoption: “Customers will use the bot.”
  - Test: stepped rollout; measure containment rate and deflection to human agents.
- Behaviour change: “Personalised offers will reduce churn.”
  - Test: holdout group; measure churn delta over 4–8 weeks.
- Data quality: “Our product catalogue data is clean enough.”
  - Test: audit missing attributes and category errors; measure recommendation coverage.
- Operational readiness: “Support agents will trust AI summaries.”
  - Test: parallel run; measure handle time and rework rates.
- Integration: “We can connect the model to our commerce stack without latency.”
  - Test: load tests during peak hours; measure response time and drop-offs.
Turn each assumption into a named test with an owner, a method, and a ≤90-day window. Treat every assumption like a possible failure point—because it is.
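One lightweight way to hold that line is to keep each assumption test as a named record and flag anything that drifts past the 90-day window. The assumptions, owners and windows below are illustrative examples drawn from the list above, not a prescribed set.

```python
# Hypothetical assumption ledger: one record per test, with owner, method, window.
assumption_tests = [
    {"assumption": "Customers will use the bot", "owner": "CX lead",
     "method": "stepped rollout; containment and deflection rate", "window_days": 45},
    {"assumption": "Catalogue data is clean enough", "owner": "Merchandising lead",
     "method": "attribute audit; recommendation coverage", "window_days": 30},
    {"assumption": "Model integrates without peak-hour latency", "owner": "Engineering lead",
     "method": "load test; response time and drop-offs", "window_days": 60},
]

# Every test must resolve inside 90 days; anything longer is a governance smell.
overdue = [t["assumption"] for t in assumption_tests if t["window_days"] > 90]
assert not overdue, f"Tests exceeding the 90-day window: {overdue}"
```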
Use a single score to stop portfolio drift
Answer first: AI portfolios fail when leaders can’t see which initiatives have proven value versus promised value.
The value-architecting framework introduces a board-level approach called the Value Realisation Index (VRI)—a single score (0–100) that only counts decision-grade evidence. For AI in e-commerce and digital services, that’s exactly the type of compression leaders need: one signal that says whether AI is actually delivering.
Here’s a practical way to apply the same VRI logic to your AI portfolio with five dimensions (and weightings you can adjust):
- Strategic alignment (15%): Is the AI initiative directly tied to a core driver like margin, churn, or cost-to-serve?
- Value-tree strength (15%): Are you targeting material nodes (not vanity metrics like “engagement”)?
- Assumption discipline (20%): Are critical assumptions tested within 90 days, with owners?
- Evidence quality (25%): Is there causal proof (A/B, holdouts, matched before–after), not correlations?
- Risk-adjusted outcomes (25%): Are outcomes improving within guardrails (CSAT, fraud loss, returns, compliance)?
The three bands leaders can actually act on
- Green: AI is producing measurable outcome movement.
- Amber: Progress, but evidence is incomplete—tighten tests and assumptions.
- Red: Value isn’t being realised—stop funding momentum and reset.
A useful rule I’ve seen work: if your AI scorecard looks “busy” but revenue, churn, or cost-to-serve is flat, you don’t have an AI strategy—you have reporting.
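For teams that want the scoring mechanics spelled out, here is a hedged sketch of a VRI-style weighted score with traffic-light bands. The dimension scores and the 70/40 band cut-offs are assumptions to calibrate against your own portfolio, not part of the framework itself.

```python
# VRI-style portfolio score: weighted sum of five dimensions, mapped to a band.
WEIGHTS = {
    "strategic_alignment": 0.15,
    "value_tree_strength": 0.15,
    "assumption_discipline": 0.20,
    "evidence_quality": 0.25,
    "risk_adjusted_outcomes": 0.25,
}

def vri(scores: dict[str, float]) -> tuple[float, str]:
    """Weighted 0-100 score plus a traffic-light band (cut-offs are illustrative)."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    band = "green" if total >= 70 else "amber" if total >= 40 else "red"
    return round(total, 1), band

# Example: strong evidence but weak assumption discipline still lands in amber.
print(vri({
    "strategic_alignment": 80,
    "value_tree_strength": 70,
    "assumption_discipline": 30,
    "evidence_quality": 60,
    "risk_adjusted_outcomes": 55,
}))
```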
A quarterly cadence that fits real operating teams
Answer first: AI value becomes visible when you run a simple rhythm: scorecard → 90-day tests → decisions → repeat.
You don’t need a bureaucracy. You need a calendar and a ledger.
The one-page governance set
- Board/exec scorecard (quarterly)
  - VRI score (0–100) with trend
  - Top 3 value nodes with baseline → target → delta
  - Proven vs assumed split (brutal honesty)
- 90-day evidence plan (rolling)
  - Every “weak spot” becomes a test
  - Owner, timeframe, method, success criteria
- Decision & assumption ledger (public internally)
  - Fund / scale / pause / kill decisions
  - Assumptions proven or disproven
  - Capacity moved accordingly
This is where a lot of AI governance goes wrong: leaders keep funding because the team is “working hard.” The cadence forces the harder question: did anything move?
Practical checklist: put your next AI initiative on trial
Answer first: If you can’t fill in these fields in 30 minutes, your initiative isn’t ready for funding.
Use this as a pre-approval template:
- Business outcome: (one)
- Value node KPI:
- Baseline:
- Target delta:
- Timeframe:
- Guardrails: (CSAT, returns, fraud loss, compliance)
- Hypothesis: “If we do X, then Y moves by Z within T”
- Proof method: A/B, holdout, matched before–after, stepped rollout
- Decision rule: scale/pause/kill thresholds
- Top 3 assumptions: (adoption, data quality, integration, behaviour change)
- 90-day tests: owner + success criteria
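For teams that keep this in a tracker or repo rather than a slide, here is a minimal machine-readable version of the same template. Every value is a placeholder drawn from the PDP recommendation example earlier in this post, not a recommendation.

```python
# Hypothetical pre-approval record for one AI initiative; replace every value.
pre_approval = {
    "business_outcome": "increase repeat-customer conversion",
    "value_node_kpi": "repeat-customer checkout conversion (%)",
    "baseline": None,                       # fill in from analytics before approval
    "target_delta_pp": 0.4,
    "timeframe_weeks": 6,
    "guardrails": ["no increase in returns rate", "CSAT >= 4.2/5"],
    "hypothesis": "If we deploy AI recommendations on PDPs, repeat conversion rises 0.4pp in 6 weeks",
    "proof_method": "A/B test",
    "decision_rule": {"scale": ">= 0.4pp", "pause": "0.1-0.39pp", "kill": "< 0.1pp or returns rise"},
    "top_assumptions": ["adoption", "data quality", "integration"],
    "tests_90_day": [{"test": "A/B on PDPs", "owner": "e-commerce manager",
                      "success": "lift >= 0.4pp with guardrails intact"}],
}

# If any field is still None or empty, the initiative is not ready for funding.
missing = [k for k, v in pre_approval.items() if v in (None, "", [], {})]
print("ready for review" if not missing else f"incomplete fields: {missing}")
```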
If you’re serious about AI powering e-commerce and digital services in South Africa, this template does more than a vendor demo ever will.
Where this fits in the South Africa AI e-commerce story
This post sits in our series on how AI is powering e-commerce and digital services in South Africa, but I’ll be blunt: the winners in 2026 won’t be the teams with the most AI tools. They’ll be the teams that can prove, quarter after quarter, that AI shifted revenue, margin, churn, cost-to-serve, or cycle time—without breaking trust.
Your next step is simple. Pick one AI initiative you’re excited about right now (a chatbot, recommendation engine, GenAI content workflow, fraud model) and put it on trial. Name the value node, set the baseline and delta, define the proof method, and decide what outcome earns more funding.
If your AI investment had to stand up in a boardroom court next quarter, would you bring evidence—or theatre?