Most recruitment platforms optimise for the recruiter — filter, rank, eliminate. I re-framed Intervyou as candidate-centric. Better signal comes from giving more candidates more ways to show their work, not from filtering harder at the top of the funnel.
How do you design AI behaviour for a product when no UX patterns exist yet?
Intervyou is an AI-powered, candidate-centric hiring platform built to fix a process that has a 50% failure rate. I led product design end to end — from problem framing through to shipping the MVP — with a particular focus on how AI was integrated into the experience.
The only major business process that runs at a 50% failure rate — and is accepted as normal.
Hiring is the only major business process that runs at a 50% failure rate and gets accepted as normal. Companies globally lose around $400 billion a year to bad hires. The interview format itself — high-pressure, conversational, 1:1 — is well documented in occupational psychology research as a poor predictor of on-the-job performance. It selects for one thing: the ability to interview.
The candidates who lose are the ones whose strengths don’t show up in a 45-minute conversation under pressure. Introverts. Anxious candidates. Neurodivergent candidates. Anyone who needs more than one channel to demonstrate what they can actually do. That’s roughly half the population, screened out before the work has even started.
The product brief was clear: build a hiring platform that uses AI to fix this. The design brief was much harder. AI wasn’t a single feature — it had to be woven through role creation, candidate scoring, interview question generation, async video evaluation, and an AI-led interview experience itself. And in 2023, when we started, established UX patterns for AI at that depth simply did not exist.
Three decisions shaped everything downstream.
No process diagram. The interesting work was not the sequence — it was the three moves below. Each one set up the constraints I’d use to make every later decision.
A button labelled “Generate with AI” is a feature. What the AI says, what it refuses to say, how it explains itself, when it defers to a human, and how it surfaces uncertainty — that is behaviour. Behaviour has to be designed before pixels.
It would have been easier to lead with what the AI could do. Instead, I led with how the AI would earn the user’s confidence over time. That meant explainable outputs, assistive language, and humans always in control of decisions.
Where the design work actually happened.
Inverting the brief — designing for the candidate.
Two cross-functional workshops with engineering and product mapped the candidate journey end to end and identified where signal was being lost. From there, I built the entire interview model around a multi-stage, multi-format process: written responses, async video, simulated scenarios, and role-specific scenario questions.
No single channel decides anything. A candidate who freezes on video can still demonstrate their thinking in writing. A candidate who writes poorly can show their reasoning verbally. Each stage is a separate piece of evidence, not a gate.
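The difference between a gate and a piece of evidence can be sketched in a few lines. This is illustrative only: the stage labels echo the formats described above, but `StageEvidence`, `gated_funnel`, `evidence_profile`, the 0-to-1 signal scale and the 0.5 threshold are all assumptions made for the example, not Intervyou's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageEvidence:
    stage: str     # hypothetical stage label, e.g. "written" or "async_video"
    signal: float  # assumed 0.0-1.0 strength of evidence from this stage


def gated_funnel(stages: list[StageEvidence], threshold: float = 0.5) -> list[StageEvidence]:
    """Traditional funnel: the first weak stage ends the process."""
    seen = []
    for s in stages:
        seen.append(s)
        if s.signal < threshold:
            break  # candidate is eliminated; later stages never happen
    return seen


def evidence_profile(stages: list[StageEvidence]) -> list[StageEvidence]:
    """Evidence model: every stage is kept and presented, strongest first."""
    return sorted(stages, key=lambda s: s.signal, reverse=True)


candidate = [
    StageEvidence("async_video", 0.3),  # froze on video
    StageEvidence("written", 0.9),      # strong reasoning in writing
    StageEvidence("scenario", 0.7),
]

print([s.stage for s in gated_funnel(candidate)])      # ['async_video']
print([s.stage for s in evidence_profile(candidate)])  # ['written', 'scenario', 'async_video']
```

In the gated version the reviewer never sees the written response at all. In the evidence version the weak video stage is still visible, but it sits alongside the stronger stages instead of overriding them.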
The structure was validated through paper-prototype walk-throughs with candidates drawn from a beta tester pool before any high-fidelity work began.
The recruiter-side benefit is real and counterintuitive: more viable candidates make it to the decision stage, which means the recruiter is choosing between strong options rather than narrowing under uncertainty.
Designing AI behaviour without precedents.
Across the product, AI was doing six distinct jobs: generating roles, generating interview questions, scoring written responses, evaluating async video, surfacing candidate recommendations, and conducting AI interviews. Each one had its own failure modes and its own trust problem. There were no precedents I could lift from established design systems. We were building the precedent.
Rather than start with screens, I started with prompt maps. For every AI feature, I led a workshop with engineering and architecture to define explicitly what the AI was allowed to do, what it was not allowed to do, what its inputs were, what its outputs looked like, what tone it spoke in, and what it should do when uncertain. These weren’t ideation documents. They were specs that engineering built against.
Three rules emerged that became the foundation of the system:
- Assistive, never authoritative. AI surfaces, explains, and suggests. Humans always decide.
- Predictable. The same input produces the same shape of output every time. No surprises.
- Explainable by default. Every AI output ships with the reasoning, sources or criteria behind it. If we can’t explain it, we don’t surface it.
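To make the idea of a prompt map as a spec more concrete, here is a minimal sketch of how one might be written down and checked. The field names mirror the six questions each workshop answered; everything else (the `PromptMap` class, the `surfaceable` check, and the example values for question generation) is hypothetical and not taken from the real Intervyou specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptMap:
    feature: str
    allowed: tuple[str, ...]        # what the AI may do
    forbidden: tuple[str, ...]      # what it must never do
    inputs: tuple[str, ...]         # data the AI is given
    output_fields: tuple[str, ...]  # the fixed "shape" of every output
    tone: str
    when_uncertain: str             # behaviour when the AI is unsure


def surfaceable(spec: PromptMap, output: dict) -> bool:
    """Apply two of the rules: predictable shape, explainable by default."""
    same_shape = set(output) == set(spec.output_fields)
    has_reasoning = bool(str(output.get("reasoning", "")).strip())
    return same_shape and has_reasoning


# Illustrative values only; the real prompt maps are not shown in this write-up.
question_gen = PromptMap(
    feature="interview question generation",
    allowed=("suggest questions", "explain why each question fits the role"),
    forbidden=("rank candidates", "make a hiring decision"),
    inputs=("role description",),
    output_fields=("questions", "reasoning"),
    tone="assistive",
    when_uncertain="say so and defer to the recruiter",
)

explained = {
    "questions": ["Walk me through a recent trade-off."],
    "reasoning": "Maps to the role's decision-making criteria.",
}
unexplained = {"questions": ["Tell me about yourself."], "reasoning": ""}

print(surfaceable(question_gen, explained))    # True
print(surfaceable(question_gen, unexplained))  # False
```

The point of the sketch is that "if we can’t explain it, we don’t surface it" becomes a check that can run before anything reaches a screen, rather than a guideline that depends on the UI remembering to show a tooltip.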
I prototyped each AI interaction with rapid throwaway flows — including using AI itself to stress-test edge cases — before any UI was built. That meant we caught failure modes (hallucinated candidate names in summaries, overconfident scoring) at the behaviour layer, where they were cheap to fix, instead of at the UI layer where they would have shipped.
Trust scales with explainability, not accuracy. A 70% AI that shows its work beats a 95% AI that doesn’t.
Designing for trust — explainability as the load-bearing pattern.
AI candidate scoring is opaque by default. The system gives you “92% match” and expects you to trust it. No senior hiring manager will. And they shouldn’t.
Every AI output in the product had to carry its reasoning with it. The candidate evaluation screen doesn’t say “92% match” — it shows you which stage the candidate performed strongly in, which response drove that signal, and what the gaps are. The recommendation copy is deliberately language-controlled: “consider,” not “hire.” The hiring manager can override at any point, and the AI never makes a final decision. The AI interviewer always discloses that it is an AI, and adapts its questions in real time based on candidate responses to reduce pressure without losing structure.
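As a rough sketch of what "carrying its reasoning" could look like in practice, the example below builds evaluation copy from evidence fields instead of a percentage, and checks the wording before it is shown. The `Evaluation` fields, the `render` and `uses_assistive_language` helpers, and the list of verdict words are assumptions for illustration; only the "consider, not hire" rule comes from the product itself.

```python
from dataclasses import dataclass

# Hypothetical list of verdict words the copy must avoid; only "hire" is named in the text.
VERDICT_WORDS = {"hire", "reject"}


@dataclass(frozen=True)
class Evaluation:
    candidate: str
    strongest_stage: str
    driving_response: str
    gaps: tuple[str, ...]


def render(ev: Evaluation) -> str:
    """Build the evaluation copy: evidence lines, then an assistive suggestion."""
    lines = [
        f"Strongest stage: {ev.strongest_stage}",
        f"Response behind that signal: {ev.driving_response}",
        "Gaps: " + (", ".join(ev.gaps) if ev.gaps else "none identified"),
        f"Consider {ev.candidate} for the next stage. The decision is yours.",
    ]
    return "\n".join(lines)


def uses_assistive_language(copy: str) -> bool:
    """True when no verdict word appears as a standalone word."""
    words = {w.strip(".,:;!?").lower() for w in copy.split()}
    return words.isdisjoint(VERDICT_WORDS)


ev = Evaluation(
    candidate="Candidate A",
    strongest_stage="written response",
    driving_response="structured answer to the prioritisation scenario",
    gaps=("limited evidence on stakeholder communication",),
)
copy = render(ev)
print(copy)
print(uses_assistive_language(copy))
```

Because the data structure has no field for a bare score, a screen built on it has nothing to show except the evidence and a suggestion, which keeps the final call with the hiring manager.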
I tested both surfaces in moderated sessions with hiring managers drawn from pilot teams, and the AI interviewer was tested in unmoderated sessions with candidate participants. Two findings drove late-stage redesigns: hiring managers wanted reasoning surfaced before the recommendation, not after, and candidates trusted the AI more when it acknowledged uncertainty than when it sounded confident. Both insights ended up in the shipped product.
The microcopy work mattered as much as the IA work. The difference between “score,” “match,” “fit,” and “signal” — that’s not a copy decision, that’s a trust decision. Most of my time in the final design phase was spent on language.
Treat AI as a behaviour to be designed, not a feature to be added.
What the pilot actually changed.
From the closed pilot with early teams. Each number is tied to a specific design decision — not a vanity metric.
Reduction in time-to-hire across pilot teams. Driven by parallel multi-stage evaluation replacing serial interview rounds.
Time saved building a role — ~1 hour collapsed to ~5 minutes.
Better candidate-to-role alignment reported by hiring managers.
Of users praised interface simplicity in pilot feedback.
Of pilot users would recommend Intervyou to a peer.
What I’d do differently.
Where I underweighted the work.
Language. I treated microcopy as a polish-phase concern and found out the hard way that on an AI product, language is where trust gets won or lost. The word “score” implies finality. The word “signal” implies an input to a decision. Pick the wrong one and the entire product feels different. Next time, language design starts at the same time as IA.
Where I’d push earlier.
Prompt maps. On Intervyou, prompt mapping ran in parallel with screen design, and the two had to catch up with each other. Next time, prompt maps lead screen design by at least a sprint. Knowing the behaviour first makes the screens fall out naturally; designing the screens first means retrofitting the behaviour.
Behaviour before pixels. Every AI product, from now on.
Intervyou was the proving ground for a way of working with AI that I now bring to any product where a model is making suggestions on behalf of a user. Recruitment was the first place to apply it. It is far from the last.