Chapter 06: Sourcing and Evaluation

End-to-end interview process

What you’ll learn

The full interview loop, a technical screen that survives AI tooling, and customer-simulation design. Synthesized from how Palantir, OpenAI, Anthropic, Sierra, Ramp, and Harvey actually run loops in 2025 and 2026.

The interview loop (3–4 weeks)

The reference interview loop
| Stage | Duration | Owner | Goal |
| --- | --- | --- | --- |
| Recruiter screen | 30 min | Recruiter | Motivation, 'why FDE not SWE,' AI baseline. |
| Hiring manager screen | 45–60 min | HM | Background depth, customer-facing experience, one product-judgment scenario. |
| Technical assessment | 90 min live OR 1–4 hr take-home + 60 min walkthrough | IC engineer | Real-codebase work with AI tools. |
| Onsite: decomposition / deployment scenario | 60 min | Senior FDE | Open-ended customer problem, ambiguity. |
| Onsite: customer simulation / role-play | 45–60 min | PM or sales partner | Communication under pressure. |
| Onsite: system / solution design | 60 min | Staff eng | Production AI architecture for a real constraint. |
| Onsite: values / 'bar raiser' | 45 min | Cross-functional | Mission alignment, ownership stories. |
| References + offer | 1 week | Recruiter + HM | Minimum of 3 references: manager, peer, customer/cross-functional. |

Variants worth knowing

  • Palantir FDSE: still uses a HackerRank online assessment + decomposition + system design + re-engineering a 250–500-line codebase. AI use is strictly prohibited in their screen.
  • OpenAI FDE: an AI-systems take-home (RAG, retries, eval design), then a video walkthrough, then a customer-style presentation onsite; a sketch of the kind of work such a take-home probes follows this list.
  • Anthropic Applied AI / FDE: a 90-min CodeSignal take-home with progressively harder levels (spec → refactor, not Leetcode) → pair programming → system design → values round (where most candidates fail).
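
The source names only the ingredients of the OpenAI-style take-home (RAG, retries, eval design). As an illustration of the kind of work it probes, here is a minimal Python sketch of the retry and eval pieces; `call_fn` and `TransientAPIError` are hypothetical stand-ins for a real client function and its transient error class, not the actual assignment.

```python
import random
import time

class TransientAPIError(Exception):
    """Stand-in for a provider's rate-limit or timeout error."""

def call_llm_with_retries(call_fn, prompt: str, max_retries: int = 3) -> str:
    """Invoke an LLM call, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call_fn(prompt)
        except TransientAPIError:
            if attempt == max_retries:
                raise
            # Backoff 1s, 2s, 4s ... plus jitter to avoid synchronized retries.
            time.sleep(2 ** attempt + random.random())

def exact_match_eval(call_fn, cases: list[tuple[str, str]]) -> float:
    """Tiny eval: fraction of cases whose answer contains the expected string."""
    hits = sum(
        expected.lower() in call_llm_with_retries(call_fn, prompt).lower()
        for prompt, expected in cases
    )
    return hits / len(cases)
```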

The technical screen debate, resolved

The 2025–2026 evidence converges hard:

What works in 2026, by interview format
| Format | 2026 verdict | Use for FDE? |
| --- | --- | --- |
| Leetcode / algorithmic | Dead for FDE. GPT-4 / Claude solve mediums in under a minute. | Skip, or use only as a 30-min syntax sanity check. |
| Take-homes (traditional) | Compromised: 'one-shotted by AI in 7 minutes.' | Only with a mandatory follow-up walkthrough. |
| Real-codebase + AI-allowed live work | The new gold standard. Canva (Sept 2025) requires it; Augment, Shopify, Meta, and Rippling allow it. | Default for FDE technical screens. |
| Pair programming on a real bug | High signal, low scale. | Excellent for onsite. |
| Re-engineering 250–500 LoC | High signal. | Excellent; tests reading + judgment. |
| Customer-presentation take-home | OpenAI uses this. | Excellent for FDE; directly simulates the job. |
> "We want to see the interactions with the AI as much as the output of the tool."
>
> Brendan Humphreys, Canva CTO, September 2025

The recommended format: 90 minutes in a real codebase, AI tools fully enabled (Cursor / Claude Code / Codex / OpenRound), with screen recording and live observation (a run-sheet sketch follows the list):

  • (10 min) Candidate reads the README and asks clarifying questions ('Discovery').
  • (15 min) Candidate writes a plan before any code.
  • (45 min) Candidate ships a real ticket. AI use mandatory and observed.
  • (20 min) Walkthrough: 'Why this approach? What would you have done without AI? What would break in production?'
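
One way to keep the screen consistent across interviewers is to encode the run sheet as data. A minimal sketch, assuming nothing beyond the phases above; the `Phase` structure and field names are illustrative, not a prescribed tool.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    minutes: int
    watch_for: str

# The 90-minute structure above, encoded so every interviewer runs it
# identically and observation notes land in the same buckets.
SCREEN = [
    Phase("Discovery", 10, "Reads README, asks clarifying questions"),
    Phase("Plan", 15, "Writes a plan before touching code"),
    Phase("Ship", 45, "Ships a real ticket; AI use mandatory and observed"),
    Phase("Walkthrough", 20, "Explains approach, AI-free fallback, production risks"),
]

# Sanity check: the phases must sum to the advertised 90 minutes.
assert sum(p.minutes for p in SCREEN) == 90
```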

Run this screen on OpenRound

OpenRound tests how FDE candidates work on technical problems with business context, while giving them access to AI tools.

Customer simulation

The hiring manager or a PM plays a named persona (e.g., 'VP of Operations at a 40,000-employee logistics company'). Pre-brief the candidate 24 hours in advance.

Three escalation levels:

  • Discovery: candidate leads the kickoff, must scope before solving.
  • Pressure / scope dispute: interviewer escalates: 'Your timeline is unacceptable.'
  • Live failure: 'Your deployment broke during our morning all-hands. 200 people just saw the error.'

What to watch for:

  • Naming the situation before solving (‘I hear you. This is impacting your team and I'm taking it seriously’).
  • 2–3 diagnostic questions before any solution.
  • Ownership language (‘I will have a diagnosis to you by EOD’).
  • Never blames the customer's systems, even when they are the problem.
  • Says no with options.

How interview design differs by archetype

  • AI lab: make the technical screen an end-to-end evals + agent project (e.g., 'here's a flaky agent, fix it, write evals, ship a prompt change with a metrics dashboard'); a pass-rate sketch of such an eval gate follows this list. Customer sim: enterprise security officer.
  • Data lab: Palantir-style data-decomp on messy real data + integration debugging. Customer sim: frustrated domain expert.
  • App / agent startup: ship a real PR in their codebase end-to-end in 90 minutes with AI tools. Customer sim: live demo to founder-CEO of a 50-person customer who wants 12 things in 4 weeks. Heavy weight on velocity and product taste.
  • Dev tools: 'show me your AI-augmented workflow' + 2-hour live debugging in a real OSS-scale codebase.
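
For the AI-lab 'flaky agent' exercise, the core idea is that a fix is judged on measured reliability, not one lucky run. A minimal sketch of such an eval gate; `agent_fn`, the case format, and the 0.90 threshold are all assumptions for illustration.

```python
from typing import Callable

def pass_rate(
    agent_fn: Callable[[str], str],
    cases: list[tuple[str, Callable[[str], bool]]],
    trials: int = 5,
) -> float:
    """Score a flaky agent by pass rate over repeated trials, so a fix is
    judged on reliability rather than a single lucky run."""
    passes = sum(
        check(agent_fn(task))
        for task, check in cases
        for _ in range(trials)
    )
    return passes / (len(cases) * trials)

# Illustrative gate: the candidate's prompt change "ships" only if, say,
# pass_rate(agent_fn, cases) >= 0.90. The threshold is an assumption.
```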

Key takeaways

  • Leetcode is dead for FDE. The 2026 gold standard is a 90-minute real-codebase screen with AI tools fully enabled and observed.
  • Customer simulation has three escalation steps: discovery, scope dispute, live failure. Watch if they scope before solving.
  • Hire bar: ≥3 average; ≥3 on Judgment and AI Collaboration. These two dimensions correlate strongest with FDE on-the-job success (a scoring sketch follows).
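
The hire bar in the last takeaway is mechanical enough to encode. A minimal sketch; dimension names other than the two stated gates (Judgment, AI Collaboration) are illustrative.

```python
def hire_decision(scores: dict[str, int]) -> bool:
    """Apply the stated bar: >=3 average overall, and >=3 on both
    Judgment and AI Collaboration specifically."""
    average_ok = sum(scores.values()) / len(scores) >= 3
    gates_ok = scores["Judgment"] >= 3 and scores["AI Collaboration"] >= 3
    return average_ok and gates_ok

# Example: strong elsewhere but weak on AI Collaboration -> no hire.
print(hire_decision({
    "Judgment": 4,
    "AI Collaboration": 2,
    "Technical execution": 4,
    "Communication": 4,
}))  # False
```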