The story of Solomon’s judgment is one of the oldest decision-making frameworks in recorded history. Two women claim the same child. No witnesses, no evidence, no algorithmic output that could resolve the dispute. Solomon’s solution — propose dividing the child, then watch who objects — is a masterclass in what optimization cannot do: use the decision itself as the test that reveals the truth.
I’ve been thinking about this story a lot while working on AI recommendation systems. We’ve gotten very good at optimization. We can recommend content, predict churn, personalize experiences, and rank options with impressive accuracy. What we haven’t gotten good at — and what I’m not sure AI will ever be good at — is recognizing when the optimization is solving the wrong problem entirely.
The Algorithm vs. the Test
A recommendation algorithm optimizes for a signal. Click-through rate, completion rate, return visits, explicit ratings — whatever you tell it to optimize, it will optimize. The problem is that the signals we can measure are often proxies for the outcomes we actually care about, and at some point the proxy diverges from the outcome.
I’ve seen this play out in digital content platforms repeatedly. A reading plan recommendation algorithm optimizes for completion rates. It gets good at predicting which plans users will finish. It starts recommending plans that feel familiar, comfortable, and achievable — because those are the ones that get completed. Completion rates go up. The metric is green. But users who choose plans that “mismatch” their stated preferences — the challenging, unfamiliar ones — show better long-term engagement. The algorithm optimized for completion and selected against growth.
Solomon’s test is the thing the algorithm can’t run. He couldn’t optimize his way to the truth; he had to create a condition that would reveal it through the parties’ responses. That kind of judgment, knowing which test will surface the right answer, is what optimization alone cannot replicate, and what product leaders need to understand as an irreplaceable capability.
The Limits of AI Optimization
There’s a specific category of decisions where optimization fails systematically, and it maps precisely to the conditions of Solomon’s judgment: when the right answer requires understanding something about a party’s underlying motivation or stake in the outcome that isn’t visible in any behavioral signal.
In product terms: the algorithm can tell you what users do. It cannot tell you what users are trying to become. It can tell you what content users complete. It cannot tell you whether completion served their actual goal. It can tell you what users click. It cannot tell you whether the click reflects genuine engagement or habit, genuine interest or boredom-driven curiosity.
These are not problems of insufficient data. They’re problems of category: you cannot optimize your way to answers about meaning, motivation, and stake without reducing those human realities to signals that don’t actually capture them. This echoes a recurring theme in Harvard Business Review’s coverage of AI’s decision-making limits: AI excels at decisions that can be fully specified by their optimization criteria, and it struggles with decisions where the criteria themselves are contested or where the right answer depends on what’s at stake for the parties involved.
Creating Space for Judgment
The practical implication for product teams isn’t “stop using optimization.” It’s “design your system so that optimization handles what optimization is good at, and humans handle what optimization can’t reach.”
This requires being explicit about the decisions your AI system is making and auditing each one for the category of problem it represents. If a decision can be fully specified by a measurable outcome, optimization is appropriate. If it depends on understanding a user’s underlying motivation, growth trajectory, or stake in the outcome, it needs a human checkpoint.
In practice, this means building escalation paths. Decisions that an algorithm can handle with confidence: automated. Decisions that carry high downstream consequences and depend on judgment the algorithm can’t access: escalated to a human. The challenge is designing the system so that the escalation happens before the consequence rather than after it.
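The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `Decision` fields, the `route` and `execute` helpers, and the 0.9 confidence cutoff are all hypothetical, and a real system would need its own way of labeling which decisions are high-consequence or judgment-dependent. What it shows is ordering: the routing check runs before any action is applied, so escalation happens before the consequence rather than after it.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"
    ESCALATE = "escalate"


@dataclass
class Decision:
    # Hypothetical fields for illustration; a real system defines its own schema.
    name: str
    model_confidence: float   # 0.0 to 1.0, as reported by the model
    high_consequence: bool    # e.g. account actions, eligibility determinations
    judgment_dependent: bool  # depends on user motivation or stake


CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff; tune per system


def route(decision: Decision) -> Route:
    """Decide who acts, before any action is taken."""
    if decision.high_consequence and decision.judgment_dependent:
        # High confidence does not help here: the model cannot see the stake.
        return Route.ESCALATE
    if decision.model_confidence < CONFIDENCE_THRESHOLD:
        # The algorithm cannot handle this one with confidence.
        return Route.ESCALATE
    return Route.AUTOMATE


def execute(
    decision: Decision,
    apply_action: Callable[[Decision], None],
    enqueue_for_review: Callable[[Decision], None],
) -> Route:
    # Routing happens first, so a human sees escalated cases
    # before anything is applied to the user.
    chosen = route(decision)
    if chosen is Route.AUTOMATE:
        apply_action(decision)
    else:
        enqueue_for_review(decision)
    return chosen
```

Note that model confidence is checked only after the judgment-dependent test, so a confident model never bypasses the human checkpoint for high-consequence, judgment-dependent decisions; that is the case where confident wrongness is most costly.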
The Wisdom Gap in AI Product Teams
Most product teams have become very good at evaluating AI performance: accuracy, precision, recall, business-metric lift. What they haven’t systematically developed is the capacity to evaluate AI judgment, that is, to recognize when the system is confidently optimizing toward the wrong objective.
This is the wisdom gap. Wisdom, in the traditional sense, isn’t just knowledge or capability — it’s knowing when to apply which capability, and knowing when the right answer can’t come from a formula at all. Solomon wasn’t impressive because he had access to more information than anyone else. He was impressive because he understood that the right test would reveal what no available information could.
Building that capacity into a product team means creating explicit processes for questioning whether AI systems are optimizing for the right things, not just whether they’re optimizing well. It means reviewing decisions at the system level, not just the feature level. It means asking, regularly: what would the right answer look like if we couldn’t use a metric to find it? If that answer differs from what the metric rewards, the metric is the wrong one.
The Practical Applications
Three specific places where Solomon’s framework changes how I evaluate AI product decisions:
Recommendation systems: Don’t just optimize for completion or engagement. Build qualitative research into your cadence that asks users whether the AI-recommended path served their actual goal. The behavioral signal and the goal-based signal often diverge. The divergence is where the product insight lives.
Personalization: The most effective personalization isn’t always the most comfortable personalization. Before concluding that an algorithm is working because engagement is up, ask whether the algorithm is serving users’ stated goals or optimizing toward the path of least resistance. Users often engage more with what’s familiar than with what’s growth-oriented. The engagement metric can’t distinguish between the two.
High-consequence decisions: Any AI-assisted decision with significant downstream consequences for a specific user — content restrictions, account actions, eligibility determinations — needs a human checkpoint. Not because the AI will always be wrong, but because the cost of confident wrongness in these cases is high enough that the judgment of a human who understands what’s at stake is worth the operational overhead.
Your Turn: Apply This Today
Build the judgment layer into your AI product process:
- Audit your AI system’s decisions for decision category. List the top 10 decisions your AI system makes. For each one, ask: can this decision be fully specified by its optimization criteria? If yes, automation is appropriate. If the right answer depends on understanding user motivation or stake, build a human checkpoint.
- Separate your “proxy metrics” from your “outcome metrics.” Identify which metrics your AI optimizes directly and which outcomes you actually care about. Map the relationship between them. Where the proxy and the outcome diverge, you have a place where optimization is working against you.
- Build qualitative research into your AI evaluation cadence. Once per quarter, interview users whose behavior your AI system has most significantly influenced. Ask whether the AI-driven experience served their actual goal. The divergence between behavioral signals and goal-based answers is where your most important product insights live.
- Design a “Solomon test” for your highest-stakes AI decisions. For the AI-assisted decisions with the highest downstream consequence, design a test that would reveal whether the AI’s recommendation was right — not just whether users accepted it. Acceptance and correctness are not the same.
- Create an escalation path for judgment-dependent decisions. Identify the category of decisions that require understanding user motivation or stake. Build an explicit escalation path so that those decisions reach a human before the consequence rather than after. Make it part of your system design, not an emergency procedure.
- Hold a “what if we removed the metric?” review annually. For each of your AI system’s optimization targets, ask: what would the right answer look like if we couldn’t use this metric? If the answer changes significantly, the metric needs to be reconsidered. This is the most uncomfortable product review you will have and the most valuable.
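The proxy-versus-outcome mapping from the second item above can be made concrete with a small check. This is a hedged sketch, assuming you already have per-segment aggregates for both the metric your AI optimizes (the proxy, such as completion rate) and the outcome you care about (such as long-term return rate). `SegmentMetrics`, `divergent_segments`, and the median-split rule are hypothetical choices for illustration, not a standard method. The function flags segments where the proxy sits above its median while the outcome sits below its median, which is where optimization may be working against you.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SegmentMetrics:
    # Hypothetical per-segment aggregates, e.g. one row per plan type.
    segment: str
    proxy: float    # what the model optimizes, e.g. completion rate
    outcome: float  # what you actually care about, e.g. 90-day return rate


def divergent_segments(rows: list[SegmentMetrics]) -> list[str]:
    """Return segments where the proxy looks good but the outcome looks bad.

    "Good" and "bad" are relative to the median across segments, a
    deliberately crude rule chosen for illustration only.
    """
    if not rows:
        return []
    proxy_mid = median(r.proxy for r in rows)
    outcome_mid = median(r.outcome for r in rows)
    return [
        r.segment
        for r in rows
        if r.proxy > proxy_mid and r.outcome < outcome_mid
    ]
```

A median split is intentionally simple; with real data you would likely want uncertainty estimates or a rank correlation, and flagged segments are candidates for the quarterly user interviews rather than conclusions in themselves.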
The judgment-vs-optimization tension shows up across AI product decisions. The post on the Kahneman System 1 paradox addresses its cognitive-load dimension, and “Why Product Decisions Are Never Just Product Decisions” explores the ethical dimension of designing AI systems that optimize toward the right ends.
Building AI systems and trying to design judgment into the process rather than optimizing it away? I consult with product teams on AI product strategy, human-AI decision frameworks, and building systems that know when optimization is the right tool — and when it isn’t. Let’s talk.
