What Nobody Tells You About Product-Led Growth in Faith Tech

Product-led growth has become the dominant go-to-market playbook for B2B software. Let the product do the selling. Freemium drives acquisition. Self-serve onboarding reduces sales cost. Expansion revenue follows natural usage. The model works extremely well — in markets where users have discretionary time, personal tech comfort, and the organizational freedom to adopt tools on their own.

Faith tech is not that market. And the teams that have tried to apply the PLG playbook to churches, ministries, and faith-based organizations have learned some expensive lessons about what makes this market different. I’ve been in this space — building digital products for Sermons4Kids and SermonCentral, serving hundreds of thousands of ministry leaders — and the lessons from those attempts have fundamentally changed how I think about growth in faith-based contexts.

Why the Standard PLG Playbook Doesn’t Transfer

PLG assumes a user who can make adoption decisions independently. In most church and ministry contexts, this assumption breaks immediately. The person who would use a curriculum platform daily is a volunteer children’s ministry director who has roughly seven minutes to prep and no budget authority. The person who controls the budget is a pastor or board member who will never use the product directly. The person who influences technology decisions in the congregation may be an IT volunteer who serves one Sunday a month.

This multi-stakeholder structure — where the user, the buyer, and the influencer are almost never the same person — means that the classic PLG motion of “user falls in love with product, user expands usage, user becomes internal champion who drives purchase” doesn’t work. The user can fall in love with the product and have zero ability to generate a purchase. The buyer can approve a purchase for a product they’ve never touched. The influencer can kill a deal based on concerns that have nothing to do with product quality.

Standard PLG metrics — activation rate, time-to-value, viral coefficient — measure the wrong things in this context. The bottleneck in faith tech adoption isn’t activation. It’s trust between the product and the organizational decision-maker who will never be a power user.

What Trust Means in Ministry Contexts

Ministry organizations run on a different trust architecture than corporate buyers. A mid-market B2B buyer evaluating software is assessing: does this solve the problem, is the price justified, and what does implementation look like? A church or ministry evaluating a platform is assessing all of those things plus: do we trust this organization with our people?

That last question is not a minor addition. Ministry organizations are deeply responsible for the spiritual and relational wellbeing of the communities they serve. They’re not just buying a productivity tool — they’re choosing a technology partner that will touch their congregation’s formation, communications, or discipleship. The stakes of a bad fit are perceived as higher than a bad software choice in a corporate context. The evaluation process reflects that.

In practice, this means that community signals matter more than product signals in faith tech adoption decisions. “Our denomination uses this” carries more weight than “this has 4.8 stars on G2.” “I heard about it at the pastors’ conference” moves faster than any inbound funnel. Trust flows through relationship networks, not through product-led virality. The distribution model that actually works in this market looks more like word-of-mouth in a high-trust community than like a self-serve freemium funnel.

The Metrics That Actually Matter

The metrics I’ve found most predictive in faith tech are not the standard PLG metrics. They are:

Volunteer completion rate. Not administrator completion, not pastor usage — the metric that predicts retention in most faith tech products is whether the volunteers who rely on the product can complete their core task without failure or confusion. Volunteers have the lowest tolerance for friction and the highest likelihood of abandoning a tool that makes their service harder. If volunteers stop using it, administrators switch platforms.

Community recommendation rate. How often do your users recommend the product in contexts you’re not involved in — small group leader networks, denominational meetings, online ministry forums? This is the faith tech equivalent of the viral coefficient, but it operates on a much slower, higher-trust basis. One strong recommendation from a respected ministry leader is worth more than a hundred self-serve trials.

Mission alignment signal. The organizations that pay premium prices in faith tech do so because they believe the product advances their mission, not just because it’s functionally superior. Tracking whether your users articulate a mission connection — in support conversations, in community forums, in reviews — tells you whether your positioning is landing at the level that drives willingness to pay.
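To make the first of these metrics concrete, here is a minimal sketch of a completion rate split by role rather than blended across all users. The `TaskAttempt` record, the role labels, and the sample data are all hypothetical; a real product would derive them from its own event log.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAttempt:
    user_id: str
    role: str        # hypothetical labels: "volunteer", "administrator"
    completed: bool  # True if the user finished the core task


def completion_rate_by_role(attempts):
    """Return {role: completed attempts / total attempts} for each role seen."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for attempt in attempts:
        totals[attempt.role] += 1
        if attempt.completed:
            completed[attempt.role] += 1
    return {role: completed[role] / totals[role] for role in totals}


# Made-up sample data, for illustration only.
attempts = [
    TaskAttempt("u1", "volunteer", True),
    TaskAttempt("u2", "volunteer", False),
    TaskAttempt("u3", "volunteer", True),
    TaskAttempt("u4", "volunteer", False),
    TaskAttempt("u5", "administrator", True),
    TaskAttempt("u6", "administrator", True),
]

rates = completion_rate_by_role(attempts)
print(rates)
```

In this toy data the blended rate is four out of six, which looks tolerable, while the volunteer rate is only half. That gap is the signal a blended number hides.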

What Faith Tech Growth Actually Looks Like

The growth motion that works in faith tech is closer to community-led growth than product-led growth. The product still has to work — a platform that fails volunteers will never earn the community recommendation that drives adoption in this market. But great product quality is the floor, not the ceiling. The ceiling is community trust.

This means investing in community infrastructure that PLG playbooks typically skip: presence at denominational conferences, partnerships with seminary programs, engagement in the online forums where ministry leaders actually talk to each other. These are slow, relationship-dependent channels that don’t scale the same way a self-serve freemium funnel does. They also don’t turn off when the budget cycle changes, because they’re built into how the community thinks about the product.

The teams that have built durable products in this space — and I’ve watched several attempts — are the ones that understood the trust architecture of their market and built their growth motion around it, rather than applying a playbook from a different market and wondering why the numbers didn’t work. The FaithTech community’s research on technology adoption in ministry contexts is the most useful public resource I’ve found for teams trying to understand this market’s distinct dynamics.


Your Turn: Apply This Today

If you’re building for faith-based organizations — or any high-trust, mission-driven market — here’s how to adapt your growth thinking:

  • Map your actual decision-making chain. For your target organization, identify: who uses the product daily, who controls the budget, and who influences the decision. Are they the same person? Almost never in ministry contexts. Design your acquisition, activation, and expansion motions for each role separately — not for an idealized single-user who does all three.
  • Reframe your freemium strategy around trust-building, not feature gating. In faith tech, free tiers work best when they demonstrate mission alignment and build organizational trust — not when they gate features to drive upgrade. Ask: what free experience would make a ministry leader trust us enough to bring us to their board? Design toward that, not toward the feature paywall.
  • Instrument volunteer completion rate separately from administrator metrics. If you’re not tracking whether the front-line volunteers who use your product daily can complete their core workflow successfully, you’re missing your most important retention leading indicator in ministry-facing products. Add it to your weekly review.
  • Build your community presence before you need it for growth. Attend the conferences, engage in the forums, build relationships with denominational leaders before you’re asking them to recommend your product. Community-led growth in high-trust markets requires relational investment that can’t be turned on urgently. Start now.
  • Measure mission alignment explicitly. Add one question to your onboarding or check-in flow: “How does this platform support what your ministry is trying to accomplish?” The answers will tell you whether your positioning is landing at the mission level — and they’ll give you the language to communicate that positioning to buyers who’ve never used the product.
  • Design your onboarding for the volunteer, not the administrator. In most faith tech products, the product review is done by an administrator and the daily experience belongs to a volunteer. Build your onboarding and help content for the least technical, most time-pressured user in the chain — the volunteer who has seven minutes on Sunday morning. If they succeed, the administrator renews.

The trust dynamics in faith tech connect to broader themes about designing for time-constrained users — what children’s ministry taught me about product simplicity goes deep on what the seven-minute volunteer experience reveals. And why faith tech is becoming a real category covers the market-level opportunity behind these dynamics.

Building or investing in faith tech and trying to figure out a growth model that actually works in ministry contexts? I consult with digital ministry organizations and faith tech teams on product strategy, growth model design, and building for the unique trust architecture of the faith-based market. Let’s talk.

Your AI Product Has a Trust Problem Before It Has a Performance Problem

Every AI product team I’ve worked with or advised has spent significant time on the performance problem: how accurate is the model, how fast does it respond, how well does it handle edge cases? These are legitimate engineering concerns. But I’ve watched technically impressive AI products fail in the market for a reason that none of those metrics capture: users didn’t trust them.

Trust is the prerequisite for adoption. And trust in AI products breaks in ways that are categorically different from trust in traditional software — faster, harder to repair, and with a long tail of behavioral effects that don’t show up in your engagement dashboard.

How AI Trust Breaks Differently

When traditional software fails, users understand the failure. A button doesn’t work. A page doesn’t load. The app crashes. These failures are frustrating, but they’re legible — users know what happened and can calibrate their expectations accordingly.

When AI fails, users often don’t know what happened. The AI gave a confident-sounding answer that turned out to be wrong. The recommendation seemed personalized but was clearly off. The summary missed something important and presented the gap with the same authority as the accurate content. Users can’t distinguish between the AI being right and the AI sounding right. That illegibility is what makes AI trust failures so destructive.

A single high-visibility AI error can undo months of accurate performance. Users who’ve been trusting the AI unconsciously — accepting its outputs without verification, integrating them into their workflows — suddenly question every prior output. The trust collapse is retroactive. And it’s much harder to rebuild trust after an AI failure than after a traditional software failure, because the user’s underlying question — “how do I know when to trust this?” — doesn’t have a clean answer.

The Trust Gap Product Teams Miss

Most product teams measure trust indirectly — through engagement, retention, NPS. These metrics capture trust effects, but they lag the trust event by weeks or months. By the time your NPS drops, the trust damage was done several product cycles ago.

The trust gap I see most consistently is between what users expect the AI to know and what it actually knows. Users quickly form a mental model of what the AI understands about them — their context, their history, their preferences. When the AI behaves in a way that violates that mental model, trust breaks. Not because the AI was necessarily wrong, but because it revealed that its understanding of the user is shallower than the user assumed.

This is a design problem as much as an engineering problem. The AI’s communication of its own uncertainty — what it knows, what it’s inferring, what it’s guessing — is as important to trust as the accuracy of the output itself. Products that communicate their AI’s confidence level clearly allow users to calibrate appropriately. Products that present all AI outputs with equal confidence train users to either overtrust everything or distrust everything. Neither is the outcome you want.

Building Trust Before You Build Performance

The framing shift that changes how you build AI products: trust is a design requirement, not a performance outcome. You don’t earn trust by making the AI more accurate and hoping trust follows. You design for trust explicitly, and then let performance maintain it.

What designing for trust looks like in practice:

Transparency about what the AI can and can’t do. Set user expectations at the point of first contact, not after the first failure. Users who understand the AI’s capabilities and limitations before they rely on it calibrate their trust appropriately. Users who discover limitations through failure recalibrate toward distrust.

Visible confidence signals in the UI. When the AI is highly confident, say so — implicitly through clean, direct output. When the AI is uncertain, signal that too — through hedged language, source attribution, or explicit “I’m not sure about this” framing. Users who can see confidence levels can trust appropriately rather than uniformly.

Recoverable failures. Design for the moment when the AI is wrong. How does the user know? How do they correct it? What does recovery cost them? AI products where errors are hard to detect and expensive to fix train users to verify everything, which eliminates the productivity benefit. AI products with visible, cheap error recovery build the confidence for users to rely on the AI without constant verification.

Consistent, predictable behavior. Trust requires predictability. An AI that behaves differently on similar inputs, even when both outputs are acceptable, leaves users unsure what it will do next. They start over-supervising the AI because they don’t know when to trust it. Consistency in behavior, tone, and output style is trust infrastructure. Google’s People + AI Research guidebook on AI interaction design is the best public resource I’ve found on trust-centered AI design principles.
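One way to picture the confidence-signal practice is a small framing function that leaves high-confidence output clean and hedges the rest. This is a sketch under stated assumptions: the thresholds, the wording of the hedges, and the idea that the model exposes a single confidence score between 0 and 1 are all illustrative, not a recommendation for any particular product.

```python
from typing import Optional

# Assumed cut-offs for illustration; tune them against real user research.
HIGH_CONFIDENCE = 0.85
LOW_CONFIDENCE = 0.50


def frame_output(text: str, confidence: float, source: Optional[str] = None) -> str:
    """Wrap an AI answer in framing that matches how sure the system is."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= HIGH_CONFIDENCE:
        # High confidence is signalled implicitly: clean, direct output.
        return text
    if confidence >= LOW_CONFIDENCE:
        framed = f"Likely, but worth a quick check: {text}"
    else:
        framed = f"I'm not sure about this: {text}"
    if source:
        # Source attribution gives the user a cheap way to verify.
        framed += f" (source: {source})"
    return framed


print(frame_output("The meeting is on Tuesday.", 0.95))
print(frame_output("The meeting is on Tuesday.", 0.60, source="calendar export"))
print(frame_output("The meeting is on Tuesday.", 0.20))
```

The design choice worth noticing is that all three outputs look different to the user, so they can trust each one to a different degree.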

The Trust Measurement Problem

If you’re not measuring trust directly, you’re flying blind on your most important AI product metric. Engagement and retention are trust effects. They tell you that trust has broken down only after the damage is done. You want leading indicators.

The leading indicators I track: AI override rate (how often users edit or reject AI outputs, separated from how often they accept without modification), re-query rate (how often users immediately follow an AI output with a corrective or clarifying query), and qualitative signals from support tickets about AI confusion or error. These don’t replace engagement metrics — they explain them, and they surface trust issues early enough to do something about them before they show up in churn.
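As a rough sketch of how the first two indicators might be computed, assume each AI output is logged with what the user did next. The `AIInteraction` record and its outcome labels are hypothetical, and the sample log is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIInteraction:
    outcome: str     # hypothetical labels: "accepted", "edited", "rejected"
    requeried: bool  # True if a corrective or clarifying query followed immediately


def trust_indicators(interactions):
    """Return accept, override, and re-query rates as fractions of all interactions."""
    total = len(interactions)
    if total == 0:
        return {"accept_rate": 0.0, "override_rate": 0.0, "requery_rate": 0.0}
    accepted = sum(1 for i in interactions if i.outcome == "accepted")
    overridden = sum(1 for i in interactions if i.outcome in ("edited", "rejected"))
    requeried = sum(1 for i in interactions if i.requeried)
    return {
        "accept_rate": accepted / total,
        "override_rate": overridden / total,
        "requery_rate": requeried / total,
    }


# Made-up log: 5 accepted, 2 edited, 1 rejected; 2 followed by a re-query.
log = (
    [AIInteraction("accepted", False)] * 5
    + [AIInteraction("edited", True)] * 2
    + [AIInteraction("rejected", False)]
)

indicators = trust_indicators(log)
print(indicators)
```

Acceptance is kept as its own number rather than inferred from the override rate, so that unlabelled or ambiguous outcomes do not silently inflate either side.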


Your Turn: Apply This Today

Build trust into your AI product development process before the first failure forces you to repair it:

  • Audit your AI product for confidence signaling. Walk through your product as a new user. Can you tell when the AI is highly confident vs. uncertain? If all outputs look the same, you’re training users to either overtrust or uniformly distrust. Add confidence signals before the next major release.
  • Map your AI failure modes and their recovery cost. For each AI feature, ask: when this is wrong, how does the user know? How expensive is the correction? The higher the recovery cost, the more supervision users will apply — and the lower the actual productivity benefit. Design cheap recovery paths into every high-consequence AI feature.
  • Add AI override rate to your product metrics dashboard. Instrument how often users accept AI outputs without modification vs. edit, override, or immediately re-query. Track it weekly. If the override rate is near zero, users may be overtrusting. If it’s very high, users don’t trust the AI enough to rely on it. Calibrated override rates (somewhere in between) indicate healthy trust.
  • Conduct a “first failure” user research session. Recruit users who have experienced a visible AI error in your product. Interview them about what happened to their trust and usage behavior after the failure. The pattern in their responses will tell you whether your product’s trust recovery design is working or broken.
  • Write your AI product’s “trust contract” with users. One paragraph: what your AI knows, what it can do reliably, what it can’t do reliably, and how to tell the difference. Share it in your onboarding. Users who understand the trust contract calibrate appropriately. Users left to discover it through failure don’t.
  • Run a “trust stress test” before every major AI feature launch. Deliberately trigger AI failures in a testing session with representative users. Observe their reactions. How long does it take them to recover trust? Do they change their behavior after the failure? If the trust damage is severe or persistent, redesign the failure experience before launch.

The trust problem is closely connected to the cognitive load dimension — Kahneman’s System 1 automation paradox explains why users overtrust confident-sounding AI outputs, and the Solomon test for AI decision-making addresses when human judgment needs to stay in the loop regardless of AI confidence.

Building an AI product and noticing that trust is the bottleneck more than performance? I consult with product teams on AI trust design, confidence signaling, and building the user research processes that surface trust problems before they become retention problems. Let’s talk.

The Product Roadmap Isn’t the Strategy: Why Confusing the Two Costs You More Than You Think

I’ve sat in more roadmap reviews than I can count. Quarterly planning sessions, OKR check-ins, leadership presentations where someone projects a slide with colored bars stretching 18 months into the future. Everyone nods. The roadmap gets approved. And then, somewhere between approval and execution, the strategy quietly disappears.

The roadmap was never the strategy. It was a schedule. And the most expensive mistake product leaders make is treating these two things as the same document.

What a Product Roadmap Actually Is

A roadmap is a sequenced list of work your team intends to do. It answers questions like: what are we building, when are we building it, and who is building it? These are important questions. They’re operational questions. A well-run roadmap process makes execution more predictable and stakeholder communication more coherent.

A product strategy answers different questions entirely: why are we building these things and not those things? What problem are we uniquely positioned to solve? What user need are we betting on that competitors are missing? What do we believe about the market that, if true, makes our product valuable — and if false, means we’re building the wrong thing?

A roadmap without a strategy is just a backlog with dates attached. It tells you what you’re doing but not whether you’re doing the right things. And the absence of strategy shows up not in the roadmap review, but six months later when you’ve shipped everything on the plan and the metrics haven’t moved.

Why Product Teams Conflate the Two

The conflation happens for understandable reasons. Roadmaps are concrete and satisfying. Strategy is abstract and contested. A roadmap gives stakeholders something to react to — they can add items, move timelines, reprioritize features. Strategy conversations are harder to close because they require alignment on beliefs about the future, not just preferences about the present.

There’s also an incentive problem. Product leaders are often evaluated on delivery — did the team ship what was on the roadmap? Evaluating whether the roadmap was the right roadmap to begin with requires a longer time horizon and a harder question than most review processes are set up to answer. So teams optimize for the thing they’re measured on, which is the schedule, not the strategy.

The result is a product organization that is excellent at executing and perpetually uncertain about whether it’s executing toward the right destination. Everything ships on time. The product never quite gets to where it was supposed to go.

What Strategy Actually Requires

Strategy requires three things that roadmaps don’t contain:

An explicit bet on user need. Not a description of what users want today, but a theory about what they will value tomorrow that competitors haven’t yet recognized. The best product strategies I’ve seen are specific enough to be falsifiable — if this user need doesn’t materialize at the scale we’re predicting, the strategy fails. That specificity is uncomfortable, which is why most strategy documents avoid it.

A theory of competitive differentiation. Why will users choose your product over the alternatives — not because of features, but because of something structural about your position? Network effects, proprietary data, distribution advantages, switching costs, brand trust in a specific community. Features get copied. Structural advantages don’t. If your differentiation argument is “we have more features,” you don’t have a strategy yet.

A clear account of what you’re not building. This is the hardest part. Strategy is as much about the work you decline to do as the work you commit to. A roadmap that says yes to everything a stakeholder requests isn’t a strategy; it’s a wish list. The moment when a product leader says “that’s not on our roadmap because it doesn’t advance our strategy” and can explain why, clearly and without apology, is the moment when strategy is actually operating. Lenny Rachitsky’s breakdown of what product strategy actually is remains the clearest treatment of this distinction I’ve read.

Making Strategy Visible in the Roadmap Process

The fix isn’t to abandon roadmaps. It’s to make the strategy explicit before the roadmap is built — and to test every roadmap item against the strategy before it gets approved.

In practice, this means the roadmap review has a different opening question. Instead of “here’s what we’re building and when,” the opening is “here’s the strategy we’re executing, here’s how each major initiative advances it, and here’s what we chose not to build because it would have diluted our strategic focus.” The roadmap becomes evidence for the strategy, not a substitute for it.

This sounds simple. It’s a significant cultural shift. It requires leadership that is willing to hold the strategy conversation before the stakeholder requests arrive, and willing to defend the strategy against the pressure to add scope. Most organizations haven’t built that muscle. Those that have built it consistently produce better products than those that haven’t, independent of engineering quality or design talent.


Your Turn: Apply This Today

Diagnose whether your roadmap is doing strategy’s job — and start fixing it:

  • Write your product strategy in three sentences before your next roadmap review. Sentence one: the specific user problem you’re uniquely positioned to solve. Sentence two: why your product is structurally better positioned to solve it than alternatives. Sentence three: what you’re explicitly not building this year and why. If you can’t write all three without hedging, you don’t have a strategy yet.
  • Audit your last roadmap against these three sentences. For each item that shipped in the last two quarters, ask: does this directly advance the strategy? If fewer than 70% of items have a clear answer, your roadmap isn’t executing a strategy — it’s executing requests.
  • Add a “strategic rationale” column to your roadmap. For every item, require a one-sentence explanation of which part of the strategy it advances. Items that can’t be explained in one sentence without stretching shouldn’t be on the roadmap. Make it visible to leadership so they can hold you to it.
  • Hold a “what we’re not building” meeting this quarter. Dedicate one session specifically to reviewing the things you’ve decided not to build — and explaining why each one doesn’t fit the strategy. This meeting builds strategic clarity faster than any number of roadmap reviews.
  • Separate stakeholder input from strategic decisions. Build a process where stakeholder requests are collected and evaluated against strategy before they enter the roadmap — not after. The sequence matters. Request-first-strategy-second produces a wish list. Strategy-first-request-evaluation-second produces a roadmap.
  • Make your strategy falsifiable. For each major strategic bet, write down explicitly: “We would know this strategy is wrong if ______.” If you can’t complete the sentence, your strategy is too vague to execute. The falsification condition is what makes strategy useful for decision-making rather than decorative for presentations.
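The roadmap audit in the list above comes down to simple arithmetic, sketched here. The 70% threshold comes from the audit step itself; the `RoadmapItem` structure, the item names, and the rule that an empty rationale means "no clear answer" are assumptions made for illustration.

```python
from dataclasses import dataclass

STRATEGY_THRESHOLD = 0.70  # the 70% bar from the audit step


@dataclass(frozen=True)
class RoadmapItem:
    name: str
    strategic_rationale: str  # one sentence; empty if nobody could state one


def audit_roadmap(items):
    """Return (aligned share, passes threshold, names of unexplained items)."""
    if not items:
        raise ValueError("nothing shipped, nothing to audit")
    unexplained = [i.name for i in items if not i.strategic_rationale.strip()]
    share = (len(items) - len(unexplained)) / len(items)
    return share, share >= STRATEGY_THRESHOLD, unexplained


# Hypothetical items shipped over the last two quarters.
shipped = [
    RoadmapItem("Volunteer quick-start", "Advances the bet on time-pressed users."),
    RoadmapItem("Denominational bundles", "Builds on our distribution advantage."),
    RoadmapItem("Custom report export", ""),
    RoadmapItem("Dark mode", ""),
    RoadmapItem("Lesson search", "Shortens prep time, the core user problem."),
]

share, executing_strategy, unexplained = audit_roadmap(shipped)
print(f"{share:.0%} aligned, executing a strategy: {executing_strategy}")
print("No stated rationale:", unexplained)
```

A filled-in rationale only proves someone wrote a sentence, so the honest version of this audit still needs a person to judge whether each sentence holds up.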

Strategy clarity is especially important in the first months of a leadership role — the first 100 days as a product leader is when the strategic framing gets established that will govern every roadmap decision that follows. And the deep customer knowledge discipline is what feeds the “explicit bet on user need” that every real strategy requires.

Leading a product team and finding that the roadmap is driving the strategy rather than the other way around? I consult with product organizations on strategy development, roadmap governance, and building the decision-making processes that connect daily execution to long-term competitive position. Let’s talk.

Why AI Decision-Making Needs Human Judgment: The Solomon Test for Product Leaders

The story of Solomon’s judgment is one of the oldest decision-making frameworks in recorded history. Two women claim the same child. No witnesses, no evidence, no algorithmic output that could resolve the dispute. Solomon’s solution — propose dividing the child, then watch who objects — is a masterclass in what optimization cannot do: use the decision itself as the test that reveals the truth.

I’ve been thinking about this story a lot while working on AI recommendation systems. We’ve gotten very good at optimization. We can recommend content, predict churn, personalize experiences, and rank options with impressive accuracy. What we haven’t gotten good at — and what I’m not sure AI will ever be good at — is recognizing when the optimization is solving the wrong problem entirely.

The Algorithm vs. the Test

A recommendation algorithm optimizes for a signal. Click-through rate, completion rate, return visits, explicit ratings — whatever you tell it to optimize, it will optimize. The problem is that the signals we can measure are often proxies for the outcomes we actually care about, and at some point the proxy diverges from the outcome.

I’ve seen this play out in digital content platforms repeatedly. A reading plan recommendation algorithm optimizes for completion rates. It gets good at predicting which plans users will finish. It starts recommending plans that feel familiar, comfortable, and achievable — because those are the ones that get completed. Completion rates go up. The metric is green. But users who choose plans that “mismatch” their stated preferences — the challenging, unfamiliar ones — show better long-term engagement. The algorithm optimized for completion and selected against growth.

Solomon’s test is the thing the algorithm can’t run. He couldn’t optimize his way to the truth. He had to create a condition that would reveal the truth through the parties’ responses. That kind of judgment — knowing that the right test will surface the right answer — is what AI cannot replicate, and what product leaders need to understand as an irreplaceable capability.

The Limits of AI Optimization

There’s a specific category of decisions where optimization fails systematically, and it maps precisely to the conditions of Solomon’s judgment: when the right answer requires understanding something about a party’s underlying motivation or stake in the outcome that isn’t visible in any behavioral signal.

In product terms: the algorithm can tell you what users do. It cannot tell you what users are trying to become. It can tell you what content users complete. It cannot tell you whether completion served their actual goal. It can tell you what users click. It cannot tell you whether the click reflects genuine engagement or habit, genuine interest or boredom-driven curiosity.

These are not problems of insufficient data. They’re problems of category — you cannot optimize your way to answers about meaning, motivation, and stake without reducing those human realities to signals that don’t actually capture them. This is what HBR’s research on AI decision limits has consistently found: AI excels at decisions that can be fully specified by their optimization criteria. It fails at decisions where the criteria themselves are contested or where the right answer depends on understanding what’s at stake for the parties involved.

Creating Space for Judgment

The practical implication for product teams isn’t “stop using optimization.” It’s “design your system so that optimization handles what optimization is good at, and humans handle what optimization can’t reach.”

This requires being explicit about the decisions your AI system is making and auditing them for the category of problem they represent. Can the decision be fully specified by a measurable outcome? Optimization is appropriate. Does the decision depend on understanding a user’s underlying motivation, growth trajectory, or stake in the outcome? Optimization needs a human checkpoint.

In practice, this means building escalation paths. Decisions that an algorithm can handle with confidence: automated. Decisions that carry high downstream consequence and depend on judgment the algorithm can’t access: escalated to a human. The challenge is designing the system so that the escalation happens before the consequence rather than after it.
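A minimal sketch of such an escalation path follows. The `Decision` fields, the confidence threshold, and the choice to treat any one of the three conditions as enough to escalate are assumptions; the last one is deliberately conservative and may be too strict for some products.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Decision:
    name: str
    fully_specified: bool    # can a measurable outcome fully define "correct"?
    high_consequence: bool   # significant downstream impact on a specific user?
    model_confidence: float  # assumed to be a score between 0 and 1


MIN_CONFIDENCE = 0.90  # assumed threshold, for illustration only


def route(decision: Decision) -> Route:
    """Choose a route before any action is taken, so escalation precedes consequence."""
    if not decision.fully_specified:
        return Route.HUMAN_REVIEW  # depends on motivation or stake
    if decision.high_consequence:
        return Route.HUMAN_REVIEW  # confident wrongness costs too much here
    if decision.model_confidence < MIN_CONFIDENCE:
        return Route.HUMAN_REVIEW
    return Route.AUTOMATE


# Hypothetical decisions an AI system might face.
print(route(Decision("rank_reading_plans", True, False, 0.95)))
print(route(Decision("restrict_account", True, True, 0.99)))
print(route(Decision("suggest_growth_path", False, False, 0.97)))
```

Because `route` runs before the action rather than after it, a human sees the risky cases while they can still change the outcome.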

The Wisdom Gap in AI Product Teams

Most product teams have become very good at evaluating AI performance — accuracy rates, precision, recall, business metric lift. What they haven’t systematically developed is the capacity to evaluate AI judgment — the ability to recognize when the system is optimizing confidently toward the wrong objective.

This is the wisdom gap. Wisdom, in the traditional sense, isn’t just knowledge or capability — it’s knowing when to apply which capability, and knowing when the right answer can’t come from a formula at all. Solomon wasn’t impressive because he had access to more information than anyone else. He was impressive because he understood that the right test would reveal what no available information could.

Building that capacity into a product team means creating explicit processes for questioning whether AI systems are optimizing for the right things — not just whether they’re optimizing well. It means reviewing decisions at the system level, not just the feature level. It means asking, regularly: what would the right answer look like if we couldn’t use a metric to find it? If the answer changes when you remove the metric, the metric is the wrong one.

The Practical Applications

Three specific places where Solomon’s framework changes how I evaluate AI product decisions:

Recommendation systems: Don’t just optimize for completion or engagement. Build qualitative research into your cadence that asks users whether the AI-recommended path served their actual goal. The behavioral signal and the goal-based signal often diverge. The divergence is where the product insight lives.

Personalization: The most effective personalization isn’t always the most comfortable personalization. Before concluding that an algorithm is working because engagement is up, ask whether the algorithm is serving users’ stated goals or optimizing toward the path of least resistance. Users often engage more with what’s familiar than with what’s growth-oriented. The engagement metric can’t distinguish between the two.

High-consequence decisions: Any AI-assisted decision with significant downstream consequences for a specific user — content restrictions, account actions, eligibility determinations — needs a human checkpoint. Not because the AI will always be wrong, but because the cost of confident wrongness in these cases is high enough that the judgment of a human who understands what’s at stake is worth the operational overhead.


Your Turn: Apply This Today

Build the judgment layer into your AI product process:

  • Audit your AI system’s decisions for decision category. List the top 10 decisions your AI system makes. For each one, ask: can this decision be fully specified by its optimization criteria? If yes, automation is appropriate. If the right answer depends on understanding user motivation or stake, build a human checkpoint.
  • Separate your “proxy metrics” from your “outcome metrics.” Identify which metrics your AI optimizes directly and which outcomes you actually care about. Map the relationship between them. Where the proxy and the outcome diverge, you have a place where optimization is working against you.
  • Build qualitative research into your AI evaluation cadence. Once per quarter, interview users whose behavior your AI system has most significantly influenced. Ask whether the AI-driven experience served their actual goal. The divergence between behavioral signals and goal-based answers is where your most important product insights live.
  • Design a “Solomon test” for your highest-stakes AI decisions. For the AI-assisted decisions with the highest downstream consequence, design a test that would reveal whether the AI’s recommendation was right — not just whether users accepted it. Acceptance and correctness are not the same.
  • Create an escalation path for judgment-dependent decisions. Identify the category of decisions that require understanding user motivation or stake. Build an explicit escalation path so that those decisions reach a human before the consequence rather than after. Make it part of your system design, not an emergency procedure.
  • Hold a “what if we removed the metric?” review annually. For each of your AI system’s optimization targets, ask: what would the right answer look like if we couldn’t use this metric? If the answer changes significantly, the metric needs to be reconsidered. This is the most uncomfortable product review you will have and the most valuable.

The judgment-vs-optimization tension shows up across AI product decisions — the Kahneman System 1 paradox addresses the cognitive load dimension of the same problem, and why product decisions are never just product decisions explores the ethical dimension of designing AI systems that optimize toward the right ends.

Building AI systems and trying to design judgment into the process rather than optimizing it away? I consult with product teams on AI product strategy, human-AI decision frameworks, and building systems that know when optimization is the right tool — and when it isn’t. Let’s talk.

Three Rules for AI Prompt Design: What John Wesley’s Constraint Framework Teaches Product Teams

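John Wesley’s General Rules for the Methodist societies are often summarized as three rules: do no harm, do good, and stay in love with God. (The third phrasing is Rueben Job’s modern paraphrase of Wesley’s “attend upon all the ordinances of God.”) Not a comprehensive theology. Not thirty-seven principles. Three simple constraints that anyone in a Methodist society could remember, apply, and teach.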

I’ve been thinking about Wesley’s framework — specifically his instinct that constraint creates more impact than capability — as product teams struggle with AI implementation. Claude can write, code, analyze, reason, and create. GPT-4 handles everything from customer support to strategic planning. The temptation is to throw AI at every problem and see what sticks.

Wesley understood something we keep relearning: the most powerful systems aren’t the ones that can do everything. They’re the ones that do specific things exceptionally well within clear boundaries. Here’s what his three-rule framework looks like as an AI product design principle.

Do No Harm: The First Constraint in AI Prompt Design

In AI product work, “do no harm” translates to a specific discipline: define what the AI system should not do before you define what it should do.

Most teams design AI features by expanding capability — adding more things the AI can handle, more use cases it supports, more outputs it can generate. The Wesley approach inverts this: start by naming the exclusions. What should this AI never say? What user requests should it decline to fulfill? What outputs would be harmful, misleading, or counterproductive even if technically achievable?

The constraint changes the architecture of the feature. Teams that start with capability tend to build AI that does many things adequately. Teams that start with harm prevention tend to build AI that does specific things with genuine care for the user’s actual outcome. The difference in user trust is significant and compounds over time.

In practice: before your next AI feature ships, write down three things it should never do. Make those constraints explicit in your system prompt, your evaluation criteria, and your launch checklist. The discipline of naming the harms before the capabilities will change what you ship.

Do Good: The Second Constraint

“Do good” seems obvious — of course the goal is to do good. But Wesley’s point wasn’t that doing good is obvious. His point was that you have to choose it actively and specifically. You have to name the good you’re trying to do, not assume it follows from capability.

For AI product teams, this translates to specificity of purpose. The most dangerous AI features are the ones built with vague positive intent — “help users be more productive,” “improve the experience,” “make things easier.” These goals aren’t wrong; they’re insufficiently specific. They don’t constrain what the AI does well enough to actually do good reliably.

The teams building the most impactful AI features are the ones who can answer precisely: for which specific user, in which specific situation, doing which specific task, does this AI feature create genuine value? The narrower the answer, the more intelligence you can focus on serving that specific case. AI amplifies specificity. A vague AI feature serves everyone adequately. A specific AI feature genuinely transforms a particular user’s experience.

The prompt design corollary: every AI system prompt should contain an explicit statement of the specific good the AI is designed to do. Not “help users.” Something precise enough that you could evaluate whether a given output advances it or not.

Stay in Love With the Mission: The Third Constraint

Wesley’s third rule is the hardest to translate into product terms, but the translation is important. “Stay in love with God” was Wesley’s constraint against letting the institution — the methodology, the system, the practice — become the end rather than the means. The movement existed to serve spiritual formation, not itself. When the structure started serving itself, Wesley’s third rule was the corrective.

For AI product work, this is the constraint against losing the mission in the mechanics. It’s easy to become so focused on what the AI can do — what’s technically impressive, what gets stakeholder attention, what generates engagement metrics — that you lose track of whether the AI is actually serving the user’s underlying goal.

I’ve watched this happen. A team builds an AI writing assistant that users engage with heavily, then realizes the engagement is driven by the novelty of watching the AI write, not by the quality of what users are actually producing. The metric is green. The mission has drifted. The AI became the point rather than the tool.

“Stay in love with the mission” is a standing question for every AI feature review: is this feature advancing the user’s actual goal, or is it becoming the goal itself? The discipline of asking it consistently is what keeps AI features from becoming impressive demonstrations that quietly fail the user. More on this from a product ethics perspective: the Interaction Design Foundation’s framework on design patterns is a useful lens for evaluating where AI features cross from tool to substitution.

Why Constraint Beats Capability in AI Product Design

The Methodist movement succeeded not because Wesley had the most sophisticated theology, but because he had the most practical framework. Three rules anyone could remember, apply, and teach — that’s what scaled the movement across 18th-century Britain.

The teams building the most impactful AI products right now aren’t the ones deploying every capability. They’re the ones who’ve chosen clear constraints that channel AI capability toward specific, valuable outcomes. Their prompts are shorter and more focused. Their AI features do fewer things with more precision. Their users trust the outputs because the product was clearly designed with the user’s outcome in mind rather than the AI’s capability ceiling.

In a world where AI can do almost anything, the crucial question isn’t what’s possible. It’s what’s wise. Wesley figured out how to answer that question with three rules. Your AI prompts need the same discipline.


Your Turn: Apply This Today

Apply the three-constraint framework to your current AI features:

  • Write the “do no harm” list for your highest-traffic AI feature. Name three things this AI should never output, regardless of what the user asks for. Write them as explicit constraints in your system prompt. If you’ve never done this exercise, do it before your next feature review — it will surface assumptions nobody has articulated.
  • Rewrite your AI feature’s purpose statement as a specific user outcome. Replace “help users be more productive” with a sentence specific enough to evaluate: “Help [specific user type] accomplish [specific task] in [specific context] faster and with better results.” If you can’t write the specific version, the feature isn’t ready to ship.
  • Add a “mission drift” check to your sprint review. Ask once per sprint: are users engaging with this AI feature because it’s helping them accomplish their goal, or because it’s impressive to interact with? The engagement metric doesn’t distinguish between these. Your qualitative research should.
  • Test the three-rule prompt structure on your next system prompt. Write a system prompt that has three sections: what the AI must never do, what specific good the AI is designed to do, and what the user’s underlying goal is that the AI should always serve. Compare the outputs to your current prompt. The discipline almost always improves them.
  • Audit your AI feature’s constraint density. Count the constraints in your current system prompt vs. the capabilities you’ve described. If capabilities dramatically outnumber constraints, you’ve built an AI feature by expansion rather than by design. Add at least one constraint for every three capabilities you’ve described.
  • Apply the “Wesley test” before your next AI feature launch. Ask: does this feature do no harm? Does it advance a specific, named good for a specific user? Does it serve the user’s mission rather than substituting for it? If you can’t answer yes to all three, the feature needs more design work before it ships.
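The three-section prompt and the constraint-density check from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names, the section labels, and the sample wording are hypothetical, and the ratio check reads "one constraint for every three capabilities" strictly, so four capabilities call for two constraints.

```python
def build_system_prompt(never_do: list[str], specific_good: str, user_goal: str) -> str:
    """Assemble a system prompt with the three sections, constraints first."""
    if not never_do:
        raise ValueError("name at least one thing the AI must never do")
    never_lines = "\n".join(f"- {item}" for item in never_do)
    return (
        "NEVER DO:\n"
        f"{never_lines}\n\n"
        "SPECIFIC GOOD:\n"
        f"{specific_good}\n\n"
        "USER'S UNDERLYING GOAL:\n"
        f"{user_goal}\n"
    )


def meets_constraint_ratio(constraints: int, capabilities: int) -> bool:
    """True when there is at least one constraint per three capabilities."""
    return constraints * 3 >= capabilities


# Hypothetical example content, for illustration only.
prompt = build_system_prompt(
    never_do=[
        "Invent scripture references or quotations",
        "Give medical, legal, or counseling advice",
        "Produce content unsuitable for children",
    ],
    specific_good="Help a volunteer children's ministry leader adapt one lesson in under ten minutes.",
    user_goal="Walk into Sunday prepared and confident.",
)
```

Putting the exclusions first in the assembled prompt mirrors the order of the rules; whether that ordering changes model behavior is something to check against your own outputs rather than assume.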

The constraint-beats-capability principle shows up in other forms too — the choice overload paradox in AI features is the consumer-facing version of the same dynamic, and Munger’s inversion principle is a complementary framework for designing around constraints rather than toward capabilities.

Building AI features and trying to avoid the “impressive but useless” trap? I consult with product teams on AI prompt architecture, constraint-based feature design, and building AI that serves users’ actual goals rather than demonstrating capability. Let’s talk.

Designing for Obsolescence: What Seneca’s Philosophy Reveals About AI System Architecture

A line often attributed to Seneca says that every new beginning comes from some other beginning’s end. Whatever its source, the idea fits what Seneca wrote about time: we live in a continuous flow of transitions, each new phase emerging from the completion of the last. Product leaders building AI systems are working through exactly this dynamic, whether they’ve named it or not.

Every AI model update creates a new beginning by ending the previous version’s assumptions. Every infrastructure upgrade rewrites the operational playbook. Every significant capability improvement from a frontier provider makes yesterday’s integration architecture obsolete. The question for product leaders isn’t whether your AI systems will become obsolete — it’s whether you’ve designed for that obsolescence or against it.

The Paradox of AI Infrastructure

Here’s the tension: we want the reliability of traditional software — predictable, testable, maintainable — but we’re building with components that evolve continuously through training updates, capability improvements, and API changes. The mental model of “build it, test it, ship it, maintain it” breaks down when the underlying model your product depends on updates monthly and the new version behaves differently than the version you tested.

Traditional software has a stable lifespan. A well-built accounting system can run for a decade with minimal changes. AI systems don’t work that way. The capability floor keeps rising. A system that uses GPT-3.5 for a task that GPT-4 now handles better isn’t just suboptimal — it’s a competitive disadvantage, and the gap between the current capability and your deployed system grows every month you maintain the old architecture.

This creates a design challenge that has no equivalent in traditional software: you need to build systems that are robust enough to operate reliably at scale AND modular enough to be replaced component by component as better capabilities become available. These two requirements pull in opposite directions, and resolving the tension is one of the defining architectural challenges of AI product development right now.

Designing for Obsolescence

The practical answer is designing for obsolescence rather than designing for permanence. This means:

Modular AI architecture. Build so that individual AI components can be upgraded or replaced without rewriting the product around them. The abstraction layer between your product logic and your AI provider isn’t just a technical nicety — it’s what makes planned obsolescence economically viable. Without it, every major model upgrade is a replatforming project.

Shorter depreciation cycles. AI infrastructure doesn’t depreciate on the same timeline as traditional software infrastructure. Planning for 6-12 month cycles on AI-specific components — rather than the 3-5 year cycles that traditional software justifies — is a more honest accounting of how quickly the capability landscape moves. Finance teams that don’t understand this will consistently underfund the migration work required to stay competitive.

Evaluation frameworks that survive component changes. If your AI system doesn’t have an automated evaluation suite that runs against new model versions before deployment, you have no safe migration path. Every model update becomes a manual testing project, and the cost of that manual work is what prevents organizations from adopting capability improvements on a competitive timeline. Build the evals first. They outlast every model generation.
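The abstraction layer described under modular architecture can be sketched as a small interface that product code depends on, with one adapter per provider behind it. Everything named here (`TextModel`, `ProviderAModel`, `ProviderBModel`, `summarize_lesson`) is hypothetical, and the adapters return canned strings where a real one would call a vendor SDK.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface product code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class ProviderAModel:
    """Stand-in adapter; a real one would wrap a vendor SDK call here."""

    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class ProviderBModel:
    """A second stand-in adapter with the same shape."""

    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"


def summarize_lesson(model: TextModel, lesson_text: str) -> str:
    """Product logic: knows the task, not the vendor."""
    return model.complete(f"Summarize in one sentence: {lesson_text}")


# Retargeting is a one-line change at the call site, not a rewrite.
old = summarize_lesson(ProviderAModel(), "The lost sheep")
new = summarize_lesson(ProviderBModel(), "The lost sheep")
```

Because `summarize_lesson` only sees the `TextModel` protocol, swapping providers touches the adapter and the wiring, which is what keeps a model upgrade from turning into a replatforming project.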

The Economics of Planned Obsolescence

There’s a counterintuitive economic argument here that takes time to internalize: the goal isn’t to build AI systems that last forever. It’s to build AI systems that create enough value before their obsolescence to fund their own replacement.

This is different from how most organizations think about infrastructure investment. The traditional framing is: invest once, amortize over years, minimize ongoing costs. The AI infrastructure framing is: invest in modularity, accept component replacement as a regular operational cost, and measure success by how quickly you can upgrade — not by how long you can avoid upgrading.

Organizations that accept this economic reality build faster. They ship AI improvements on the timeline of capability improvements rather than on the timeline of their budget cycle. The ones that treat AI infrastructure like traditional software infrastructure spend their engineering budget on maintaining increasingly outdated components rather than on adopting the capability improvements that would actually serve users better. Andreessen Horowitz’s AI Canon has useful framing on the infrastructure investment thesis that informs this economic argument.

The Human Side of Technological Obsolescence

There’s a team-level version of this design-for-obsolescence principle that’s equally important: the skills required to build AI products are themselves subject to rapid obsolescence.

Prompt engineering techniques that were genuinely valuable 18 months ago have been partially automated. Fine-tuning approaches that required deep ML expertise are now accessible with far less specialization. The skill set that makes a great AI product team today is different from the skill set that will matter in two years — not completely different, but different enough that teams that don’t continuously build learning into their culture will find themselves behind.

When I’m building an AI product team, I prioritize adaptability and learning velocity alongside current expertise. Current tool knowledge matters. The capacity to acquire new tool knowledge matters more over a multi-year horizon. This is the team-level version of Seneca’s principle: build organizations that benefit from change rather than organizations that resist it.


Your Turn: Apply This Today

Design for obsolescence before the next obsolescence cycle forces you to:

  • Audit your AI architecture for replaceability. For each AI component in your product, ask: if we needed to swap this model or provider in 60 days, what would it take? If the answer is “months of reengineering,” you have a design-for-permanence problem that will cost you competitively every time the capability landscape shifts.
  • Build your evaluation framework before you need to migrate. Create an automated suite that tests your AI system’s behavior against defined quality criteria. Run it against new model versions before deployment. This is the single most important investment for making planned obsolescence operationally viable.
  • Restructure your infrastructure budget cycle for AI components. Present a 12-month replacement plan for AI-specific components to your finance stakeholders — not as a failure to build durable systems, but as a realistic amortization of infrastructure that improves faster than traditional software. Make the economic argument explicitly.
  • Map your team’s “obsolescence risk” skills. Identify the skills in your team that are most likely to be automated or made obsolete in the next 18 months. Build a plan for transitioning those team members to higher-leverage work before the transition is forced. This is better for your team and better for your product.
  • Build abstraction layers into every new AI integration. Make it a team norm: no direct AI provider dependency in product logic. Every AI integration goes through an abstraction layer that can be retargeted without rewriting the product. This is the architectural equivalent of designing for obsolescence.
  • Run a “what if this model becomes unavailable?” exercise quarterly. For your primary AI capabilities, simulate a scenario where the current provider or model is unavailable in 90 days. How long does migration take? What breaks? The exercise will reveal architectural dependencies you’ve accepted without explicitly deciding to.
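The evaluation-suite item above can start very small. The sketch below assumes a model is just a function from prompt to text; the two eval cases, the 0.95 pass-rate floor, and the stub models are placeholders for illustration, not recommended criteria.

```python
from typing import Callable

Model = Callable[[str], str]
Check = Callable[[str], bool]

# (prompt, check) pairs: each check encodes one quality criterion.
EVAL_CASES: list[tuple[str, Check]] = [
    ("Reply with the single word OK.", lambda out: out.strip() == "OK"),
    ("List three colors, comma separated.", lambda out: len(out.split(",")) == 3),
]


def pass_rate(model: Model, cases: list[tuple[str, Check]]) -> float:
    """Fraction of eval cases whose check accepts the model's output."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def safe_to_migrate(candidate: Model, current: Model,
                    cases: list[tuple[str, Check]], min_rate: float = 0.95) -> bool:
    """Candidate must clear the floor and not regress against the current model."""
    candidate_rate = pass_rate(candidate, cases)
    return candidate_rate >= min_rate and candidate_rate >= pass_rate(current, cases)


# Stub models standing in for real provider calls.
def stub_good(prompt: str) -> str:
    return "OK" if "OK" in prompt else "red, green, blue"


def stub_chatty(prompt: str) -> str:
    return "Sure! OK"
```

The same `EVAL_CASES` list runs against every model generation, which is why the suite outlasts any single integration.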

Seneca’s discipline about time connects directly to the hidden cost of real-time AI decisions — Seneca’s email rule and the cost of always-on AI explores the other dimension of this Stoic lens on AI product decisions.

Building AI systems and trying to make architecture decisions that hold up as capabilities evolve? I consult with product teams on AI architecture strategy, planned obsolescence frameworks, and building organizations that adapt rather than resist when the capability landscape shifts. Let’s talk.

Why Teresa Torres’s Continuous Discovery Cadence Is Breaking Down in the AI Age

Teresa Torres built continuous discovery as an antidote to product teams that build in isolation — teams that spend months building solutions to problems they think exist, then ship to users who wanted something entirely different. The framework is genuinely valuable: weekly customer interviews, systematic opportunity mapping, iterative testing. It solved a real problem.

Here’s the question I’ve been wrestling with: in an environment where AI can monitor customer behavior, surface opportunity signals, and flag behavioral anomalies continuously — not weekly, not daily, continuously — is the weekly interview cadence still the right rhythm for discovery? Or has it become a bottleneck?

The Continuous Discovery Promise

Torres’s core insight is sound and worth protecting: most product failures happen because teams build solutions for problems they imagined rather than problems that actually exist. Her framework forces the discipline of staying connected to users rather than getting lost in internal roadmap debates.

The weekly interview cadence is the mechanism that makes the framework work in human-speed product environments. A team that talks to users weekly, maps what they hear, and designs experiments based on that mapping has a significant advantage over a team that does discovery quarterly or on an ad hoc basis. That’s true and it’s worth affirming before getting into the breakdown.

Where Continuous Discovery Breaks Down

The breakdown isn’t in the principles — it’s in the gap between the discovery cadence and the signal velocity that AI-augmented product environments now generate.

When you’re monitoring support channels, usage patterns, and behavioral data continuously with AI, you’re not waiting for a weekly interview to surface a new opportunity cluster. The signal is arriving every hour. The opportunity identification is happening faster than the discovery cadence can process it. The bottleneck shifts from “we don’t have enough signals” to “we have more signals than our discovery process can evaluate.”

There’s also a deeper problem: some of the most important signals about user needs don’t come from what users say in interviews — they come from what users do that doesn’t match what they said. AI is better at surfacing the behavioral signal than interviews are. The weekly interview remains valuable, but it’s no longer the primary source of opportunity discovery in a well-instrumented product environment. Torres’s framework was designed for the interview as the primary signal. The role of the interview has changed.

The Human-AI Discovery Gap

The specific gap I’ve encountered: AI surfaces opportunity patterns from behavioral data faster than human-centered discovery processes can validate and act on them. This creates a queue of AI-identified opportunities waiting for human investigation that grows faster than the weekly interview cadence can clear it.

The result is that teams using AI-augmented discovery alongside Torres’s cadence end up in one of two failure modes: they ignore the AI-surfaced signals and run a traditional discovery process that is now missing a large class of opportunities, or they try to validate every AI-surfaced signal through the traditional interview-and-experiment cadence and create a backlog so long that validated insights are stale before they reach solution design.

The framework needs an explicit mechanism for triaging AI-surfaced signals before they enter the discovery process. Torres’s original framework doesn’t provide one, because the signals it was designed to handle were human-generated and arrived at a volume humans could process.

What Discovery Looks Like With AI-Augmented Signals

The adaptation I’ve found most useful: restructure the discovery cadence so that AI handles continuous signal monitoring and human-centered discovery handles signal validation and solution design.

AI monitors behavioral signals continuously and surfaces clusters of anomalies or emerging patterns for human review. This replaces the “discovery” part of the weekly interview — the team is no longer relying on interviews to surface new opportunities, because the AI is doing that continuously. The weekly interview becomes a validation mechanism: we talk to users about opportunities the AI has already identified, rather than hoping the interview will surface something new.

This shift changes what the interview accomplishes. Instead of “tell me about your experience” (open-ended discovery), the interview becomes “the data shows users are abandoning this workflow at step 3 — can you walk me through what that experience is like for you?” (targeted validation). The insight quality goes up. The discovery efficiency goes up. The interview becomes more valuable because it’s now pointed at a specific signal rather than fishing for signals generally.

The Missing Piece in Torres’s Framework

The missing piece isn’t a critique of Torres — it’s a gap that didn’t exist when the framework was designed. The missing piece is an explicit decision rule for how AI-surfaced signals enter the discovery process and what it takes for them to earn human investigation time.

Without that decision rule, every AI-surfaced signal competes for the same human attention budget as the signals coming from interviews. The volume overwhelms the process. The team either starts ignoring signals or starts investigating everything and producing nothing.

The decision rule I use has three parts. AI signals get human investigation time when they meet a significance threshold (behavioral change above a defined magnitude), a duration threshold (the pattern persists for at least two weeks), and a strategic relevance filter (the affected workflow or user segment is in current strategic focus). Signals that fail any of the three stay in monitoring. Signals that pass all three get a targeted interview and a fast experiment to validate. Teresa Torres’s original continuous discovery documentation is the right foundation; the adaptation is building the signal triage layer on top of it.
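Written as code, the decision rule is a three-way conjunction. In this sketch the 14-day minimum comes from the two-week duration threshold above; the `Signal` fields, the 20% magnitude cutoff, and the workflow names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    workflow: str
    relative_change: float   # e.g. -0.30 for a 30% drop against baseline
    days_persisted: int


def earns_investigation(
    signal: Signal,
    focus_workflows: set[str],
    min_change: float = 0.20,   # assumed magnitude cutoff
    min_days: int = 14,         # "at least two weeks"
) -> bool:
    """A signal earns human time only if it passes all three filters."""
    significant = abs(signal.relative_change) >= min_change
    durable = signal.days_persisted >= min_days
    relevant = signal.workflow in focus_workflows
    return significant and durable and relevant


def triage(signals: list[Signal], focus_workflows: set[str]) -> tuple[list[Signal], list[Signal]]:
    """Split signals into (investigate, keep monitoring)."""
    investigate = [s for s in signals if earns_investigation(s, focus_workflows)]
    monitor = [s for s in signals if not earns_investigation(s, focus_workflows)]
    return investigate, monitor
```

Writing the thresholds down as named parameters has a side benefit: changing the bar becomes a visible decision rather than a drift in what the team happens to look at.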


Your Turn: Apply This Today

You don’t need a 20-agent AI system to apply this. Start with the principles:

  • Audit how your current discovery signals are sourced. List the top five sources of opportunity signals your team acts on. What percentage come from interviews? What percentage from behavioral data? If the behavioral data is underrepresented, you’re missing a signal class that doesn’t require AI to access — just better instrumentation and review habits.
  • Shift at least one weekly interview to validation mode. Before your next customer interview, identify one behavioral anomaly from your product data — a workflow with high abandonment, a feature with unexpectedly low engagement, a search query with no good results. Make that the focus of the interview. Validate the signal, don’t fish for new ones.
  • Build a signal triage decision rule. Define explicitly what it takes for a behavioral signal to earn human investigation time: significance threshold, duration threshold, strategic relevance filter. Write it down and apply it to your next backlog grooming session. Everything that doesn’t meet the bar stays in monitoring.
  • Create a “discovery queue” separate from your opportunity backlog. AI-surfaced signals that haven’t been validated by a human should sit in a discovery queue, not the opportunity backlog. They don’t get resourced until they’ve been validated through a targeted interview or behavioral experiment. This prevents unvalidated signals from contaminating your actual opportunity prioritization.
  • Measure your discovery process’s throughput, not just its output. Track how many signals enter your discovery process per month and how many get resolved (validated or dismissed). If the queue is growing faster than it’s being cleared, the discovery cadence needs to be restructured — not accelerated, restructured.
  • Protect the interview for the human insight it uniquely provides. As you shift interviews from discovery to validation, stay alert for the kinds of insight only an interview can generate — the emotional context, the unstated assumption, the use case nobody modeled. These won’t show up in behavioral data. The interview is still the best tool for them. Just don’t use it for signal discovery when the data is already doing that job.

The opportunity solution tree (OST) framework needs related adaptations at scale, covered in two companion pieces worth reading alongside this one: how OSTs need to evolve for AI-augmented discovery and the execution complexity branch that’s missing from most opportunity trees.

Running product discovery and trying to figure out how AI changes the process? I consult with product teams on discovery operations, AI-augmented research processes, and building the systems that keep human judgment at the center of AI-accelerated product development. Let’s talk.

Opportunity Solution Trees Meet AI Reality: How the Framework Breaks Down at Scale

Teresa Torres’s opportunity solution tree framework is one of the better discovery tools I’ve used. The structure — mapping customer opportunities to potential solutions in a visual tree — helps teams stay connected to user needs rather than defaulting to feature lists and stakeholder requests. I use a version of it. But running product for a platform at significant scale, and building AI agents to support the discovery process, has taught me where the framework needs substantial adaptation before it works in the real world.

This isn’t a critique of Torres. Her framework is sound. It’s an observation that opportunity solution trees as designed assume a pace and a data volume that most scaled products don’t operate at — and that AI changes both the opportunity and the failure mode of the entire discovery process.

The Opportunity Tree Reality Check

The OST framework looks clean in workshop settings. Customer interviews surface opportunities. Teams map solutions. Experiments get scoped. Everyone leaves feeling aligned. At scale, the process breaks down in a specific way: the rate of incoming opportunity signals overwhelms the team’s capacity to process them through a manual discovery rhythm.

When you’re monitoring support channels, usage patterns, feedback streams, and behavioral data for a large user base, the volume of potential opportunity signals is enormous. The question isn’t “how do we find the opportunities?” — AI can surface opportunity clusters from that data continuously. The question is “how do we evaluate and prioritize opportunities that are emerging faster than we can interview users about them?” Torres’s framework was designed for a world where discovery is the bottleneck. At scale with AI, discovery is no longer the bottleneck — judgment about what to do with what you’re discovering is.

Where AI Changes Everything in the OST Process

The adaptations I’ve made to the OST process at scale are all about restructuring what AI handles vs. what humans handle:

AI handles continuous opportunity sensing. Rather than quarterly or monthly discovery cycles, I’ve shifted to AI agents that monitor support channels, usage patterns, and feedback streams continuously — flagging opportunity clusters for human investigation. The tree is no longer built in workshops; it’s maintained continuously with AI surfacing new branches as they emerge from data.

Humans handle opportunity prioritization and solution imagination. Torres often combines opportunity identification and solution generation in workshops. At scale, separating these produces better results. AI excels at recognizing opportunity patterns from data. Humans excel at imagining solutions that connect to those opportunities in ways that AI wouldn’t generate without significant prompting. Keeping these steps separate — and keeping humans firmly in charge of the prioritization step — prevents the discovery process from becoming AI-driven by default.

The tree becomes a living document, not a workshop artifact. The biggest practical change: the OST is updated continuously based on incoming data rather than rebuilt quarterly. AI maintains the structural connections. Humans make the strategic decisions about which branches to pursue, which to prune, and which to flag for deeper investigation.

The Missing Piece: Solution Validation Speed

Torres’s framework handles opportunity identification well and provides a solid structure for mapping solutions. Where it provides less clarity — and where I’ve had to build my own process — is solution validation speed at scale.

When you can run experiments across a large user base, the question isn’t “how do we test this?” It’s “how do we test this fast enough to keep up with the opportunity identification rate?” If AI is surfacing new opportunity clusters weekly but your experiment design and execution cycle takes six weeks, you’re building a backlog of identified opportunities that go stale before anyone validates them, let alone addresses them.

The solution I’ve found: tiered experiment design. Some solutions get full experiment design and execution. Others get fast behavioral proxies — early signals from a small user segment that don’t require full experiment infrastructure. The OST needs an explicit tier for each solution branch that determines the validation approach before the experiment is scoped. Without it, every solution competes for the same limited experiment capacity and most of them wait too long.
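A minimal sketch of what an explicit tier on each solution branch could look like. Everything named here is an assumption for illustration: the `SolutionBranch` fields, the `large_audience` cutoff, and the rule itself (irreversible or wide-reaching changes earn a full experiment, everything else gets a fast proxy). Your own criteria will differ.

```python
from dataclasses import dataclass
from enum import Enum


class ValidationTier(Enum):
    FULL_EXPERIMENT = "full experiment"
    BEHAVIORAL_PROXY = "behavioral proxy"


@dataclass
class SolutionBranch:
    name: str
    reversible: bool      # can the change be rolled back cheaply?
    affected_users: int   # rough size of the exposed user base


def assign_tier(branch: SolutionBranch, large_audience: int = 50_000) -> ValidationTier:
    """Pick a validation tier before any experiment is scoped.

    Hypothetical rule: irreversible or wide-reaching changes get a full
    experiment; everything else gets a behavioral proxy on a small segment.
    """
    if not branch.reversible or branch.affected_users >= large_audience:
        return ValidationTier.FULL_EXPERIMENT
    return ValidationTier.BEHAVIORAL_PROXY


# Illustrative branches, not real data.
branches = [
    SolutionBranch("new pricing page", reversible=False, affected_users=8_000),
    SolutionBranch("reorder lesson filters", reversible=True, affected_users=3_000),
]
tiers = {branch.name: assign_tier(branch) for branch in branches}
for name, tier in tiers.items():
    print(f"{name}: {tier.value}")
```

The point of the design is that the tier is decided by a stated rule at the moment a branch is added, so the argument about experiment capacity happens once, over the rule, rather than once per solution.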

The Cultural Complexity Challenge

For products serving users across diverse geographies and cultural contexts, the OST needs one more adaptation Torres’s original framework doesn’t explicitly address: opportunity branches that are culturally segmented rather than universal.

The same user need can manifest in fundamentally different ways across cultural contexts — different mental models, different workflows, different relationship to the product’s value proposition. AI helps identify when an opportunity pattern is consistent across user segments and when it’s culturally specific. Without that distinction, you end up building solutions to universal-seeming opportunities that actually only solve for the segment that happens to be most vocal in your feedback channels.

The practical implication: before any significant OST branch gets resourced for solution design, ask explicitly whether the opportunity is universal or segment-specific. If it’s segment-specific, design the solution for that segment first — don’t try to generalize prematurely. Teresa Torres’s original OST documentation is the right starting point for understanding the framework’s foundations before adapting it.
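The universal-versus-segment question can be checked roughly by comparing how often each segment raises an opportunity relative to its size, rather than counting raw mentions. The sketch below assumes you already have a segment label per mention and an active-user count per segment; the function name, the `min_rate_ratio` threshold, and the sample numbers are all hypothetical.

```python
from collections import Counter


def classify_opportunity(mentions, segment_sizes, min_rate_ratio=0.5):
    """Classify an opportunity as universal or segment-specific.

    mentions: one segment label per user who raised the opportunity.
    segment_sizes: segment label -> active users (each must be > 0).
    An opportunity counts as universal only if every segment mentions it
    at a rate of at least min_rate_ratio times the highest segment rate.
    """
    counts = Counter(mentions)
    rates = {seg: counts.get(seg, 0) / size for seg, size in segment_sizes.items()}
    top_rate = max(rates.values())
    if top_rate == 0:
        return "no signal", rates
    universal = all(rate >= min_rate_ratio * top_rate for rate in rates.values())
    return ("universal" if universal else "segment-specific"), rates


# Illustrative numbers only.
segment_sizes = {"north_america": 1000, "latin_america": 200, "east_africa": 100}

# Raw volume is dominated by one segment, but per-user rates are similar.
even = ["north_america"] * 50 + ["latin_america"] * 9 + ["east_africa"] * 5
even_label, even_rates = classify_opportunity(even, segment_sizes)

# Here the other segments barely mention it at all.
skewed = ["north_america"] * 50 + ["latin_america"] * 1
skewed_label, skewed_rates = classify_opportunity(skewed, segment_sizes)

print(even_label, skewed_label)
```

Normalizing by segment size is what keeps the most vocal segment from defining the opportunity. A rate comparison like this only flags where to look; it says nothing about whether the need manifests the same way in each context.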

What Still Requires Human Judgment

AI augments the OST process at scale in meaningful ways. It does not replace the human elements that make the framework valuable. Opportunity prioritization is still a human decision: AI can surface opportunity clusters, but judging which opportunity is worth pursuing in the current strategic context requires a synthesized understanding that AI can inform with data but doesn’t hold.

Solution creativity remains human. The connection between a user opportunity and an imaginative solution that users didn’t know they wanted — that’s not something AI generates reliably without creative prompting. And strategic trade-offs — the decisions about which opportunities to deprioritize in service of organizational focus — require human judgment that accounts for context AI doesn’t have.

The adapted OST is better than the original for scaled products. But “better” here means AI is doing more of the pattern recognition so humans can do more of the judgment work. The goal isn’t to automate discovery — it’s to remove the bottlenecks that keep humans from doing the discovery work that actually requires human judgment.


Your Turn: Apply This Today

Whether you’re just starting with OSTs or looking to evolve your current discovery process:

  • Separate opportunity identification from solution generation in your process. If you’re currently doing both in workshops, split them into distinct phases. Use your data infrastructure (or AI tools) for continuous opportunity surfacing. Reserve workshop time exclusively for solution design and prioritization — where human creativity and judgment add the most value.
  • Build a tiered validation system for your solution branches. Before any solution moves to experiment design, assign it a validation tier: full experiment, behavioral proxy, or customer interview. The tier determines the validation approach and timeline. This prevents every solution from competing for the same experiment infrastructure.
  • Make your OST a living document, not a workshop artifact. If you’re rebuilding your OST quarterly, you’re losing continuity between cycles. Move it to a shared, continuously updated document. Designate someone as the OST owner responsible for keeping the tree current between discovery cycles.
  • Segment your opportunity identification by user type before prioritizing. Before deciding which opportunity to pursue, ask: is this opportunity universal across my user base, or specific to a segment? If it’s segment-specific, validate it with that segment first. Don’t let your most vocal users define the universal opportunity.
  • Instrument one AI-assisted opportunity monitoring signal this quarter. Set up one automated signal that surfaces opportunity patterns from your existing data — support ticket clustering, feature usage drop-offs, search queries with no good results. Feed it into your OST review meeting as a standing input. This is the minimum viable version of continuous opportunity sensing.
  • Explicitly protect human prioritization of the OST. If AI is surfacing opportunity data, establish a clear norm: AI surfaces, humans decide. Don’t let the volume and frequency of AI-surfaced opportunities shift the prioritization decision toward “whatever AI flagged most recently.” The human judgment layer is the framework’s value. Protect it deliberately.
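For the single monitoring signal suggested in the list above, even a non-AI baseline is a reasonable starting point. The sketch below assumes support tickets already carry a topic label (from manual tagging or a classifier) and flags topics whose volume this week is well above their trailing average. The function name, thresholds, and sample counts are hypothetical.

```python
from collections import Counter
from statistics import mean


def flag_spikes(weekly_counts, spike_ratio=2.0, min_count=5):
    """Return topics whose current-week volume spiked against their history.

    weekly_counts: list of Counters (oldest first), topic -> ticket count.
    The last entry is treated as the current week.
    """
    *history, current = weekly_counts
    flagged = []
    for topic, count in current.items():
        baseline = mean(week.get(topic, 0) for week in history) if history else 0
        # Floor the baseline at 1 so brand-new topics still need real volume.
        if count >= min_count and count >= spike_ratio * max(baseline, 1):
            flagged.append((topic, count, baseline))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative counts only.
weeks = [
    Counter({"login": 4, "printing": 3}),
    Counter({"login": 6, "printing": 2}),
    Counter({"login": 5, "printing": 12, "search": 2}),
]
flagged = flag_spikes(weeks)
for topic, count, baseline in flagged:
    print(f"{topic}: {count} tickets this week vs. {baseline:.1f} average")
```

The output is a short list for the OST review meeting, not a decision. What a flagged topic means, and whether it belongs on the tree, stays with the people in the room.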

The execution complexity challenge in OST is closely related — I’ve written about the missing branch in opportunity solution trees that most teams overlook when mapping solutions. And the continuous discovery breakdown that AI accelerates is a distinct problem worth examining separately.

Running discovery at scale and trying to adapt your product process to the pace AI enables? I consult with product teams on discovery operations, experiment design at scale, and building the systems that keep human judgment at the center of AI-augmented product development. Let’s talk.

The Product Leader’s AI Infrastructure Blind Spot: What Jensen Huang’s Sovereign AI Argument Actually Reveals

Jensen Huang has been making a specific argument at conferences for two years now: AI infrastructure isn’t just a business advantage — it’s national security. Countries that don’t build sovereign AI capabilities will find themselves dependent on foreign nations for the intelligence layer that powers their economies. The geopolitical version of this argument is provocative. The product version of it is something most product leaders haven’t fully processed.

Here’s the product version: most AI product decisions are constrained by infrastructure choices that product leaders didn’t make and don’t fully understand. That constraint is getting more expensive as AI becomes more central to every product’s value proposition.

The Infrastructure Stack Nobody Talks About in PM Reviews

Most product leaders focus on the application layer — which model to use, how to prompt, whether to build or buy an AI feature. Huang is talking about something beneath that: the compute layer, the data center geography, the inference infrastructure, the energy capacity required to run large models. That lower layer determines what’s possible at the application layer, and most PMs have never mapped their dependency on it.

The practical version for a product organization: your AI capability roadmap is constrained by what your infrastructure stack can support, what your providers will make available to you, and what regulators in your operating markets will permit. When those constraints change — and they will change — your product decisions change with them, whether you’ve planned for it or not.

Serving users across multiple geographies makes this concrete quickly. Latency requirements for AI features vary by region. Data residency regulations vary by country. Provider availability varies by market. A product decision that works in one deployment context may be impossible in another. The infrastructure layer isn’t just an engineering concern — it’s a product strategy constraint.

The Hidden Costs of Inference Dependency

When you integrate a frontier AI model via API, the visible cost is the per-token or per-call pricing. The hidden costs are the ones that show up when things change: migration cost when you need to switch providers, renegotiation leverage you don’t have because you’ve built deep dependencies, the engineering cost of adapting when the API changes, and the product cost of capabilities being deprecated or repriced.

I’ve been working through three specific changes to how I think about AI infrastructure because of this:

Inference sovereignty audit. Mapping every AI dependency and its single-point-of-failure risk. Not just “which provider are we using” but “what happens if this entire category of capability becomes unavailable, significantly more expensive, or geographically restricted?” Most teams have never run this audit. The combined risk profile of a typical AI product is more concentrated than it appears from any individual dependency.

Hybrid deployment strategy. Running smaller, purpose-specific models for latency-critical features rather than defaulting to large frontier models for everything. Not full local deployment — the cost is prohibitive for most workloads — but strategic placement of specialized models where frontier models are overkill and edge deployment reduces latency and dependency simultaneously.

Data gravity awareness. Understanding that AI capabilities concentrate around where the data is. Organizations with unique, high-quality proprietary data have infrastructure leverage that commodity compute cannot replicate. The infrastructure investment that matters most for most product organizations isn’t compute — it’s the data infrastructure that makes proprietary training and fine-tuning possible.
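The inference sovereignty audit described above can start as a very small data exercise. This is a sketch under stated assumptions: the `AIDependency` fields, the 25% threshold, and the provider names are hypothetical, and `value_share` is a rough human estimate, not something you can measure precisely.

```python
from dataclasses import dataclass


@dataclass
class AIDependency:
    capability: str
    provider: str
    value_share: float   # estimated fraction of core user value relying on it
    has_fallback: bool   # is there a tested alternative path today?


def provider_concentration(deps):
    """Sum estimated value exposure per provider.

    Shares can overlap between capabilities, so totals may exceed 1.0.
    """
    exposure = {}
    for dep in deps:
        exposure[dep.provider] = exposure.get(dep.provider, 0.0) + dep.value_share
    return exposure


def single_points_of_failure(deps, threshold=0.25):
    """Dependencies with no fallback that carry a large share of user value."""
    return [d for d in deps if not d.has_fallback and d.value_share >= threshold]


# Illustrative inventory only.
deps = [
    AIDependency("search ranking", "provider_a", 0.40, has_fallback=False),
    AIDependency("summaries", "provider_a", 0.20, has_fallback=True),
    AIDependency("translation", "provider_b", 0.10, has_fallback=False),
]
exposure = provider_concentration(deps)
risky = single_points_of_failure(deps)
print(exposure)
print([d.capability for d in risky])
```

Summing by provider is what surfaces the concentration: each dependency can look acceptable on its own while one provider quietly carries most of the product's value.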

The Product Leader’s Infrastructure Blind Spot

The specific blind spot Huang’s argument reveals for product leaders is this: we tend to evaluate AI infrastructure decisions the way we evaluate software vendor decisions — on current price and current capability. Infrastructure decisions have a different time horizon. The question isn’t “what does this cost today and does it work today?” It’s “what are the migration costs if this relationship changes, and what product decisions am I foreclosing by accepting this dependency?”

Product decisions made inside infrastructure constraints you don’t understand tend to look like missed opportunities in retrospect. You couldn’t build the feature because the provider didn’t support it. You couldn’t expand to a new market because the data residency requirements were incompatible with your architecture. You couldn’t negotiate on price because the switching cost was too high.

These aren’t failures of product strategy. They’re consequences of infrastructure decisions made without full visibility into the constraints they created. NVIDIA’s sovereign AI documentation lays out the infrastructure investment thesis at the national level — the product-level translation is understanding which version of these constraints applies to your organization and your product decisions.

Beyond the Hype Cycle: What Actually Changes

The sovereign AI framing matters for product leaders not because most of us are going to build national AI infrastructure, but because it makes the infrastructure constraint explicit in a way that the “just use the API” default doesn’t.

The products that survive infrastructure disruptions — pricing changes, geopolitical shifts, regulatory changes, provider strategy pivots — will be the ones built by teams that understood their infrastructure dependencies before they became constraints. Replace “national” with “product” in Huang’s argument and it holds: product sovereignty means having enough infrastructure control to make independent product decisions when the landscape changes.


Your Turn: Apply This Today

You don’t need to build your own data center to act on this. Start with visibility:

  • Run an AI dependency audit this quarter. List every AI provider, model API, and infrastructure service your product depends on. For each one, answer: what percentage of core user value depends on this? What’s the migration path if it becomes unavailable or 3x more expensive? The audit will surface concentrations of risk you haven’t explicitly accepted.
  • Map your data gravity. Identify the proprietary data assets your organization holds that an external provider cannot access. That data is your infrastructure leverage. Build the strategy for using it — fine-tuning, retrieval augmentation, evaluation datasets — before a competitor with similar data beats you to it.
  • Add infrastructure questions to your product strategy review. Before any major AI feature decision, ask: what infrastructure constraints does this create or deepen? What does the migration path look like if the provider relationship changes? These questions belong in the product review, not just the engineering review.
  • Build abstraction layers into your AI integrations. Design your AI integrations so that the provider is swappable without rewriting product logic. This is a small upfront investment that dramatically improves your negotiating position and reduces future migration cost.
  • Assess data residency requirements in your target markets. If you’re expanding geographically, understand the AI data requirements in each target market before committing to an architecture. Retrofitting data residency compliance into an AI product is significantly more expensive than designing for it in advance.
  • Run a “what if the frontier changes?” scenario annually. Ask: if the leading AI capabilities shift significantly in the next 18 months — new dominant model, new pricing structure, new regulatory environment — what product decisions would we wish we’d made differently? Use the answers to adjust your current infrastructure strategy.

The infrastructure decision connects to every other AI product decision — including the build vs. buy framework for AI infrastructure and how the sovereign AI argument translates to day-to-day product leadership.

Building AI products and trying to make infrastructure decisions that hold up as the landscape shifts? I consult with product teams on AI product strategy, infrastructure dependency frameworks, and building products that remain competitive when external constraints change. Let’s talk.

Jensen Huang’s Sovereign AI: What the Infrastructure Argument Actually Means for Product Builders

Jensen Huang’s sovereign AI argument is compelling at the nation-state level. Every country should control its AI destiny rather than depending on foreign infrastructure. The strategic logic is sound. The implementation story is considerably messier — and the mess is where the real product leadership lessons live.

I’ve watched organizations at multiple scales wrestle with the build vs. buy decision in AI infrastructure. The companies that get it right are not the ones that defaulted to either extreme. They’re the ones that developed a clear-eyed framework for evaluating the real trade-offs — and they usually had to learn that framework the hard way.

The Infrastructure Reality Check

Huang’s sovereign AI concept — building your own AI infrastructure rather than depending on external providers — carries a real cost that keynote slides consistently understate. When an organization decides to build infrastructure it could otherwise purchase, it’s making a decision that involves: significant upfront capital for compute, storage, and networking; ML engineering talent that is among the most expensive in the market; operational overhead for running, monitoring, and maintaining systems at scale; and the ongoing cost of keeping pace with a capability curve that is advancing faster than most internal teams can match.

None of this makes the build decision wrong. It means the build decision needs to be evaluated honestly against its actual cost — not against a reference point of what the infrastructure will be worth once it’s working perfectly. Most organizations evaluate the upside of sovereignty and underweight the downside of the build timeline.

Where Sovereignty Actually Matters

Not all AI capabilities are equal candidates for sovereignty decisions. The ones where building your own infrastructure makes strategic sense share a few characteristics: the capability is core to your product’s differentiation (commoditizing it would eliminate your moat), you have proprietary data that a vendor-provided model cannot access, your use case requires latency, privacy, or compliance constraints that external APIs cannot meet, or the capability is stable enough that you can amortize the build cost over multiple years.

The capabilities that are bad candidates for sovereignty decisions are the ones that are advancing rapidly at the frontier, where your internal team cannot keep pace with external model improvement, or where the task is generic enough that a frontier model already outperforms what you could build.

The practical sovereignty framework for most product organizations is not Huang’s comprehensive infrastructure ownership — it’s selective sovereignty. Own the capabilities where your proprietary advantage lives. Buy the capabilities that commodity providers can run better and cheaper than you can.

The Build vs. Buy Reality for Product Leaders

The build vs. buy decision in AI infrastructure is different from the traditional build vs. buy decision in software for one important reason: the capability being purchased is improving continuously. When you buy a license for accounting software, the software doesn’t improve significantly month over month. When you integrate a frontier AI model via API, you get the benefit of every model improvement the provider ships — without rebuilding anything.

This changes the economics substantially. A team that builds its own search ranking model might spend six months building something that outperforms GPT-4 on their specific use case. In twelve months, the frontier model has caught up and surpassed them — and they now own the maintenance burden of a system they can no longer keep competitive without sustained investment.

The sovereignty trade-off can be managed architecturally without owning the infrastructure: building abstraction layers that let you swap providers without rewriting product logic, maintaining evaluation frameworks that monitor output quality across providers, designing fallback logic for service failures, and avoiding deep API-specific integration that creates lock-in without corresponding value. This isn’t sovereignty in Huang’s comprehensive sense, but it’s practical sovereignty — controlling outcomes without controlling every component. For a deeper look at NVIDIA’s sovereign AI framework, the primary argument is worth reading directly; the implementation gap is significant.
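Here is a minimal sketch of the abstraction layer and fallback logic described above. It assumes nothing about any real provider SDK: `TextModel`, `ProviderError`, `ModelRouter`, and the two stand-in models are hypothetical names, and a real adapter would wrap a vendor client and translate that client's errors into `ProviderError`.

```python
from typing import Protocol


class ProviderError(Exception):
    """Raised by an adapter when its upstream call fails."""


class TextModel(Protocol):
    """The only interface product code is allowed to depend on."""
    name: str

    def complete(self, prompt: str) -> str: ...


class FailingModel:
    """Stand-in for a primary provider that is currently unavailable."""
    name = "primary"

    def complete(self, prompt: str) -> str:
        raise ProviderError("upstream timeout")


class EchoModel:
    """Stand-in for a secondary provider that responds normally."""
    name = "fallback"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRouter:
    """Tries providers in priority order and falls back on failure."""

    def __init__(self, providers: list[TextModel]):
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers = providers

    def complete(self, prompt: str) -> str:
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except ProviderError as exc:
                errors.append(f"{provider.name}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(errors))


router = ModelRouter([FailingModel(), EchoModel()])
print(router.complete("summarize this lesson"))
```

Swapping providers becomes a change to the list passed into `ModelRouter`, not a rewrite of product logic. The same seam is where an evaluation harness can compare output quality across providers.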

Architectural Implications: What I’m Doing Differently

The sovereign AI argument changed how I think about AI architecture decisions — not by convincing me to build everything in-house, but by making the dependency questions explicit in every infrastructure conversation.

Before any significant AI integration decision, I now want to know: if this provider doubles their price or degrades their service quality in 18 months, what does our migration path look like? What would it cost, and how long would it take? That question changes which integration architecture you choose today. Teams that can answer it confidently have built practical sovereignty. Teams that can’t have accepted dependency without negotiating the terms of that dependency.

The goal isn’t to avoid all external dependencies. The goal is to understand your dependencies clearly and make intentional choices about which ones are acceptable risks and which ones are existential risks to your product. That discipline — applied to AI infrastructure the same way good engineering teams apply it to any critical external dependency — is the real lesson in Huang’s argument for most product leaders.


Your Turn: Apply This Today

Run this framework against your current AI infrastructure decisions:

  • Map your AI dependency profile. List every external AI provider or API your product depends on. For each one, estimate: what percentage of your core user value depends on this dependency? What happens if it goes away or becomes 3x more expensive? The map will surface risks you’ve been treating as invisible.
  • Apply the “sovereignty criteria” to each AI capability. For each capability on your list, score it against four criteria: is it core to your differentiation, do you have proprietary data for it, does it require compliance constraints external providers can’t meet, and is it stable enough to amortize a build? Build where you score 3+. Buy where you don’t.
  • Build an abstraction layer before you build anything else. If you’re integrating any AI provider today, start with an abstraction layer that lets you swap providers. It’s a small upfront investment that dramatically reduces your future migration cost and strengthens your negotiating leverage with current providers.
  • Run a “provider departure” scenario annually. Once a year, ask: if our primary AI provider announced end-of-life in 6 months, what would we do? How long would migration take? What would it cost? The answer tells you your actual dependency risk — not the theoretical dependency you think you have.
  • Price your build decisions against the opportunity cost. Before committing to building AI infrastructure, calculate the engineering hours required. Then ask: what could those same engineers build that would advance user value directly? The opportunity cost is real and often underweighted in infrastructure conversations.
  • Establish a “make vs. buy” review as a standing agenda item for major AI decisions. Don’t let it default to either direction. Build the habit of explicit evaluation — cost, timeline, capability trajectory, and dependency risk — before any significant AI infrastructure commitment.
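The four sovereignty criteria and the "build where you score 3+" rule from the list above fit in a few lines. This is a sketch for structuring the conversation, not a decision engine: the `Capability` field names and the two example capabilities are hypothetical, and each yes/no answer is a judgment call someone has to defend.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    core_differentiator: bool            # is it core to your differentiation?
    proprietary_data: bool               # do you hold data a vendor model can't access?
    constraints_apis_cannot_meet: bool   # latency, privacy, or compliance needs
    stable_enough_to_amortize: bool      # can the build cost spread over years?

    def score(self) -> int:
        return sum([
            self.core_differentiator,
            self.proprietary_data,
            self.constraints_apis_cannot_meet,
            self.stable_enough_to_amortize,
        ])


def recommend(capability: Capability, build_threshold: int = 3) -> str:
    """Apply the 3-of-4 rule: build at or above the threshold, otherwise buy."""
    return "build" if capability.score() >= build_threshold else "buy"


# Illustrative capabilities only.
ranking = Capability("domain-specific ranking", True, True, False, True)
summarization = Capability("general summarization", False, False, False, False)

for cap in (ranking, summarization):
    print(f"{cap.name}: score {cap.score()} -> {recommend(cap)}")
```

Equal weighting is the simplest possible choice. If one criterion, such as a hard compliance constraint, is decisive on its own in your context, the rule should say so explicitly rather than hiding it inside a sum.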

This infrastructure dependency thinking connects directly to the broader strategic framing — how the sovereign AI argument translates to product leadership decisions and why domain expertise becomes more valuable, not less, when AI handles the generic capabilities.

Navigating AI infrastructure decisions and trying to build something defensible rather than just functional? I consult with product teams on AI architecture strategy, build vs. buy frameworks, and building products that remain competitive as the AI capability landscape shifts. Let’s talk.

Munger’s Latticework and the Hidden Architecture of AI Product Systems

Most AI product failures I’ve seen aren’t caused by bad algorithms. They’re caused by good algorithms that no one thought about as a system.

A team ships a content recommendation model. It performs beautifully in testing. Six months later, they ship a search ranking model. Also strong in isolation. Then a personalization layer. Three technically sound components — and now users are getting a contradictory experience that none of the individual feature reviews could have predicted. The models are working. The system is broken.

Charlie Munger’s latticework principle — the idea that knowledge only becomes useful when it hangs together in connected frameworks rather than isolated facts — is the most precise description I’ve found for what AI product teams are missing. Here’s what it looks like in practice.

The Hidden Architecture Problem in AI Product Development

Traditional software features behave in largely predictable ways. You add a filter to a list view; it filters the list. The feature is self-contained. The mental model for reviewing it is simple: does it do what it’s supposed to do?

AI features don’t work this way. They create feedback loops. They shape user expectations. They interact with other AI features in ways that emerge from scale, not from design sessions. A recommendation engine that’s technically optimizing for engagement can be simultaneously teaching users that the product doesn’t understand what they actually need. The technical metric is green. The user relationship is eroding.

This is the hidden architecture problem: the behavior of your AI product portfolio as a system is different from — and often opposed to — the behavior of each feature in isolation. Munger’s latticework principle is the framework for thinking about both at once.

Mental Models vs. Best Practices: What the Latticework Changes

Most AI product teams operate by best practices. A/B test the recommendation algorithm. Use the highest-performing model for your use case. Personalize based on user behavior data. These aren’t wrong — they’re just dangerously incomplete when applied without the mental models that reveal their limits.

“A/B test your recommendations to optimize conversion” — the mental model that challenges this: conversion rate optimization in AI systems often conflicts with long-term trust. Users can’t easily verify the quality of AI recommendations. Optimizing for short-term conversion can train users to distrust the recommendations that actually serve them best.

“Use the highest-performing model” — the mental model that challenges this: model performance is one variable in user adoption. Latency, explainability, and consistency often matter more than raw accuracy for real-world usage patterns. A slightly less accurate model that responds in 200ms and explains its reasoning may outperform a more accurate model that takes 2 seconds and offers no explanation.

“Personalize based on behavior data” — the mental model that challenges this: personalization creates feedback loops. The system effects — filter bubbles, amplified biases, reduced serendipity — often outweigh the individual user benefit of better-targeted content. At scale, you may be building a product that each individual user finds useful while simultaneously reducing the quality of the collective experience.

These aren’t theoretical concerns. They’re the failure modes of AI products that ship feature-by-feature without a connected framework for evaluating how those features behave as a system. MIT Technology Review’s analysis of AI product failures consistently surfaces these systemic issues as more common than pure technical failures.

Building Your AI Mental Model Latticework

The practical application is a shift in the questions you ask at product review. Instead of evaluating each AI feature in isolation — does this model perform? does this feature get used? — you evaluate the portfolio as a system:

How does this feature change user expectations for every other AI feature in the product? If your AI writing assistant produces near-perfect outputs, does that raise the bar for your AI translation feature in a way that creates dissatisfaction where none existed before?

How do this feature’s feedback loops interact with the feedback loops of other AI features? A recommendation engine and a search ranking model fed by the same behavioral data are creating compound feedback loops. Are they amplifying each other toward better outcomes or toward worse ones?

How does the portfolio of AI features together change the user’s mental model of what the product is? Users don’t experience features — they experience a product. The cumulative effect of your AI features determines whether users trust the product, rely on it, or grow quietly skeptical of it. That cumulative effect is what you’re actually building, whether you’re thinking about it or not.
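The second question, about interacting feedback loops, has a mechanical first step: find the features that read and influence the same behavioral signals. The sketch below assumes you can list those signals per feature; the feature names and signals are hypothetical.

```python
from itertools import combinations


def shared_signal_pairs(features: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of features that touch at least one common signal.

    features: feature name -> behavioral signals it both consumes and shapes.
    """
    pairs = {}
    for (name_a, signals_a), (name_b, signals_b) in combinations(sorted(features.items()), 2):
        overlap = signals_a & signals_b
        if overlap:
            pairs[(name_a, name_b)] = overlap
    return pairs


# Illustrative portfolio only.
features = {
    "recommendations": {"clicks", "dwell_time"},
    "search_ranking": {"clicks", "queries"},
    "personalization": {"dwell_time", "profile"},
}
pairs = shared_signal_pairs(features)
for (a, b), overlap in pairs.items():
    print(f"{a} <-> {b}: shared signals {sorted(overlap)}")
```

Each pair it returns is a candidate compound loop worth a conversation. Whether the two features amplify each other toward better or worse outcomes is not something the overlap can tell you.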

The Munger Test for AI Product Decisions

I run what I think of as the Munger test before any significant AI feature decision: can I explain why this feature will improve the overall product — not just its isolated metric — using at least three different mental models?

If I can only make the case from one angle — it’s technically better, or it’s cheaper, or users asked for it — I’m probably missing something important. The features that hold up under the multi-model test tend to ship well. The ones that only make sense from one angle tend to create problems we didn’t anticipate.

That’s the compound interest of connected thinking. Each mental model makes the others more useful. The latticework, over time, makes you a dramatically better evaluator of AI product decisions than any individual framework alone.


Your Turn: Apply This Today

Start building the latticework habit with these concrete steps:

  • Map your AI feature portfolio as a system. List all the AI features currently live in your product. Draw arrows between any two features that share data, shape user expectations about each other, or create feedback loops. If you’ve never done this, you will find something that surprises you.
  • Apply the three-model rule to your next feature decision. Before approving or building any AI feature, require the team to make the case from at least three different mental model angles — technical, behavioral, economic. If the feature only looks good from one angle, hold it until the others are addressed.
  • Run a “best practices challenge” in your next review. Take one of your team’s standard AI best practices and ask: what is the mental model that challenges this? What is the failure mode of this practice at scale? The answer will improve your practice or flag a risk you haven’t accounted for.
  • Evaluate user trust as a system-level metric, not a feature-level metric. Create or request a metric that captures user trust in your AI product overall — not per feature. Track it quarterly. Connect it explicitly to your AI feature decisions.
  • Design your AI features’ feedback loops on paper before you ship them. For each AI feature in your pipeline, sketch the feedback loop: what user behavior does this feature reward? What does rewarding that behavior do to user behavior over time? Is the loop pointing toward value or away from it?
  • Run the Munger test before your next roadmap commit. For each AI initiative on your roadmap, ask: can I explain the expected value using three different frameworks? If not, spend 30 minutes developing the weaker angles before committing. The discipline is the point.

The individual mental model disciplines that feed into this latticework are worth studying separately — I’ve written about how Munger’s inversion principle applies specifically to AI feature development, and how building the multi-discipline lens changes the questions you ask at product reviews.

Building AI products and looking for a sharper framework for portfolio-level thinking? I consult with product teams on AI product strategy, system-level design, and building the decision habits that make the difference between features that work in isolation and products that work at scale. Let’s talk.

Munger’s Mental Models and AI: Why the Best Product Decisions Require Multiple Lenses

Charlie Munger spent his career arguing that the single biggest mistake smart people make is solving every problem with the same framework. “To the man with a hammer, every problem looks like a nail,” he said. Most product teams building AI features are doing exactly that — and it’s costing them.

I’ve watched this play out repeatedly. A team with strong ML engineers defaults to “more data and better algorithms” for every problem, even when the actual failure is behavioral — users don’t trust the output. A team with strong business analysts defaults to “what do the numbers say?” even when the numbers are measuring the wrong thing. The mental model a team defaults to determines which problems they can see and which ones stay invisible until they ship something that breaks.

Munger’s solution — collecting mental models from multiple disciplines and using them together — is the most underrated framework for AI product decisions I’ve encountered. Here’s how I actually use it.

The Mental Models I Use for AI Product Decisions

I’m not talking about generic “think differently” advice. These are the specific lenses I apply before any significant AI feature decision:

Psychology: How does this change what users think the system understands about them? AI features carry an implicit promise — the product will understand me better than a static interface. When that promise breaks, the damage to trust is disproportionate to the error. I ask: what does this feature imply about our understanding of the user, and can we actually back it up?

Statistics: What is this actually measuring, and what’s the sample size? We once nearly shipped an AI feature that performed beautifully in testing, then failed on real user data because our test set had selection bias — it didn’t represent how users in the wild actually phrased their searches. The statistical lens caught it. The ML lens wouldn’t have.

Economics: What are the compute costs, and what are the switching costs? I’ve seen teams celebrate engagement lift from an AI feature without running the numbers on what serving that feature at scale costs per user. The unit economics of AI features can invert a business model quickly. Always model the full cost before you ship.

Operations: How do we monitor this, and how do we roll back? The most sophisticated AI feature is worthless if it can’t be operated reliably in production. Before shipping anything, I want to know: what’s the alert? What does rollback look like? Who is on call?

Behavioral economics: What cognitive biases does this feature create or exploit? AI recommendations tend to trigger automation bias — users trust confident-looking outputs more than they should. Understanding this in advance lets you design the trust calibration into the UI rather than discovering the overtrust problem after something goes wrong.

The Five Questions I Ask in Every AI Product Review

I changed my product review process because of Munger’s approach. Instead of asking “does this AI feature work?”, I now run five questions in sequence:

1. Technical: Does this solve the computational problem we think it solves — not just on test data, but on the actual distribution of real user behavior?

2. User: Does this actually improve the user’s experience, or does it just look impressive in a demo? What happens when it’s wrong?

3. Business: Does this advance the business model, and do the unit economics hold at scale?

4. Ethical: What user behaviors does this feature create or reinforce at scale? Are we comfortable with those behaviors?

5. Operational: Can we run this reliably, monitor it, and recover quickly when it fails?

A feature that can’t pass all five is not ready to ship. The discipline of asking all five questions — rather than defaulting to the one or two that align with your team’s strongest skills — is where the multi-model approach pays off.
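
One way to make the all-five rule hard to skip is to write it down as a simple gate. The following is a minimal Python sketch, not a description of any real tool: the `FeatureReview` class, the lens names as strings, and the "smart-search" example are all hypothetical, and an unanswered question is treated the same as a failed one.

```python
from dataclasses import dataclass, field

# The five review lenses, in the order the questions are asked.
LENSES = ("technical", "user", "business", "ethical", "operational")


@dataclass
class FeatureReview:
    """Hypothetical record of one AI feature's review answers."""
    name: str
    # Maps a lens name to True (passes) or False (fails).
    # A lens missing from the dict counts as unanswered.
    answers: dict = field(default_factory=dict)

    def failing_lenses(self):
        """Return lenses that failed or were never answered, in review order."""
        return [lens for lens in LENSES if not self.answers.get(lens, False)]

    def ready_to_ship(self):
        """A feature is ready only when all five lenses pass."""
        return not self.failing_lenses()


# Illustrative example: the ethical question was never asked,
# and the operational question failed.
review = FeatureReview(
    name="smart-search",
    answers={"technical": True, "user": True, "business": True, "operational": False},
)
print(review.ready_to_ship())
print(review.failing_lenses())
```

Treating a missing answer as a failure is a deliberate choice in this sketch: the question nobody asked is usually the one outside the team’s default mental model.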

How I Build the Mental Model Collection Habit

Munger didn’t just use multiple mental models — he actively collected them, continuously, over decades. For AI product work, here’s how I collect mine:

I read outside my discipline regularly. Cognitive psychology, behavioral economics, systems design, and operations research all surface problems that pure product management literature misses. The best AI product insight I had in the last year came from reading about medical device failure modes — not from a PM newsletter.

I actively seek out the perspective of people who think differently about the same problem. Our data scientists, engineers, designers, and business stakeholders all carry different mental models for evaluating AI features. The disagreements between them are usually where the important insight lives.

I study failures from adjacent industries. Financial services, healthcare, and automotive all deal with AI deployment challenges that preceded ours by years. Their failure modes tend to become our failure modes. Learning from their mistakes is faster and cheaper than repeating them.

Why This Matters More for AI Than Traditional Features

Traditional product features fail in fairly predictable ways — wrong assumption about user need, too complex, poor performance. AI features fail in fundamentally different ways: confident wrongness, emergent behaviors at scale, distributional shift between training and production, trust collapse after a single high-visibility error.

These failure modes don’t show up clearly through any single mental model. You need the statistical lens to catch distributional shift. You need the behavioral economics lens to anticipate trust collapse. You need the operations lens to design for recovery when something breaks at 3am. The teams that appear to consistently ship AI features that actually work are the ones that have learned to hold all of these perspectives simultaneously — not in sequence, but together.

That’s what Munger was getting at. In complex systems, the quality of your thinking matters more than the sophistication of your tools. AI is the most complex system most product teams have ever managed. The mental model collection habit is how you build the thinking to match it. For more on the practical frameworks that apply here, see how Farnam Street maps the core mental model disciplines — it’s the best resource I’ve found for building this habit systematically.


Your Turn: Apply This Today

The multi-model habit is built through deliberate practice — not a single session. Start here:

  • Run the five-question audit on your current highest-priority AI feature. Technical, user, business, ethical, operational — can it pass all five? Write down the answers before your next review. The question your team struggles to answer is where the risk lives.
  • Map your team’s dominant mental model. Ask each key team member: “What’s the first question you ask when evaluating a new AI feature?” The pattern in their answers tells you which blind spots you have as a team. Hire or consult to fill the gaps.
  • Add one non-PM discipline to your reading rotation. Pick one field — behavioral economics, systems design, operations research, cognitive psychology — and read one substantive piece per week for 90 days. Track how it changes the questions you ask in product reviews.
  • Make the unit economics of your AI features visible. Pull the compute cost data for your top three AI features. Calculate cost per user per month. If you’ve never seen these numbers, run the analysis before your next roadmap planning session.
  • Design a failure recovery path before you ship. For every AI feature in your pipeline, define in advance: what is the rollback plan? What triggers it? Who makes the call? Teams that answer these questions before shipping recover faster when something breaks.
  • Seek out the dissenting voice on your team before the next decision. Deliberately ask for the view of the person most likely to see the problem differently. The disagreement is the insight. If everyone agrees immediately, you’re probably all using the same mental model.
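
The unit economics item in the list above comes down to a single division per feature. Here is a minimal Python sketch of that calculation. The feature names, dollar amounts, and user counts are made-up placeholders, not real data, and `cost_per_user_per_month` is a hypothetical helper; substitute the numbers from your own billing and analytics exports.

```python
def cost_per_user_per_month(monthly_compute_cost, monthly_active_users):
    """Compute cost divided by the users who actually used the feature."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return monthly_compute_cost / monthly_active_users


# Placeholder inputs: (monthly compute cost in dollars, monthly active users).
# These figures are invented purely for illustration.
features = {
    "smart_search": (4200.0, 12000),
    "lesson_summaries": (900.0, 3000),
    "recommendations": (2500.0, 50000),
}

report = {
    name: round(cost_per_user_per_month(cost, users), 2)
    for name, (cost, users) in features.items()
}

# List the most expensive feature per user first.
for name, value in sorted(report.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ${value:.2f} per user per month")
```

Dividing by the users of each feature, rather than by all users of the product, is an assumption of this sketch. It tends to make expensive niche features visible instead of averaging them away.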

These same multi-model disciplines apply at the team level too — how you hire PMs for AI-era roles determines which mental models your team has access to from day one. And the inversion principle is one of the most powerful single mental models in Munger’s collection for AI product work specifically.

Building AI products and struggling with decisions that require multiple frames at once? I consult with product teams on AI product strategy, decision frameworks, and building the organizational thinking habits that make great products. Let’s talk.

What Children’s Ministry Taught Me About Product Simplicity

Last year I watched a volunteer children’s ministry director attempt to run Sunday school prep on her phone while her toddler climbed on the kitchen counter. She had exactly seven minutes between finishing breakfast and loading the car. The curriculum platform she was using required three separate logins, two PDF downloads, and a supply list that assumed access to a craft store and a laminator.

She gave up and winged it with goldfish crackers and a Bible story from memory.

That moment taught me more about product simplicity than any design framework I’ve read. When your user is a part-time volunteer with no training budget, no tech support, and seven minutes to prepare — you learn fast what “simple enough” actually means.

I’ve carried those lessons into every product role since. Here are four that I apply constantly.

Design for the Five-Minute User, Not the Power User

Every product has a “five-minute user” somewhere in the workflow — the person who needs to accomplish a core task under conditions of time pressure, cognitive load, and limited context. They’re not the user who fills out your feedback surveys or attends your user interviews. They’re the ones who quietly abandon your product and find a workaround when it gets in their way.

For Sermons4Kids, that user was the Sunday morning volunteer in the parking lot at 8:55am. For enterprise products, it’s the admin configuring your tool during their lunch break because IT is overloaded. For B2C products, it’s the user trying to accomplish something during the three-minute window between meetings.

Most product teams design for the engaged, patient, curious user who will explore the product to find what they need. The five-minute user has none of those qualities in the moment that matters. They need the most important task to be one tap away with no ambiguity about where that tap is.

The diagnostic question: If someone with no training and no time needed to complete the single most important task in your product, could they find it in under 30 seconds? If not, that’s your simplicity problem.

Remove the Decision Tree, Not the Content

We rebuilt the Sermons4Kids homepage after session data showed that a significant percentage of users were abandoning at the content selection step. We had years of excellent curriculum. We had a well-organized library. We had filters and categories and search. Users still gave up.

The fix wasn’t removing content — it was removing the decision the user had to make. Now the homepage shows exactly four options: This Sunday’s Lesson, Last Week (for catch-up), Next Week (for advance planners), and Search. That’s it. Every other organizational structure lives behind Search for motivated users, but it doesn’t block the path for the Sunday morning volunteer who just needs this week’s materials.

Session completion rates improved immediately. The content didn’t change. The decisions the user had to make did.

Every product has a version of this problem: an information architecture that’s logical and comprehensive but requires users to make a series of decisions before they can do the thing they came to do. The solution is almost never more content or better organization — it’s opinionated defaults that route the majority of users directly to what they need, with the full flexibility available one level deeper for the users who want it. This connects directly to what Barry Schwartz documented: more choices create paralysis, not empowerment.

Question the Digital-First Default

This one surprised me. In a world of responsive design and mobile-first product development, the most valuable feature we built for Sermons4Kids was making everything print perfectly, every time.

Children’s ministry happens in physical spaces with physical kids. Volunteers need something they can hold, write on, and hand out. They need a backup when the WiFi fails. We spent months optimizing mobile responsiveness before realizing that “mobile optimization” for this use case meant “fits on one page when printed from a phone.” Not responsive design — print design.

The broader lesson: digital-first is a design assumption, not a universal user truth. The right question is “what format serves this user in their actual context?” — not “how do we deliver this digitally?” I’ve seen the same pattern in analytics tools where print-ready dashboards increased executive adoption more than better mobile interfaces, and in consumer apps where PDF export drove more engagement than in-app features for users whose workflows involved physical handoffs.

The Onboarding Moment Is Earlier Than You Think

For volunteer users, we discovered that the meaningful onboarding moment wasn’t first login — it was the first successful Sunday. The first time a volunteer used our curriculum and it worked in the classroom, with real kids, under real conditions, was when they became retained users. Everything before that was trial, not adoption.

This shifted how we thought about activation. The question wasn’t “did they complete setup?” — it was “did they succeed at the thing they came to do?” And the second question had very different design implications than the first. It meant making the path from signup to first successful use as short and obstacle-free as possible, even at the cost of features and configuration options that would be valuable later.

Most products define activation as completion of a setup flow. The better definition is completion of the first meaningful value exchange: the user did the thing they came to do, and it worked. Building backward from that moment tends to reveal a lot of unnecessary friction in the path to get there.

These principles show up everywhere once you’ve internalized them — in enterprise software, consumer apps, and AI products. The choice overload problem in AI features is fundamentally the same problem as the children’s curriculum homepage: unlimited options create abandonment. The design principle that solved it for Sunday school volunteers is the same one that solves it at scale.


Your Turn: Apply This Today

Take these directly into your next product review, sprint planning, or roadmap conversation:

  • Map the constraint first. Identify your most time-pressured user segment. What is their real-world window to complete the core task? Design to that number — not to your average session length.
  • Count steps to first value. Walk through your onboarding or core workflow and count every tap, click, decision, and login required before the user gets something useful. If it’s more than five, cut.
  • Remove one option this sprint. Find a feature or navigation choice that fewer than 10% of users engage with. Archive it. Measure whether anyone notices.
  • Rewrite one tooltip as a verb. Anywhere you explain what something is, rewrite it as what the user should do next. “Lesson plans” becomes “Start this week’s lesson.”
  • Test with your least technical user. Give the product to the person in your target audience least comfortable with technology. Watch without coaching. The first pause is your biggest design problem.
  • Apply the seven-minute rule. Before shipping any new feature, ask: can a distracted, under-resourced user get real value from this in under seven minutes? If not, simplify before you ship.

These principles also show up when you’re new to an organization: the first 100 days of a product leadership role require the same “design for the five-minute user” discipline applied to your own stakeholders — getting to value quickly, removing unnecessary process friction, making the path to trust short.

Building products for time-constrained, mission-driven users and fighting complexity at every stage? I consult with product teams on simplicity-first design, activation optimization, and building for users whose real-world context is nothing like your test environment. Let’s talk.

Munger’s Circle of Competence: Why AI Makes Domain Expertise More Valuable, Not Less

“I’m no genius. I’m smart in spots, but I stay around those spots.” The line is usually credited to IBM’s Thomas Watson Sr., and Warren Buffett and Charlie Munger have long used it to describe their approach to investing. Munger’s “circle of competence” framework wasn’t about false modesty. It was a precise statement about where expertise creates durable advantage and where it evaporates.

Most product leaders are getting this completely backwards with AI.

They treat AI as something that expands their circle of competence — a tool that lets them write SQL they don’t understand, run experiments they can’t interpret, or make architectural decisions they can’t maintain. The result is technical debt disguised as productivity, and strategic errors disguised as confidence.

AI Amplifies What You Already Know — and Exposes What You Don’t

Munger’s insight applied to AI: the tool doesn’t expand your circle of competence. It amplifies your performance within it, while simultaneously making your gaps more consequential.

A product leader who understands growth mechanics deeply will use AI to run experiments faster, synthesize more signals, and identify patterns across larger datasets than they could manually. They get more leverage on what they already know. A product leader who doesn’t understand growth mechanics will use AI to generate impressive-looking growth frameworks they can’t evaluate, can’t debug, and can’t adapt when they inevitably fail in their specific context.

The output in both cases looks similar. The outcomes diverge dramatically.

The core failure mode is what I’d call the competence paradox: AI tools make you feel more capable in domains where you’re actually less prepared to evaluate the output. SQL you can’t debug feels less dangerous when you generate it in 30 seconds. Architectural decisions you can’t maintain feel more sound when AI produces confident-sounding documentation. The confidence signal and the competence signal get decoupled.

The Three-Layer Filter for AI-Assisted Work

I use three questions before implementing any significant AI-generated output:

Can I debug this? If the AI generates analysis, code, or strategy, can I troubleshoot it when it breaks? Not “can I theoretically figure it out” but “can I actually debug it in context, with my real systems, under time pressure?” If not, I’m outside my circle of competence and I should either build the competence first or bring in someone who has it.

Can I improve this? If the output is 80% right, can I identify and fix the 20% that’s wrong? Last month Claude suggested a user segmentation approach that looked clever. This filter caught it immediately: the approach would require behavioral tracking infrastructure we didn’t have, and the suggested proxies for that tracking were measuring the wrong thing. I spotted the gap because I understood the domain. Someone without that foundation would have shipped an experiment that wouldn’t generate interpretable results.

Can I teach this? If I can’t explain to a colleague why this approach works and where it might fail, I don’t understand it well enough to implement it. “The AI recommended it” isn’t a teaching explanation — it’s intellectual dependency dressed up as delegation.

Why Domain Expertise Compounds in an AI World

The counterintuitive conclusion: AI makes domain expertise more valuable, not less. The research on AI productivity gains is still early and mixed, but the pattern I see in practice is higher lift among people who already have strong domain knowledge. The AI accelerates their existing competence rather than substituting for absent competence.

This pattern shows up in every domain I’ve seen AI implemented well: content teams with strong editorial judgment use AI to produce more without losing quality, because they can evaluate outputs accurately. Data teams with strong analytical foundations use AI to explore more hypotheses faster, because they know which results are meaningful and which are artifacts. Product teams with deep customer knowledge use AI to synthesize more signal, because they know what the signal means.

The implication for your own development: the most important thing you can do to increase your AI leverage is deepen your domain expertise, not broaden your AI tool exposure. Munger’s advice — stay around your spots, maximize leverage within them — is more applicable than ever.

Munger’s broader framework for rational decision-making is worth reading with AI in mind: the latticework of mental models he described is exactly the foundation that allows you to evaluate AI outputs rather than simply accept them. The people who will get the most from AI are the ones who’ve built strong mental models in their domain — not the ones who’ve accumulated the most prompting tricks.

If you’re also thinking about how AI changes what you need to hire for, this connects directly to the AI-era PM hiring question: the skills that matter most aren’t AI-specific, they’re judgment-based — which is exactly what Munger would have predicted.


Your Turn: Apply This Today

In an AI era, knowing your circle of competence — and staying inside it — becomes your strategic advantage. Here’s how to apply it:

  • Draw your circle of competence. Write down the specific domains where your experience gives you judgment that a general AI cannot replicate — industries, user types, technical contexts, organizational dynamics. Be ruthless about what’s inside the circle and what’s outside it.
  • Use AI as a force multiplier inside your circle, not an extension beyond it. Where you have deep expertise, use AI to go faster. Where you don’t, use AI to learn — but don’t use it to pretend you have expertise you don’t. The accountability gap will catch up with you.
  • Hire for complementary circles, not identical ones. When building an AI-era team, map each hire’s domain expertise circle. Where do they overlap with yours? Where do they extend it? Teams with overlapping but complementary circles outperform teams of generalists using the same AI tools.
  • Be explicit about AI-generated work that falls outside your expertise. When you use AI to produce work in a domain outside your circle — legal, financial, medical, technical — label it explicitly. Peer review it with someone inside that circle. AI confidence is not domain expertise.
  • Shrink before you expand. Before trying to extend your circle of competence using AI, go deeper in the areas where you’re already strong. AI amplifies depth more reliably than it substitutes for it. Master your existing domain before using AI to colonize new ones.
  • Build a “competence map” for your product team quarterly. Identify where your team’s circle of competence is strong, where it’s thin, and where AI is filling a gap that should be filled by a human hire. Treat the gap analysis as a strategic workforce planning tool.

Building an AI-native product team and figuring out where to invest in deepening domain expertise vs. AI capability? I consult with product leaders on AI strategy, team development, and the competence questions that determine whether AI investments pay off. Let’s talk.

Seneca’s Email Rule and the Hidden Cost of Real-Time AI

Seneca had a rule about correspondence: don’t respond to letters the moment they arrive. Set them aside. Let the thought settle. Respond when you have something worth saying rather than when social pressure demands a reply.

That’s not how we use AI.

We’ve built real-time AI into everything — always-on assistants, instant analysis, immediate responses to any question. The design assumption is that faster is better, that removing latency from every cognitive task creates value. But there’s a cost to always-on AI that’s rarely discussed: it can quietly erode the capacity for sustained, independent thinking that makes good product decisions possible.

The Real-Time AI Trap

Here’s a pattern I’ve noticed in my own work, and in teams I consult with: AI availability changes the character of thinking, not just the speed. When you know an instant answer is available, the instinct to work through a problem yourself diminishes. Not dramatically — just enough to matter.

I caught myself asking Claude to “think through the pros and cons” of a product prioritization decision I’d made dozens of times before. I wasn’t seeking new perspective. I wasn’t dealing with an unusual edge case. I was just avoiding the mild friction of working through it myself, because the alternative (ask AI, get immediate answer) was so frictionless.

That’s a problem. Not because AI analysis is bad, but because the ability to reason through prioritization decisions independently is a core PM competency — and like any competency, it atrophies without use. Offloading the thinking doesn’t just remove the task; it removes the practice.

Offloading vs. Outsourcing: The Line That Matters

There’s a useful distinction worth making explicit:

Offloading is using AI to handle tasks that don’t require your specific judgment — summarizing a 50-page research report, extracting action items from meeting transcripts, generating first-draft language for a spec. The value is real and the cost is low. You’re not losing anything by having AI do these tasks. You free up cognitive bandwidth for higher-leverage work.

Outsourcing is using AI to handle the judgment calls that should be yours — deciding which problems to prioritize, evaluating strategic trade-offs, determining what your users actually need vs. what they’re asking for. The immediate output looks similar, but the long-term cost is real: you’re not building the judgment muscle that makes you better over time. You’re borrowing it from a model that has no stake in your actual outcomes.

The line between the two is often subtle and context-dependent. Summarizing research is usually offloading. Asking AI to “tell me what to focus on this quarter” is usually outsourcing. “Help me stress-test the prioritization I’ve already done” is offloading. “What should we build next?” is outsourcing.

A Batched Approach to AI That Preserves Judgment

The batching principle is the practical application of Seneca’s rule. Instead of treating AI as a real-time response system for every cognitive task, structure AI interaction into deliberate sessions with protected thinking time in between.

Here’s the workflow I’ve been running for eight months:

Morning AI session: Feed the system the previous day’s metrics, the key questions I’m working on, and any documents requiring analysis. Let it run. Close the interface. Don’t interact with AI analysis until I’ve done my own initial thinking on the same questions.

Midday synthesis: Review AI output alongside my own thinking. The AI analysis is one data point — sometimes it changes my view, sometimes it confirms it, sometimes it raises considerations I missed. The key: I’ve already done the first-pass reasoning before seeing what the AI produced.

Protected thinking blocks: Two 45-minute windows per day with AI interfaces closed. This is where strategy work happens — the kind of slow, uncomfortable reasoning about difficult trade-offs that can’t be outsourced without losing the benefit of doing it.

The counterintuitive result: I use AI more effectively when I use it less constantly. The batching creates space for my analysis to develop before AI analysis augments it, which produces better final outputs than feeding every question to AI the moment it arises.

The Organizational Design Implication

If you’re leading a product team, the real-time AI trap isn’t just a personal productivity issue — it’s an organizational one. Teams that use AI to substitute for strategic thinking will develop weaker strategic thinking over time. The meetings get faster. The decisions get more confident. And the quality of the underlying reasoning quietly declines.

The antidote is building deliberate judgment-development practices into team rhythms: requiring people to bring their own analysis before AI analysis is introduced, creating space for debate and disagreement that doesn’t get short-circuited by “the AI says,” and treating the quality of reasoning — not just the quality of outputs — as something worth protecting.

As Kahneman’s research shows, fast thinking is efficient but prone to systematic error. The AI automation paradox is that AI often makes fast thinking faster without making it more accurate — which is exactly the problem Seneca was trying to solve with his correspondence rule two thousand years ago.


Your Turn: Apply This Today

The cost of real-time AI is often invisible until it shows up in your team’s burnout or your product’s churn rate. Here’s how to surface and manage it:

  • Audit your AI features for “always-on” demands. List every AI feature in your product that creates an expectation of real-time response — from users or from your systems. For each one, ask: is this expectation necessary for value, or just for perceived responsiveness?
  • Calculate the true infrastructure cost of real-time AI. Pull the cost data on your most-used AI features. Break it down by: inference cost, latency infrastructure, on-call engineering support, and incident response. Present the full number to leadership alongside the user value. That’s the honest trade-off.
  • Segment your users by latency sensitivity. Not all users need real-time AI. Survey or instrument your user base to understand who notices — and cares about — latency vs. who just wants accuracy. Design tiered AI experiences that match investment to actual user need.
  • Introduce “async AI modes” for non-time-critical workflows. Identify user workflows where a 5-minute or 1-hour delay is completely acceptable. Move those workflows to async processing. Invest the cost savings in making real-time AI faster where it actually matters.
  • Set a “response time contract” with your users. Be explicit in your UI about when AI outputs are real-time vs. batch-processed. Users who know when to expect results are more patient than users who feel they’re waiting unnecessarily.
  • Apply Seneca’s discipline to your roadmap. Before adding any new real-time AI feature, ask: is this urgent for the user, or just urgent-feeling? If the value doesn’t require immediacy, design for async first and real-time only when the user’s workflow demands it.

The batching approach also helps with the problem Kahneman’s framework identifies: when you don’t interact with AI constantly, you can engage System 2 evaluation at the right moments rather than burning cognitive load on continuous vigilance.

Building AI into your product team’s workflow and watching independent judgment quietly erode? I consult with product organizations on AI work design, decision quality, and preserving the judgment capacity that makes AI actually useful rather than just fast. Let’s talk.

Kahneman’s System 1 and the AI Automation Paradox: Why Cognitive Load Goes Up, Not Down

Kahneman’s dual-process theory splits human cognition into System 1 (fast, automatic, intuitive) and System 2 (slow, deliberate, analytical). System 1 handles routine pattern-matching instantly. System 2 engages for novel problems requiring conscious reasoning. The trouble is that System 2 is expensive — it’s slow, effortful, and tires easily. Humans naturally try to minimize how much they use it.

Here’s what Kahneman didn’t fully anticipate when he published Thinking, Fast and Slow in 2011: what happens when AI automates the System 1 tasks but simultaneously requires System 2 oversight of everything it does.

That’s the AI automation paradox. And it’s creating a cognitive load problem that most product teams haven’t fully reckoned with.

Why AI Automation Increases Cognitive Load Instead of Reducing It

When I started running a 20-agent AI system, I expected it to free up cognitive bandwidth. It did — for the tasks the agents handled correctly. What I didn’t account for was the continuous System 2 vigilance required to monitor outputs I could no longer trust automatically.

My email triage agent sorts and prioritizes correctly about 90% of the time. That sounds high. But 10% errors across 200 emails a day means 20 potentially misrouted messages — enough to create real problems. So instead of spending System 1 attention on scanning email (which I was good at), I now spend System 2 attention verifying the agent’s categorizations. That’s cognitively harder, not easier.
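The arithmetic is worth making explicit, because it is easy to underestimate at other volumes. A small sketch, where the function name is illustrative and the result is rounded to whole items:

```python
def expected_misroutes(items_per_day: int, accuracy: float) -> int:
    """Expected number of wrongly handled items per day, rounded to a whole item."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(items_per_day * (1.0 - accuracy))


# The figures quoted above: 200 emails a day at 90% accuracy.
print(expected_misroutes(200, 0.90))  # prints 20
```

The catch is that the agent does not tell you which 20 are wrong, so the verification effort scales with all 200 messages, not with the errors.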

This matches what Ethan Mollick documented in his research on human-AI collaboration: AI creates “jagged frontiers” where performance is excellent in some areas and surprisingly poor in adjacent ones, with no obvious pattern. Users can’t develop the intuitive trust that would allow System 1 to handle AI oversight. Every interaction requires deliberate evaluation.

The result: permanent System 2 vigilance for tasks that AI is supposedly handling for you.

The Design Implications for AI Products

Understanding the Kahneman paradox should change how you design AI features — both for your users and for your own team’s workflows.

Narrow scope reduces oversight burden. The AI tools I trust most handle a single, well-defined task with transparent reasoning. My meeting transcript parser does one thing: it extracts action items and shows its reasoning for each one. The narrow scope makes verification fast. The broad-scope tools that promise to “handle everything” create the most cognitive load because you can never develop calibrated trust — each output requires full evaluation.

Confidence signaling is a product feature, not a nice-to-have. Tools that flag their own uncertainty let users shift appropriately between System 1 and System 2. When an output is flagged as low-confidence, the user engages deliberate evaluation. When it’s flagged as high-confidence, they can trust it more readily. This isn’t just better UX — it’s cognitively more honest about what AI actually does. Most AI tools present every output with equal confidence, which forces users into constant System 2 vigilance as the safe default.
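As a rough sketch of what confidence signaling can mean in practice: the code below assumes the tool exposes a calibrated confidence score between 0 and 1, which many tools do not. The AgentOutput type, the 0.8 cutoff, and the label text are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice, tune it against observed error rates.
REVIEW_THRESHOLD = 0.8


@dataclass(frozen=True)
class AgentOutput:
    text: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def trust_signal(output: AgentOutput) -> str:
    """Label an output so the user knows when deliberate review is needed."""
    if output.confidence < REVIEW_THRESHOLD:
        return "needs review"
    return "likely fine"


def split_for_review(outputs: list[AgentOutput]) -> tuple[list[AgentOutput], list[AgentOutput]]:
    """Separate flagged outputs from those the user can skim."""
    flagged = [o for o in outputs if trust_signal(o) == "needs review"]
    trusted = [o for o in outputs if trust_signal(o) == "likely fine"]
    return flagged, trusted
```

The point of the split is attention routing: System 2 goes to the flagged pile, and the rest gets a lighter pass.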

Failure mode design matters more than accuracy optimization. The most dangerous AI failure is not “wrong with high confidence” on a flagged edge case. It’s “wrong with high confidence” on something that looks routine. Design your AI features to fail obviously, not silently. When the agent can’t handle something well, it should say so explicitly rather than producing a plausible-sounding but incorrect output. Graceful, visible degradation builds appropriate trust calibration over time.

When to Let System 1 Take Over

The goal isn’t permanent System 2 vigilance — that’s unsustainable and defeats the productivity case for AI. The goal is building the pattern recognition that allows appropriate trust to develop over time.

This happens naturally when the AI operates in a narrow enough domain, with enough transparency, for long enough that users can develop accurate intuitions about where it succeeds and where it fails. That’s the design objective: not “AI handles everything” but “users develop calibrated trust in specific AI behaviors” — which takes time, transparency, and intentional scope management on your part as a product builder.

Nielsen Norman Group’s research on AI mental models points in the same direction: users who develop accurate mental models of AI capabilities use AI tools more effectively and report higher satisfaction than those operating with either over-trust or under-trust. Your product design should actively support accurate mental model formation, not just maximize apparent capability.

If you’re thinking about how this connects to team productivity, the Kahneman paradox is part of why AI makes knowledge workers more productive but not always more effective — the cognitive overhead of AI oversight is a real cost that productivity metrics rarely capture.


Your Turn: Apply This Today

The cognitive load paradox is subtle but consequential. Here’s how to design around it:

  • Audit your AI features for “trust calibration.” For each AI output your product surfaces, ask: does the UI communicate how confident the AI is, and what the cost of being wrong is? Presenting uncertain outputs with the same visual weight as certain ones trains users to overtrust.
  • Design “friction with purpose” into high-stakes AI decisions. Where an AI recommendation could lead to a significant downstream consequence, add a confirmation step — not to slow users down, but to activate System 2 thinking. The pause is the feature.
  • Track “AI override rates” as a product health metric. Measure how often users accept AI suggestions without modification vs. how often they edit or reject them. If the override rate is near zero, users may be overtrusting outputs that don’t deserve it.
  • Run a “cognitive load audit” on your highest-traffic AI workflow. Map every decision a user makes in the flow. For each one, ask: is the AI reducing cognitive load in a way that helps, or in a way that just defers the thinking to a less capable moment?
  • Test your AI features with novice and expert users separately. The cognitive load paradox hits differently across experience levels. Experts are more likely to catch AI errors; novices are more likely to overtrust. Design different trust signals for different user segments.
  • Build an “error recovery” path for every AI recommendation. Design the product so that when an AI suggestion turns out to be wrong, the user can recover without significant cost. If the cost of recovery is high, you must add more human checkpoints before the decision commits.
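The override-rate metric from the list above is simple to compute once you log what users do with each suggestion. A minimal sketch, assuming a hypothetical event log with three labels ("accepted", "edited", "rejected") and an illustrative 2% floor for raising the overtrust question:

```python
from collections import Counter

VALID_DECISIONS = {"accepted", "edited", "rejected"}  # hypothetical event labels
OVERTRUST_FLOOR = 0.02  # hypothetical: below 2% overrides, look closer


def override_rate(decisions):
    """Share of AI suggestions that users edited or rejected, or None with no data."""
    counts = Counter(decisions)
    unknown = set(counts) - VALID_DECISIONS
    if unknown:
        raise ValueError(f"unknown decision labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return None
    return (counts["edited"] + counts["rejected"]) / total


def possible_overtrust(decisions) -> bool:
    """Flag a near-zero override rate as worth investigating, not as proof."""
    rate = override_rate(decisions)
    return rate is not None and rate < OVERTRUST_FLOOR
```

A low rate is only a prompt to investigate. It could equally mean the suggestions are excellent, which is why it belongs next to an independent accuracy check.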

Building AI features and seeing that users aren’t developing the trust and adoption you expected? I consult with product teams on AI UX design, cognitive load, and building the right trust architecture for AI-powered products. Let’s talk.

When AI Access Becomes Universal: What Actually Differentiates Your Product

Sam Altman’s “Universal Basic Compute” proposal — providing every person direct access to computing power rather than just AI-generated outputs — is one of the more interesting ideas circulating in AI policy circles right now. The concept: instead of waiting for AI productivity gains to trickle down through labor markets, give people direct stakes in the compute layer itself. Think universal healthcare, but for GPUs.

Whether or not Altman’s specific proposal gains traction, the question it raises is worth sitting with for anyone building AI products: When compute becomes democratized, what actually determines who wins?

The Access Fallacy in AI Product Strategy

A lot of product strategy right now is implicitly built on access as the moat. We have access to GPT-4, or Claude, or Gemini — and our competitors don’t, or can’t afford it, or don’t know how to integrate it yet. The access gap creates a temporary competitive advantage.

But access moats in software have a historical tendency to close faster than anyone expects. AWS made infrastructure a commodity. Stripe made payments a commodity. Shopify made e-commerce infrastructure a commodity. In each case, the teams that built durable advantages weren’t the ones who had early access to infrastructure — they were the ones who had deep understanding of the customers that infrastructure served.

AI access is on the same trajectory. When compute democratizes — and it will, whether through Altman’s UBC proposal or through market price compression or through open-source model proliferation — the teams with genuine customer depth will compound. The teams whose strategy was primarily “we have AI and they don’t” will find themselves competing on a level playing field with no differentiation.

What Actually Differentiates When Access Is Universal

I’ve been thinking about this through the lens of what I’ve observed building digital products for ministry and faith-based audiences across multiple continents. The access gap closed faster than expected in those markets — within 18 months of us integrating AI capabilities, smaller competitors had comparable features. The products that retained users weren’t the ones that had AI first. They were the ones that had the best data on their specific users, the deepest understanding of context, and the most nuanced sense of what “correct” actually means for their audience.

Universal compute access amplifies this dynamic. When everyone can fine-tune models on their domain data, the differentiator isn’t who has AI — it’s who has:

  • Better training data. Your proprietary understanding of user behavior, domain-specific language, and edge cases that general models handle poorly.
  • Deeper contextual judgment. The ability to know when the AI output is 95% right but contextually wrong in the 5% that matters most for your specific users.
  • Domain expertise that AI can’t replicate without you. The tacit knowledge of what “good” looks like for your users that only comes from years of serving them closely.

This is essentially the sovereignty question applied at the product level: the teams with data sovereignty and domain sovereignty will outperform those with only infrastructure access.

The Distribution Problem That Persists

Altman’s proposal also highlights something product teams building global products need to think about more carefully: democratized access doesn’t automatically produce democratized value.

When we expanded our platform to serve users in the Global South (smaller congregations, rural communities, under-resourced organizations), equal access to the same AI features didn’t produce equal outcomes. The features were calibrated on majority-culture usage data. The AI’s judgment about what was “helpful” reflected a particular cultural context. The users with equal access to the technology were not equally served by it.

Real democratization isn’t just access to compute. It’s access to compute that reflects your context, your language, your use cases, and your community’s understanding of what good looks like. That gap doesn’t close by lowering the price of GPUs. It closes through deliberate investment in representative data and culturally aware product design, which is an organizational commitment, not an infrastructure problem.

What This Means for Your Roadmap Now

If your AI product strategy relies primarily on access advantages, now is the right time to audit what else you’re building. The questions worth asking:

  • What proprietary data do you have that makes your AI outputs genuinely better for your specific users than a competitor using the same foundation model?
  • What domain knowledge is baked into your product design that a new entrant with equal AI access couldn’t replicate quickly?
  • Are you building the customer relationships and feedback loops that compound into better AI outputs over time?
  • If your AI access became equally available to your top three competitors tomorrow, what would still differentiate you?

The teams thinking through those questions now will be better positioned when the access gap closes — and it will close. The question is whether your strategy is built on a foundation that gets stronger when it does, or one that becomes irrelevant.

This connects to the deeper question of what your product is actually for — because access democratization also changes who your product is obligated to serve well, not just who it can technically reach.


Your Turn: Apply This Today

If AI is table stakes, your differentiation must come from somewhere else. Here’s how to find it:

  • Inventory your non-AI moats. List every source of competitive advantage your product has that has nothing to do with AI — network effects, proprietary data, community, brand trust, distribution, switching costs. If the list is short, that’s your strategy problem to solve.
  • Run a “when AI does it for free” scenario. Identify the three AI features you’re most proud of. Then ask: if OpenAI, Google, or Microsoft offered the same capability for free tomorrow, what would remain of your value proposition? Build toward that remainder.
  • Find the “contextual advantage” in your market. What does your product know about your specific user context — their industry, their past behavior, their workflow — that a general AI cannot know without being told? Build features that leverage that context depth.
  • Invest in trust infrastructure, not just feature infrastructure. As AI becomes ubiquitous, users will choose products they trust to handle their data, protect their privacy, and behave consistently. Document and communicate your AI governance practices. Trust is a differentiator.
  • Map the “last mile” problem in your category. AI handles the generic parts of most workflows. What’s the last 20% that requires human judgment, contextual knowledge, or professional accountability? Design your product to excel in that space, not compete in the generic space where AI wins on cost.
  • Define your “AI commodity” timeline. Make a bet: in 18 months, which of your current AI features will be commoditized? Work backward from that bet to decide what to build now that has staying power beyond the commoditization wave.

Building AI products and thinking through what differentiates you as the access moat closes? I consult with product leaders on AI product strategy, data differentiation, and building durable competitive advantages in AI-native markets. Let’s talk.

The Scaffolding Problem: Why AI Learning Tools Build Dependency Instead of Capability

Ethan Mollick has a scaffolding problem. He’s spent years making the case for AI as a learning partner — Co-Intelligence is genuinely persuasive on this — but the research on AI-assisted learning keeps surfacing a troubling pattern: when people use AI as scaffolding to learn complex skills, they often become dependent on the scaffold rather than developing the underlying capability.

This isn’t unique to AI. Apprenticeship models have been solving — and sometimes creating — this problem for centuries. The master-apprentice relationship is the oldest answer to the question of how you transfer tacit knowledge from someone who has it to someone who doesn’t. It also predates every digital learning system by millennia, and it still works better than most of them. Understanding why tells you a lot about where AI learning tools succeed and where they’re going to fail.

The Scaffolding Paradox in Practice

Here’s the problem in concrete terms: I’ve built a 20-agent AI system that handles significant portions of my workflow. I write better first drafts faster, synthesize research more thoroughly, and spot patterns across data sets that I would have missed before. These are genuine capability improvements.

But when I assess the people on my teams who have adopted AI tools most enthusiastically, I’m noticing something uncomfortable: their AI-assisted work product is improving, but their independent judgment on the same types of problems isn’t improving at the same rate. They’re getting better outputs with AI. They’re not necessarily getting better at the underlying reasoning that generates good outputs.

This is the pattern that educational research on scaffolding has long warned about. Scaffolding that removes the struggle of learning, that answers the question before the learner has grappled with it, produces more efficient short-term outputs and less durable long-term capability. Students who lean on calculator apps can get faster at arithmetic without getting better at mathematical reasoning. Writers who lean on grammar checkers can produce cleaner prose while developing weaker editorial instincts. The scaffold substitutes for the learning rather than accelerating it.

What Apprenticeship Gets Right That AI Scaffolding Usually Misses

The master-apprentice model works because it’s built on three principles that most AI learning tools violate:

Context before process. Apprentices don’t get handed a manual. They observe the master working in real contexts — seeing the judgment calls, the trade-offs, the moments of uncertainty before decisions. The learning is embedded in practice, not abstracted into steps. Most AI tutoring optimizes for process efficiency: here’s the template, here’s the workflow, here are the five steps. The tacit knowledge of when to apply which approach — the judgment layer — doesn’t transfer through templates.

Struggle as pedagogy. Good mentors don’t rescue apprentices from difficulty; they use difficulty as the teaching medium. The moment the apprentice gets stuck is often the highest-value learning moment — it surfaces where their mental model diverges from reality. AI scaffolding tends to eliminate this. Stuck on a document? AI drafts it. Can’t find the right framing? AI suggests five. The struggle that would have built capability gets bypassed entirely.

Progressive responsibility transfer. The best apprenticeship relationships follow a deliberate arc from observation to supported practice to independent execution with feedback to autonomous work. This progression is designed, not accidental. Most AI tools don’t have a model of the learner’s development over time — they just answer whatever question is asked, at whatever level of support is requested, with no view toward building toward independence.

What This Means for Building AI Learning Tools

The AI learning products that will create durable value are the ones that preserve productive struggle rather than eliminating it. Practically, this means:

Design for delayed assistance. The most sophisticated AI tutoring designs I’ve seen require users to make their own attempt before unlocking AI assistance. This preserves the learning that happens in the struggle phase while preventing frustration from becoming discouragement. The friction is intentional — it’s doing work.
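One way such a gate could work, as a minimal sketch. The LearningSession class, the 30-word bar for what counts as a real attempt, and the returned status strings are all assumptions made for illustration; a real product would judge attempts by more than word count.

```python
from dataclasses import dataclass

MIN_ATTEMPT_WORDS = 30  # hypothetical bar for "a real attempt"


@dataclass
class LearningSession:
    attempt: str = ""

    def submit_attempt(self, text: str) -> None:
        """Record the learner's own work before any AI help is offered."""
        self.attempt = text.strip()

    def assistance_unlocked(self) -> bool:
        """Help unlocks only once the attempt reaches the minimum length."""
        return len(self.attempt.split()) >= MIN_ATTEMPT_WORDS

    def request_help(self) -> str:
        """Return a status string; the actual assistant call is out of scope here."""
        if not self.assistance_unlocked():
            missing = MIN_ATTEMPT_WORDS - len(self.attempt.split())
            return f"locked: write about {missing} more words of your own attempt first"
        return "unlocked: pass the learner's attempt to the assistant for feedback"
```

Because the attempt is stored, the assistant can respond to what the learner actually wrote instead of producing a fresh answer from scratch.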

Ask questions instead of providing answers. An AI that responds to “how do I approach this problem?” with a clarifying question — “what have you tried so far, and where did that break down?” — is building the learner’s reasoning. An AI that just answers the question is substituting for it. Mollick’s own research on AI pedagogy points in this direction: the AI-as-Socrates model outperforms the AI-as-encyclopedia model for capability development.

Build mental models explicitly. Scaffolding that helps users understand why something works — not just what to do next — builds transferable reasoning rather than step-following. This is harder to build and harder to measure, which is why most AI tools don’t do it. But it’s the difference between producing capable practitioners and producing tool-dependent process followers.

The Stakes Are Higher Than They Look

If you’re building AI tools that people use for learning or skill development — whether that’s a sales enablement tool, a customer service training platform, an onboarding system, or an educational product — the scaffolding question determines whether your product creates lasting value or lasting dependency.

Products that make people capable compound in value. Users become more effective over time, credit the tool for helping them get there, and advocate for it because they can point to real skill growth. Products that create dependency are fragile — valuable when present, debilitating when absent, and increasingly resented when users realize they haven’t actually grown.

I’ve seen this dynamic in the platforms we’ve built for ministry and faith formation: tools designed to make practices easier often made practitioners weaker. The same dynamic shows up in corporate learning systems, sales tools, and product development processes. The question isn’t whether AI can help people work through complex tasks. It’s whether it’s helping them build the judgment to work through those tasks more independently over time.

If you’re thinking about how AI tools change the skill requirements for product roles, this connects directly to the hiring question: the PMs who will be most valuable in an AI-native organization are the ones who’ve built judgment, not just the ones who’ve built AI fluency.


Your Turn: Apply This Today

Designing for learning — rather than just task completion — requires intentional choices. Here’s where to start:

  • Audit your AI features for scaffolding vs. substitution. For each AI feature in your product, ask: does this help users develop capability, or does it complete the task so thoroughly that users stop developing the underlying skill? If it’s the latter, you have a dependency risk.
  • Design one “fade-the-scaffold” feature this quarter. Identify an AI assist that new users need but experienced users don’t. Build a mechanism that gradually reduces the assist as user competency grows — like training wheels that automatically come off.
  • Interview users about what they’ve learned from your product. Ask: “What can you do now that you couldn’t before you started using us?” If the answer is “nothing — the AI just does it for me,” you’re building dependency, not capability. Both can be valid business models, but know which one you’re running.
  • Add a “learning mode” vs. “efficiency mode” toggle to high-dependency features. Let users choose whether they want the AI to do it for them or guide them through doing it themselves. The toggle itself signals that you’ve thought about skill development.
  • Evaluate your retention metrics through a dependency lens. High retention can mean high value or high lock-in. Ask: are users retained because the product makes them better, or because they can no longer function without it? The distinction matters for long-term product health and user trust.
  • Build a “transfer test” into your user research. Periodically ask long-term users to attempt their core task without AI assistance, then with it. Measure the gap. If the gap is growing over time, your scaffold has become a crutch.
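The transfer test in the last item comes down to tracking one number over time. A minimal sketch, assuming each research round yields a hypothetical pair of scores (unassisted, assisted) on the same core task, using whatever scoring rubric your team already trusts:

```python
def dependency_gaps(rounds):
    """Gap per round between assisted and unassisted scores.

    rounds: list of (unassisted_score, assisted_score) pairs, oldest first.
    """
    return [assisted - unassisted for unassisted, assisted in rounds]


def gap_is_growing(rounds) -> bool:
    """True when the latest gap exceeds the first one (needs at least two rounds)."""
    gaps = dependency_gaps(rounds)
    return len(gaps) >= 2 and gaps[-1] > gaps[0]


# Illustrative numbers only: assisted work improves while unassisted work slips.
quarterly_rounds = [(70, 80), (68, 85), (65, 90)]
print(dependency_gaps(quarterly_rounds))  # prints [10, 17, 25]
print(gap_is_growing(quarterly_rounds))   # prints True
```

Comparing only the first and last rounds is a deliberately crude trend check. With more rounds you would want something less sensitive to a single noisy measurement.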

If you’re thinking about AI collaboration in product teams more broadly, this scaffolding question is the flip side of the AI-as-coworker conversation — both are about how humans and AI systems should divide cognitive labor over time.

Building AI tools for learning, training, or skill development — and want to design for capability growth rather than dependency? I consult with product teams on AI-assisted learning design, scaffolding strategy, and building products that create durable user value. Let’s talk.

Barry Schwartz Was Right: Why AI Feature Creep Is a Choice Paradox Problem

Barry Schwartz’s The Paradox of Choice popularized the research on choice overload: past a certain point, more options can leave people less satisfied and less likely to choose. The famous example is the jam study by Sheena Iyengar and Mark Lepper, which Schwartz cites: a display with 24 varieties attracted more browsers but generated far fewer purchases than a display with 6. More options, less action.

AI has created a jam aisle with infinite varieties, and most product teams are treating it like a feature advantage. Every conversation about “AI capabilities” sounds like a menu planning session: look at everything we can do. The technology can theoretically do anything, so the assumption is that users want access to everything.

They don’t. And the products that understand this are going to win.

AI Feature Creep Is a Choice Paradox Problem

The pattern plays out the same way across AI product launches I’ve seen: a team builds a genuinely capable AI feature and launches it with broad capability — “ask it anything,” “summarize any document,” “generate whatever you need.” Usage is lower than expected. User feedback is confused. The team concludes the AI isn’t good enough and invests in improving the underlying model, when the real problem is the interface design choice to expose unlimited capability without opinionated defaults.

When I first started using GPT-4 for product work, I stared at the blank prompt box for longer than I’d like to admit. The capability was extraordinary. The interface was paralysis-inducing. Every time I sat down to use it, I first had to decide what to use it for — which was itself cognitive work that the tool didn’t help with.

Adding constraints made it more valuable, not less. “Analyze this data” got mediocre results. “Analyze this data for three specific insights about user retention that I might be missing” got genuinely useful outputs. The constraint reduced choice for the AI and for me, and the outputs improved dramatically.

The Curation Advantage Is Underexploited

Look at how the best AI product integrations actually work. Replit’s AI coding assistance doesn’t give you a generic AI chat interface. It offers contextually relevant, constrained actions: fix this bug, explain this function, write tests for this code. Each action is integrated into the development workflow rather than floating as a generic capability. The AI does more because it’s been given less to choose from.

Notion’s AI implementation shows the same principle: when you highlight text, it surfaces contextually relevant options based on content type rather than displaying all possible AI actions. The AI is making choices about what choices to offer, which is the genuinely hard product design problem.

Choice architecture research tells us users don’t want fewer options — they want intelligently structured ones. The goal isn’t to eliminate capability. It’s to make good default choices for users so they can focus cognitive resources on the work that matters rather than on deciding how to use the tool.

Three Design Principles for Fighting AI Choice Overload

Lead with workflow, not capability. Don’t present what the AI can do in the abstract. Present what users are trying to accomplish and show AI as the accelerant for specific, recognizable tasks. “Summarize this meeting” is a task. “AI-powered meeting assistant with transcription, action item extraction, sentiment analysis, and searchable archives” is a capability list that makes users wonder which part to try first.

Default to constrained, allow expansion. Ship with opinionated defaults that work for 80% of use cases. Make it possible to unlock broader capability, but don’t lead with it. Users who need the full flexibility will find it. Users who don’t will get value faster and with less friction. The majority of your users will never need the advanced settings, so designing first for the minority who do is a mistake.

Let AI curate its own interfaces. The most sophisticated AI product design pattern: use AI to decide what choices to offer users, based on context. This is what the best implementations do — contextual, dynamic option presentation that reduces the meta-choice burden. You’re not hiding capability. You’re surfacing the right capability at the right moment.

The Counter-intuitive Competitive Advantage

In a market where every competitor is adding AI capabilities, the counter-intuitive competitive advantage is ruthless curation. The product that makes users feel capable and focused beats the product that makes users feel overwhelmed, even if the overwhelmed product has technically superior AI.

This is Schwartz’s insight applied to product strategy: the winning AI products won’t be the ones with the most features. They’ll be the ones that make the most confident choices on behalf of users, removing the cognitive burden of deciding how to use powerful tools. Those are design decisions, not model decisions — and they’re available to any team willing to make opinionated calls about what matters most for their users.

If you’re thinking about how AI feature design connects to user trust over time, the inversion framework is worth running on any AI feature launch: explicitly ask how unlimited capability exposure could destroy user confidence, and design against those failure modes.


Your Turn: Apply This Today

Your next sprint planning is an opportunity to say no more deliberately. Here’s how:

  • Conduct a “choice audit” on your core workflow. Walk through the main path a new user takes in your product. Count every decision they’re asked to make before they get to value. If it’s more than five, you have a choice overload problem to solve.
  • Apply the “brilliant friend” test to your AI features. Would a brilliant friend with your product’s capabilities offer 12 options and ask you to choose — or would they just give you the best answer? Redesign your AI features to behave like the friend, not the dropdown menu.
  • Eliminate one “power user” feature from your primary navigation this sprint. Identify a feature that 80% of your users never touch. Move it to an advanced settings section. Measure whether new user activation improves. In my experience, it usually does.
  • Rewrite your AI feature descriptions to emphasize what they decide for the user. Instead of “Choose from 8 AI writing styles,” try “We’ll match the right tone for your audience.” Confidence in the system builds trust. Choice signals uncertainty.
  • Set a “maximum options” standard for your product design system. Define explicitly: no UI element presents more than N choices at a time without progressive disclosure. Make it a design constraint, not a suggestion.
  • Interview users who churned in the evaluation phase. Ask them: “Was there a moment where you felt overwhelmed or unsure what to do next?” If you hear it more than twice, you’ve found the choice overload drop point. Fix that before you add another feature.
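The “maximum options” standard can be enforced in code rather than left to design review. A minimal sketch, where the limit of five and the function name are illustrative assumptions and the caller is expected to pass options already ranked by relevance:

```python
MAX_VISIBLE_OPTIONS = 5  # hypothetical value for N in the design standard


def progressive_disclosure(options, max_visible: int = MAX_VISIBLE_OPTIONS):
    """Split ranked options into those shown up front and those behind "More".

    options: iterable ordered from most to least relevant.
    """
    if max_visible < 1:
        raise ValueError("max_visible must be at least 1")
    ranked = list(options)
    return ranked[:max_visible], ranked[max_visible:]


visible, hidden = progressive_disclosure(
    ["Summarize", "Fix grammar", "Shorten", "Translate", "Change tone", "Expand", "Outline"]
)
print(visible)  # the first five options
print(hidden)   # the remaining two, shown only on request
```

The ranking is the hard part and is deliberately left out here. This only guarantees that no surface exceeds the limit without a disclosure step.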

Building AI features and fighting the instinct to expose every capability you’ve built? I consult with product teams on AI product strategy, feature prioritization, and the design decisions that create durable user value. Let’s talk.

Drucker’s Knowledge Worker Paradox: Why AI Makes You More Productive and Less Effective

Peter Drucker wrote The Effective Executive in 1967, before knowledge work had fully displaced industrial work as the dominant mode of economic value creation. His core argument: effectiveness — producing the right results — is a discipline that can be learned. Efficiency — doing things faster — is a distraction if you’re doing the wrong things.

Nearly sixty years later, AI is making Drucker’s distinction more urgent, not less. I’ve been running a 20-agent AI system for eight months. My output has never been higher: specs drafted faster, research synthesized in minutes, emails triaged and responded to before I’ve even looked at my inbox. My impact? That’s a different question, and a harder one to answer.

The AI Productivity Paradox

Here are the three traps I’ve observed across multiple organizations implementing AI tools, including my own teams:

The Tool Trap: Teams adopt AI without changing their work design. The result is faster generation of the same ineffective outputs. The company was already producing too many status updates, too many options without clear decision criteria, too many documents nobody reads. AI helps them produce more of that, faster. The work product volume goes up. The decisions don’t get better.

The Volume Trap: Individual contributors use AI to produce more — more analysis, more options, more documents. Their managers now have twice as much to review with no additional capacity for review. Decision quality decreases as cognitive load on decision-makers increases. The people who benefit most from AI productivity tools are often creating bottlenecks for the people above them.

The Speed Trap: AI enables rapid iteration, so teams iterate rapidly — on problems that aren’t well-defined. Speed without clear objectives doesn’t compress learning cycles. It just burns through resources faster on the wrong problems. I’ve watched teams use AI to generate ten variations of a product concept before anyone had agreed on what problem the product was solving.

What AI Does to Drucker’s Framework

Drucker defined knowledge workers as people whose tool is their own knowledge — whose productive output is the application of informed judgment to complex problems. The paradox: AI dramatically amplifies the volume of work knowledge workers can produce while doing almost nothing to improve the quality of their judgment.

I can draft a product strategy document in 20 minutes with AI assistance. That used to take four hours. The time savings are real. But strategy isn’t about document creation — it’s about making difficult choices with incomplete information, building conviction in a direction when reasonable people disagree, and committing resources before you have certainty. AI doesn’t do any of that. It helps me produce the artifact of strategic thinking faster, not the strategic thinking itself.

When I was at a consulting firm early in my career, I could generate competitive analyses, user journey maps, and market segmentation frameworks faster than anyone on the team. But I was optimizing for the wrong thing. The clients didn’t need more frameworks — they needed clearer decision-making. My AI-enabled productivity made me feel more valuable while I was actually less impactful than the partner who spent three hours in a room with a client asking uncomfortable questions with no slides in sight.

What Effective AI Use Actually Looks Like

The organizations getting AI right share a common pattern: they redesigned work around outcomes, not activities. They asked “what decisions do we need to make better?” before they asked “what tasks can AI help with?”

One product team I work with stopped using AI to generate more user personas. They use it to interrogate existing ones — to find the gaps, contradictions, and unstated assumptions in their current understanding of users. Same tool, completely different question. The AI generates more variance and sharper challenges to their existing beliefs rather than more artifacts that confirm them.

Another team uses AI specifically to reduce the volume of options they present to decision-makers — not increase it. They generate ten options, use AI to stress-test each one, and present three with clear trade-off analysis. Less volume, higher decision quality, faster leadership alignment.

The diagnostic question Drucker would apply is simple: Which decisions are better because of your AI use? Not “how much more am I producing?” but “where is my judgment actually improving, and where am I just producing more?” If you can’t name specific decisions that got better, you’re in the Tool Trap — generating more output without improving what matters.

The Manager’s Job in an AI-Enabled Team

Drucker argued that the manager’s job is to create the conditions for effective knowledge work — which means designing systems where people apply their judgment to problems worth solving, not just problems they can solve quickly.

In an AI-enabled team, that job gets harder before it gets easier. You have to resist the temptation to measure productivity by AI tool adoption rates, document generation speed, or throughput metrics that feel like evidence of progress. You have to build the clarity about outcomes that makes it possible to distinguish effective AI use from sophisticated busywork.

This connects directly to how you hire: the skills that matter most in AI-era product roles are exactly the ones Drucker would have valued — judgment, the ability to define the right problem, and the discipline to focus on outcomes rather than outputs. AI amplifies those skills in people who already have them. It amplifies the wrong behaviors in people who don’t.

Read Drucker’s foundational work on knowledge worker effectiveness with AI in mind and it lands differently than it did in 1967. The core problem he identified, mistaking activity for effectiveness, is harder to see now, because the activity looks more impressive than ever.


Your Turn: Apply This Today

The productivity paradox is real. Here’s how to manage it intentionally rather than fall into it:

  • Audit your team’s AI-assisted work for effectiveness, not just speed. Pick three AI-assisted outputs from the last month. Ask: did these advance the right problem, or did they efficiently answer the wrong question? Speed on the wrong task is still waste.
  • Define your team’s “high-judgment” work explicitly. Make a list of the five to seven tasks in your product process that require the deepest human judgment and institutional knowledge. Protect those tasks from defaulting to AI. Don’t let speed pressure push judgment work to the AI.
  • Build a “contribution mapping” practice. For each team member, map what they uniquely contribute that AI cannot — relationships, contextual judgment, pattern recognition from experience. Use this to structure their AI assistance: AI handles the mechanics, humans handle the judgment calls.
  • Set a “problem definition review” before every sprint. Spend 15 minutes at the start of each sprint asking: are we solving the right problem? AI makes execution so fast that teams skip problem definition. Don’t let the tool’s speed become your team’s blind spot.
  • Track “decision quality” as a leading indicator. Measure how often your team changes direction after shipping — a proxy for whether the initial problem definition was correct. If the rate is climbing while productivity is up, you have a Drucker paradox on your hands.
  • Create space for the “slow thinking” that AI cannot do. Block dedicated time in your team’s calendar — not for AI-assisted work, but for hard thinking without a prompt interface open. Strategic insight requires uninterrupted reflection. Protect it.
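
The decision-quality bullet above can be tracked with very little tooling. Here is a minimal Python sketch, assuming a hypothetical sprint log in which each record holds the number of items shipped and the number later reworked or rolled back. The record fields, the sample numbers, and the first-versus-last comparison are all illustrative assumptions, not a standard metric.

```python
from dataclasses import dataclass


@dataclass
class SprintRecord:
    """One sprint's totals. Field names are hypothetical, for illustration."""
    name: str
    shipped: int     # items shipped during the sprint
    reversals: int   # shipped items later reworked or rolled back


def reversal_rate(record: SprintRecord) -> float:
    """Share of shipped items that changed direction after shipping."""
    if record.shipped == 0:
        return 0.0
    return record.reversals / record.shipped


def paradox_warning(history: list[SprintRecord]) -> bool:
    """True when both output and reversal rate rose from first to last sprint."""
    if len(history) < 2:
        return False
    first, last = history[0], history[-1]
    more_output = last.shipped > first.shipped
    more_reversals = reversal_rate(last) > reversal_rate(first)
    return more_output and more_reversals


# Made-up sample data: output doubles while reversals grow even faster.
history = [
    SprintRecord("sprint-1", shipped=8, reversals=1),
    SprintRecord("sprint-2", shipped=12, reversals=3),
    SprintRecord("sprint-3", shipped=16, reversals=6),
]
print(paradox_warning(history))
```

Comparing only the first and last sprint is deliberately crude. A real team would probably want a rolling window, but even this version makes the question "are we shipping more and reversing more?" answerable in one line.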

The organizational pattern that creates this paradox — shipping more without improving decisions — is the same thing driving the feature factory problem: speed without selection pressure.

Implementing AI tools across a product team and finding that productivity is up but outcomes aren’t? I consult with organizations on AI-enabled work design, team effectiveness, and building the management disciplines that turn AI capability into actual impact. Let’s talk.

Darwin’s Dangerous Idea and the Feature Factory Problem: What Evolution Teaches AI Product Managers

Most product managers approach AI like intelligent designers. We map the problem space, specify the solution, define the success metrics, and ship. We assume complex, useful behavior can be deliberately engineered from the top down.

Darwin’s core insight — that complex, purposeful-seeming systems can emerge without a central designer making deliberate choices — turns out to be more relevant to AI product development than most product leaders recognize. The products that have surprised their creators with unexpected value often got there through variation and selection, not specification. The ones that failed often did so because they were too precisely designed.

The Feature Factory Is an Intelligent Design Problem

The feature factory pattern — shipping features faster than users can absorb them, treating the roadmap as a delivery queue rather than a discovery process — is fundamentally an intelligent design failure. Teams act as if they can predict exactly what users need, specify it precisely, build it faithfully, and ship it to universal acclaim. The unpredictability of real users keeps surprising them.

Darwinian product development looks different: create variation, apply selection pressure, amplify what survives. Ship something with enough structure to be useful but enough flexibility to adapt. Watch what happens. Quickly kill what fails, and invest in what unexpectedly thrives. Repeat.

This isn’t a new idea — Eric Ries built Lean Startup around it. But AI makes it both more powerful and more necessary. More powerful because AI systems can generate variation at a scale humans can’t. More necessary because AI capabilities evolve faster than we can predict their applications, which means top-down specification misses most of the opportunity.

What Happened When I Let My AI System Evolve

I started building my multi-agent AI workflow the way most PMs approach feature development: specify what each agent should do, build it, deploy it. Email triage agent: sorts and prioritizes. Meeting summary agent: extracts decisions and action items. Research synthesis agent: connects relevant findings to current questions. Clean scoping, clear outputs.

That worked. But the interesting things happened when I stopped controlling the outputs so tightly.

The research synthesis agent started connecting ideas across domains I hadn’t linked — product insights from one industry informing decisions in another. The meeting summary agent began surfacing action items I hadn’t explicitly identified. The email triage agent started flagging opportunities I would have missed in a busy inbox.

None of that was specified. It emerged from iteration, from giving the agents broader parameters and selecting for what actually proved useful. The most valuable behaviors came from what looked like “mistakes” in the initial specification — outputs that weren’t what I asked for, but turned out to be better than what I asked for.

The Selection Pressure Problem in Product Organizations

Here’s where most organizations get stuck: they apply the wrong selection pressures to AI features.

Traditional product metrics — engagement, retention, feature adoption in the first 30 days — optimize for predictable behavior. They reward features that do exactly what users expect. In a stable environment, that’s fine. But AI capabilities evolve faster than user expectations, which means genuinely innovative AI features often fail traditional adoption metrics in their early stages.

I’ve watched product teams kill promising AI features because initial adoption was low, use cases were unclear, or users couldn’t articulate the value. Those are normal early-stage signals for genuinely novel capability — not kill signals. The teams that applied narrow selection pressure too early eliminated features that would have compounded in value as users developed new mental models for how to use them.

The right selection pressures for AI features look different: Are users who discover this feature coming back to it? Are they finding use cases we didn’t anticipate? Does it surface unexpected value even in early, imperfect form? These are survival signals, not lagging adoption metrics.
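
The first of those survival signals, whether users who discover a feature come back to it, can be approximated from a plain usage log. Here is a minimal Python sketch, assuming a hypothetical log of (user_id, week_number) records. Treating "coming back" as use in two or more distinct weeks is an assumption made for illustration, not an established definition.

```python
from collections import defaultdict


def repeat_use_rate(events: list[tuple[str, int]]) -> float:
    """Share of discovering users who used the feature in 2+ distinct weeks.

    events: hypothetical (user_id, week_number) feature-use records.
    """
    weeks_by_user: dict[str, set[int]] = defaultdict(set)
    for user_id, week in events:
        weeks_by_user[user_id].add(week)
    if not weeks_by_user:
        return 0.0
    returning = sum(1 for weeks in weeks_by_user.values() if len(weeks) >= 2)
    return returning / len(weeks_by_user)


# Made-up sample log: four users discovered the feature.
events = [
    ("a", 1), ("a", 3),   # came back in a later week
    ("b", 1),             # tried it once
    ("c", 2), ("c", 2),   # two uses in the same week count as one visit
    ("d", 1), ("d", 2),   # came back in a later week
]
print(repeat_use_rate(events))
```

The denominator is users who found the feature, not all active users. That keeps a low-adoption feature with loyal early users from looking like a failure.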

Practical Implications for AI Product Teams

Design for variation, not specification. The most useful AI features often emerge from systems that can adapt to individual users, not systems that deliver uniform experiences. Build in the variability deliberately. Let the system learn what works for different users and contexts rather than forcing one behavior on everyone.

Apply selection pressure at the right timescale. AI features that teach users new ways of working need longer evaluation windows than features that automate familiar workflows. Build in explicit “evolution periods” before you make kill-or-invest decisions.

Watch for emergent use cases. Your roadmap won’t predict the most valuable use cases for a genuinely novel AI feature. Set up the observation infrastructure to see what users do with it — not just whether they use it. Teresa Torres’ continuous discovery framework applies here: you need ongoing user contact to see the emergent behaviors, not just launch metrics.

Kill the feature factory framing entirely. If your product org treats the roadmap as a delivery queue, AI won’t change that pattern — it’ll accelerate it. You’ll ship more features faster and learn less from each one. The opportunity solution tree approach matters more, not less, when AI is involved.


Your Turn: Apply This Today

Break the feature factory pattern with these concrete interventions:

  • Audit your last ten shipped features for survival rate. Pull your release history from the past 6 months. For each feature, ask: is it still being used at the rate we hoped? If fewer than 30% are performing to expectation, you’re running a feature factory. Name it.
  • Introduce “feature deprecation” as a quarterly ritual. Every quarter, identify two to three features with low engagement and make a decision: improve them, deprecate them, or document why they’re worth keeping despite low usage. Add this to your roadmap review cadence.
  • Slow down before the next AI feature request. The next time a stakeholder asks for an AI feature, apply one filter before it goes on the roadmap: “What user outcome does this advance, and how will we know if it worked?” If no one can answer it cleanly, it’s not ready.
  • Measure “feature discovery” separately from “feature usage.” A feature that exists but isn’t found is a design problem. A feature that’s found but not used is a value problem. Distinguish them. They have different fixes.
  • Run a “natural selection” exercise on your backlog. Stack-rank your backlog not by business request priority but by the question: if we could only ship the features that directly advance our top user outcome, which ones survive? Cut everything below the line for the next sprint.
  • Establish an “outcome, not output” norm in sprint reviews. Require every feature to be presented alongside its success metric before it enters development — not after. If the team can’t define success in advance, the feature isn’t ready to build.
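
Two of the checks above reduce to simple arithmetic: the survival-rate audit and the split between discovery and usage. Here is a minimal Python sketch, assuming hypothetical per-feature usage numbers. The 30% cutoff comes from the audit bullet. The 25% discovery and usage thresholds are placeholder assumptions you would tune for your own product.

```python
def survival_rate(usage_vs_target: dict[str, tuple[float, float]]) -> float:
    """Fraction of features whose actual usage meets or beats its target.

    usage_vs_target maps feature name -> (actual_usage, target_usage).
    """
    if not usage_vs_target:
        return 0.0
    survivors = sum(
        1 for actual, target in usage_vs_target.values() if actual >= target
    )
    return survivors / len(usage_vs_target)


def is_feature_factory(rate: float, threshold: float = 0.30) -> bool:
    """Apply the 30% rule of thumb from the audit bullet."""
    return rate < threshold


def diagnose(active_users: int, discovered: int, used: int,
             min_discovery: float = 0.25, min_usage: float = 0.25) -> str:
    """Separate a discovery (design) problem from a usage (value) problem.

    Both thresholds are hypothetical defaults, not benchmarks.
    """
    if active_users == 0:
        return "no data"
    if discovered == 0 or discovered / active_users < min_discovery:
        return "design problem"   # the feature exists but is not found
    if used / discovered < min_usage:
        return "value problem"    # the feature is found but not used
    return "healthy"


# Made-up audit: ten shipped features, only two meet their usage target.
audit = {f"feature-{i}": (5.0, 10.0) for i in range(8)}
audit["feature-8"] = (12.0, 10.0)
audit["feature-9"] = (10.0, 10.0)

rate = survival_rate(audit)
print(rate, is_feature_factory(rate))
print(diagnose(active_users=1000, discovered=100, used=80))
print(diagnose(active_users=1000, discovered=600, used=60))
```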

Running an AI product team and finding that standard product frameworks are breaking down? I consult with product organizations on AI product strategy, discovery processes, and building the organizational muscle to learn faster from AI features. Let’s talk.

Munger’s Inversion Principle Applied to AI Feature Development

Charlie Munger spent decades arguing that the most important question in any strategic decision isn’t “how do we succeed?” — it’s “how would this fail?” Inversion, he called it. Turn the problem upside down. Make an explicit list of failure modes. Avoid those things, and success becomes much more likely.

Most product teams building AI features are asking the wrong question. They ask “how do we add AI to increase engagement?” Munger would ask: “How would adding AI destroy user trust, tank our retention, and create a mess we can’t unwind?” The second question is more useful — and almost nobody is asking it systematically.

Why Inversion Matters More for AI Features Than Traditional Features

With traditional features, failure modes are relatively predictable: users don’t understand it, it introduces friction, it doesn’t address the right job-to-be-done. You can prototype your way to answers quickly.

AI feature failures are different in kind. The failure modes are often invisible until you’re already in them. The AI works technically but produces outputs that feel wrong to users. The feature solves the surface request but undermines a process users valued. The personalization creates a filter bubble. The automation saves time but removes something that was actually building skill.

Worse, AI failures compound. A bad AI feature doesn’t just fail — it creates skepticism about every subsequent AI feature you ship. Users who had one bad experience with AI-generated content take longer to trust the next one, even when the next one is significantly better. The trust decay has a multiplier effect that traditional feature failures don’t carry.

The Inversion Checklist for AI Feature Development

I’ve built three sets of inversion questions into our feature development process — one for concept, one for build, one for launch. They’re uncomfortable to answer, which is exactly the point.

Concept-stage inversion questions:

  • How might this AI feature make users less capable over time rather than more?
  • What process are we automating, and is that process actually doing work the user needs to be doing themselves?
  • Which users would this feature harm most, even if it helps the majority?
  • If the AI consistently produces outputs that are 80% right, what’s the cost of the 20% that’s wrong — and can users detect the difference?

Build-stage inversion questions:

  • How might this feature create behavior that looks like success in our metrics but represents user frustration in reality?
  • What would users need to do to recover from a bad AI output? Is that path clear?
  • How might this erode trust in the product if it degrades over time (model drift, data issues, changing content)?

Launch-stage inversion questions:

  • How might we accidentally promise capabilities we can’t deliver?
  • What could make our success metrics mask user frustration?
  • How might this AI feature create skepticism that makes the next AI feature harder to adopt?
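
The concept-stage question about outputs that are 80% right becomes easier to discuss once it is written as an expected-cost calculation. Here is a minimal Python sketch, assuming users catch some share of wrong outputs and that a missed error costs more than a caught one. Every number in the example is a made-up placeholder, and the cost units are whatever your team chooses (minutes, dollars, support tickets).

```python
def expected_error_cost(outputs: int, accuracy: float, detection_rate: float,
                        cost_caught: float, cost_missed: float) -> float:
    """Expected cost of wrong outputs, split by whether users catch them.

    accuracy:       share of outputs that are right (0.80 for "80% right")
    detection_rate: share of wrong outputs users notice (an assumption)
    cost_caught:    cost of a wrong output the user notices and fixes
    cost_missed:    cost of a wrong output that slips through unnoticed
    """
    if not 0.0 <= accuracy <= 1.0 or not 0.0 <= detection_rate <= 1.0:
        raise ValueError("accuracy and detection_rate must be between 0 and 1")
    wrong = outputs * (1.0 - accuracy)
    caught = wrong * detection_rate
    missed = wrong - caught
    return caught * cost_caught + missed * cost_missed


# Hypothetical week: 1,000 outputs, users catch half of the wrong ones,
# and a missed error costs ten times as much as a caught one.
weekly_cost = expected_error_cost(
    outputs=1000, accuracy=0.80, detection_rate=0.50,
    cost_caught=1.0, cost_missed=10.0,
)
print(round(weekly_cost))
```

In this toy example almost all of the cost comes from the errors users fail to detect. That is why the second half of the question, whether users can tell the difference, often matters more than the headline accuracy figure.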

A Real Example: When Inversion Saved Us

We had a proposal to use AI to automatically generate content suggestions for users who hadn’t engaged in a while. The business case was strong: reactivation opportunity, personalization at scale, low marginal cost per user. The team was excited.

The inversion exercise stopped us cold. The question that did it: “How might this AI feature make users feel about the product if the content suggestions feel generic or miss the context of why they actually disengaged?”

We started listing the failure modes. A user who left because they were overwhelmed now receives AI-generated suggestions designed to re-engage them. A user who left for personal reasons feels like the product is nagging them with automated content. A user’s specific context (grief, burnout, a season of life change) gets ignored in favor of algorithmic relevance scoring.

The feature wasn’t wrong. The framing was. We rebuilt it as a human-initiated workflow — making it easy for teams to reach out to churned users with personalized context — with AI supporting the human communication rather than replacing it. The outcomes were better and the failure modes were manageable.

The Compound Interest of Avoided Mistakes

Munger’s deeper point about inversion is that avoiding major mistakes compounds over time more powerfully than making brilliant decisions. Poor Charlie’s Almanack returns to this theme repeatedly: Berkshire’s long-term performance owed as much to the catastrophic deals they never made as to the great deals they did.

In AI product development, the same principle applies. A failed AI feature that destroys user trust doesn’t just cost you the feature — it costs you a percentage of every subsequent AI feature’s adoption. Trust damage compounds down. Trust built through reliable, well-scoped AI features compounds up.
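
A toy model makes the compounding claim concrete. The Python sketch below assumes each trust-destroying failure cuts adoption of every later AI feature by a fixed percentage, and each reliable feature raises it slightly. The base adoption rate, the penalty, and the gain are invented parameters chosen only to show the shape of the effect, not measurements.

```python
def adoption_path(outcomes: list[bool], base_adoption: float = 0.40,
                  penalty: float = 0.20, gain: float = 0.05) -> list[float]:
    """Adoption rate for each feature in a launch sequence.

    outcomes: True for a reliable launch, False for a trust-destroying one.
    A feature's adoption depends on the trust built by earlier launches,
    so each outcome only affects the features that come after it.
    """
    trust = 1.0
    path = []
    for succeeded in outcomes:
        path.append(min(1.0, base_adoption * trust))
        trust *= (1.0 + gain) if succeeded else (1.0 - penalty)
    return path


# Same three-launch roadmap, with and without an early failure.
steady = adoption_path([True, True, True])
early_failure = adoption_path([False, True, True])
print(steady)
print(early_failure)
```

Under these assumptions the early failure still drags on the third launch, even though the second launch went well. That is the "percentage of every subsequent feature" cost described above.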

The teams that will win at AI product development over the next five years aren’t the ones who ship the most AI features. They’re the ones who ship the right AI features and avoid the trust-destroying failures that make everything harder.

If you’re thinking about how AI features shape user behavior and capabilities over time, that question sits inside the bigger frame of what product decisions are actually for, and that frame determines which failure modes are even worth worrying about.


Your Turn: Apply This Today

Inversion is a discipline, not a one-time exercise. Here’s how to make it a habit in your AI product development process:

  • Start your next feature review with the failure question. Before discussing what the feature should do, ask: “What would make this feature actively harmful or useless?” Write down the top three answers before anyone advocates for it.
  • Run a “worst AI experience” brainstorm. Ask your team to describe the worst possible outcome if your AI feature misbehaves at scale. Hallucination? Bias? Confidence without accuracy? Design safeguards around those outcomes before you ship.
  • Audit your AI features for “confident wrongness.” Identify every place your AI presents an output with high confidence. Ask: what’s the failure mode if it’s wrong here? Add uncertainty signals or human checkpoints wherever the cost of confident wrongness is high.
  • Invert your user adoption assumptions. Instead of asking “why would users adopt this AI feature?”, ask “why would users refuse to use this AI feature?” Distrust, fear of being replaced, privacy concerns, and unreliable outputs are real blockers. Address them proactively.
  • Apply inversion to your OKRs. For each AI-related goal this quarter, write the inverted version: “We would fail this goal if ______.” Use the failure conditions to build leading indicators that catch problems before they become misses.
  • Build a “pre-mortem” into your AI feature launch checklist. Two weeks before every significant AI feature ships, hold a 30-minute pre-mortem. Assume it’s three months post-launch and the feature is being rolled back. Why? The answers become your launch criteria.

The inversion principle also applies to AI product scope: ask “how would exposing unlimited AI capability destroy user trust?” and you’ll arrive quickly at the choice paradox problem — which is its own failure mode worth mapping explicitly.

Building AI features and want a structured process for evaluating failure modes before you ship? I consult with product teams on AI product strategy, feature evaluation frameworks, and building durable user trust. Let’s talk.

Why Product Decisions Are Never Just Product Decisions

A few months ago, I sat in a product meeting where the team was debating whether to auto-generate personalized devotionals using AI. The product argument was compelling: large user base, clear demand signal, AI capability that could theoretically produce unlimited variations. Fast follow-up, easy A/B test, obvious engagement metric to optimize.

Then someone asked: “Should we?”

That question stopped the room. Not because it was unusual for a product team to ask whether they should build something — that’s good practice in any domain. But because everyone in the room understood that this decision was carrying weight that a standard feature evaluation framework wasn’t designed to handle.

Product decisions always carry more weight than our frameworks acknowledge. We just don’t usually name it.

Why Product Decisions Are Never Just Product Decisions

The standard product evaluation framework — desirability, feasibility, viability — asks whether users want it, whether we can build it, and whether it’s sustainable as a business. Those are necessary questions. They’re not sufficient ones.

Every product decision reflects a set of beliefs about what matters, what humans need, and what technology should and shouldn’t do. Most of the time those beliefs are implicit — held but unexamined. In consumer tech, that implicit belief system usually defaults to “engagement is good, more is better, friction is the enemy.”

That works fine for some products. It’s actively harmful for others.

When you’re building products at the intersection of technology and things people care deeply about — health, relationships, learning, meaning, faith — the implicit framework fails. The question isn’t whether your belief system shapes your product decisions. It does, whether you name it or not. The question is whether you’re examining it deliberately enough to build well.

The Three Product Tensions That Require More Than Standard Frameworks

Engagement vs. Depth: Engagement metrics optimize for return frequency and session length. But for products where the goal is genuine change — learning something, growing in a practice, developing a skill — those metrics can be proxies for the wrong thing. A user who comes back every day for two-minute interactions might be getting less value than one who comes weekly for a focused 30-minute session. If your success metrics are optimized for the former, you’ll build toward it even when the latter is actually better for users.

This is the same tension the best education product designers grapple with: it’s easier to measure engagement than learning, so teams end up optimizing for engagement and calling it a proxy for learning. Sometimes it is. Often it isn’t.

Automation vs. Agency: AI makes it technically possible to automate almost any information-delivery task. The product question is when automation serves users and when it substitutes for something they’d be better off doing themselves. A navigation app that gives you turn-by-turn directions is useful. One that removes your ability to navigate independently over time might not be, even if users prefer it in the short term.

The best product teams I’ve worked with ask not just “can AI do this?” but “what does the user lose if AI does this for them, and is that a trade-off worth making?” The answer isn’t always no — sometimes automation genuinely removes unnecessary friction. But the question should be asked explicitly.

Community vs. Isolation: Features that substitute for human connection — even when users prefer them — can create long-term harm that’s hard to trace back to any single product decision. Social platforms have learned this expensively. The insight was available earlier to anyone willing to ask whether the feature was facilitating human connection or replacing it.

A Framework for the Decisions Standard Frameworks Don’t Reach

When I’m evaluating product decisions in domains where the stakes are higher than a standard desirability/feasibility/viability analysis captures, I add three questions:

Does this build or diminish user capability over time? Not just “does the user like it?” but “is the user better off — more capable, more knowledgeable, more connected — as a result of using this feature?” This is the difference between tools that augment and features that create dependency.

What is this feature implicitly telling users about what matters? Products shape behavior, and behavior shapes belief. A news app that surfaces outrage-inducing content because it drives engagement isn’t neutral about what news is for. A social platform that optimizes follower count shapes users’ implicit model of what community means. Name the implicit message your feature is sending.

Who benefits when the user keeps using this, and does that align with what’s actually good for the user? Business model alignment matters here. Products funded by engagement advertising have structural incentives to maximize usage. Products funded by outcomes their users care about have structural incentives to deliver those outcomes. The incentive structure doesn’t determine the product, but it shapes the gravity you’re working against or with.

What This Looks Like in Practice

Back to the AI-generated devotionals debate. We ended up not building the feature — at least not the way originally proposed. The concern wasn’t technical. It was that the version we could build would likely optimize for content volume and engagement breadth rather than depth and reflection. We couldn’t measure “did this change how someone thinks about their faith?” but we could measure open rates and return visits. And we knew which one we’d end up optimizing for.

Instead we built structured study guides — AI-assisted, but designed around questions that required the user to do intellectual work rather than consume content passively. Harder to build, harder to measure, probably better for users.

This kind of decision doesn’t require faith to make. It requires a belief system: a set of convictions, going beyond usage metrics, about what the product is actually for and what good outcomes look like. Every team has one, implicitly. The teams that build things worth building usually have one explicitly.

This also reshapes how you use discovery frameworks. When you’re clear about what the product is for, opportunity solution trees become more useful — because you’re not just asking “what’s the most elegant solution?” but “what’s the most aligned solution we can actually build?”

If you’re building in a domain where this kind of product ethics question surfaces — health, education, relationships, meaning — it connects directly to the paywall question: what you choose to make free reflects your beliefs about what access to your product should mean. That framework applies here too.


Your Turn: Apply This Today

Product ethics isn’t a separate track — it’s built into how you make decisions. Here’s how to start integrating it:

  • Name the second-order effects of your next shipped feature. Before your next launch, run a 20-minute “impact mapping” session. Ask: who benefits from this feature? Who might be harmed, even unintentionally? What behaviors might this incentivize at scale?
  • Add “who bears the cost?” to your prioritization framework. For every item on your roadmap, ask explicitly: if this creates value for some users, who absorbs the trade-off? If the answer is always the same group, you have an equity problem in your product design.
  • Review your engagement metrics for manipulative patterns. Look at your most-used engagement features. Ask honestly: does this feature make users’ lives better, or does it just make them more active? There’s a difference. Know which one you’re optimizing.
  • Build a “values document” for your team. Define three to five explicit values that guide product decisions when the data is ambiguous. Reference them in sprint reviews. They make it harder to rationalize decisions that feel productive but cause harm.
  • Bring one ethics question to your next leadership review. Name a decision your team made in the last quarter that had an ethical dimension — and share how you reasoned through it. Normalizing the conversation is how you build organizational muscle.
  • Designate a “devil’s advocate” role in major feature reviews. Before shipping anything significant, assign one person to argue the case for the user, community, or market segment that could be negatively affected. Make it a standing agenda item.

Building products in high-stakes domains where standard frameworks don’t fully cover the decision space? I consult with organizations navigating product ethics, mission-aligned product strategy, and the gap between engagement metrics and actual user value. Let’s talk.

AI as Coworker: What Ethan Mollick Gets Right, and What I’ve Learned Running It at Scale

Ethan Mollick’s vision of AI as genuine coworker — not just a productivity tool, but an active collaborator that maintains context, remembers your preferences, and proactively contributes — is compelling and mostly right. I’ve been living it. I run a multi-agent AI system that handles significant portions of my executive workflow: prep briefs, research synthesis, pattern analysis across user data, first drafts of strategy documents. These agents know my preferences. They’ve learned my decision-making patterns. At the task level, the coworker framing is accurate.

But Mollick’s framing also glosses over the messiest part of AI collaboration at scale: the fact that your AI coworkers are only as good as the assumptions baked into how you built them, and auditing those assumptions is ongoing work that doesn’t get easier as you add more agents.

What the AI-as-Coworker Reality Looks Like at Scale

The parts of Mollick’s thesis that hold up: context persistence is genuinely transformative. When an AI agent knows that I prefer morning strategy calls, that I need prep briefs 24 hours before board meetings, and how I want different document types formatted, it stops being a tool and starts being something closer to a capable assistant who has done the job before. The cognitive overhead reduction is real.

What his framing underweights: AI coworkers amplify the perspective of whoever designed them. My localization agent is excellent at translation mechanics and surface-level cultural adaptation — but it consistently over-indexes on the cultural frameworks most represented in its training data. It knows language but doesn’t always know context. The bias isn’t obvious; it shows up in subtle calibration errors that only become visible when you’re specifically looking for them, which most teams aren’t.

This isn’t a knock on the AI. It’s a structural reality: any agent is trained on a corpus that reflects some perspectives more than others. At scale, those systematic biases matter.

The Four Things Mollick’s AI Coworker Vision Gets Right

Context persistence is the unlock. The shift from “start every session from scratch” to “this agent knows my context” is more significant than any individual capability improvement. Persistent context is what makes AI feel like a coworker rather than a tool.

Proactive synthesis is genuinely useful. The best AI coworker behavior I’ve experienced isn’t answering questions — it’s surfacing patterns before I know to ask about them. An agent that watches your metrics and flags anomalies when you’re focused elsewhere is doing coworker-level work.

Specialization beats generalization. A general-purpose AI assistant is less useful than five purpose-built agents, each with specific context and constraints for its domain. Mollick’s research on AI collaboration points toward this, and it matches my operational experience.

The oversight burden is real and non-negotiable. Mollick is clear on this: AI coworkers require human judgment about when to trust and when to verify. You can’t abdicate that responsibility to the agent itself. This is right, and teams that try to fully automate judgment get burned.

The Three Things His Framing Misses

Cultural blind spots compound. An AI coworker trained primarily on majority-culture data will systematically underserve minority contexts. At the product level, this means recommendations that are technically sound but contextually wrong for segments of your user base. You need explicit cultural review processes, not the assumption that the AI will figure it out.

AI context gets stale like technical debt. Just like code, what your agents know needs maintenance. An agent that learned your priorities six months ago may be operating on outdated mental models. I schedule regular “context audits” — reviewing what each agent remembers, what it should forget, and what new patterns it needs to understand. This isn’t automated; it requires human judgment about what constitutes useful institutional memory versus obsolete assumptions.

Confidence calibration is a product problem, not just an individual one. The most dangerous AI coworker behavior isn’t being wrong — it’s being wrong confidently. I’ve built in explicit disagreement triggers: agents that surface alternative hypotheses when confidence intervals are wide, rather than defaulting to the most plausible-sounding answer. Training yourself and your team to expect this matters too.
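A disagreement trigger can be sketched in a few lines. This is a minimal illustration, not a description of any particular agent framework: the `Hypothesis` type, the `review_answer` function, and the 0.2 margin are all assumptions, and the gap between the top two self-reported confidences stands in for a proper confidence interval.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    label: str
    confidence: float  # the agent's own estimate, 0.0 to 1.0 (assumed scale)


def review_answer(hypotheses, min_margin=0.2):
    """Return the top answer, plus any alternatives when its lead is narrow."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    top = ranked[0]
    # Runner-ups within min_margin of the top answer count as live alternatives.
    alternatives = [
        h for h in ranked[1:] if top.confidence - h.confidence < min_margin
    ]
    return {
        "answer": top.label,
        "needs_human_review": bool(alternatives),
        "alternatives": [h.label for h in alternatives],
    }


result = review_answer([
    Hypothesis("churn is driven by pricing", 0.45),
    Hypothesis("churn is driven by onboarding friction", 0.40),
    Hypothesis("churn is seasonal", 0.10),
])
print(result)
```

The point of the design is that a narrow lead changes the shape of the output: the agent hands back the competing explanation and a review flag instead of a single confident answer.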

What I’d Add to Mollick’s Framework

The coworker metaphor is useful but should be extended: the best AI coworker is one you’ve onboarded intentionally, given a specific domain, provided with representative context for your user base, and equipped with explicit escalation paths. The same care you’d give a strong new hire (clear context, defined scope, regular calibration) applies to your AI agents.

For hybrid decisions (anything affecting significant segments of your user base differently), I’ve built explicit frameworks that require both AI analysis and human review before action. The AI coworker runs the analysis; a human makes the call. That’s not distrust — it’s appropriate division of labor based on where each is actually good.

If you’re thinking about how AI changes the product manager’s job, this connects directly to the hiring question — because the PM skills that matter most in an AI-native team are exactly the ones that help humans and AI agents collaborate well rather than substituting one for the other.


Your Turn: Apply This Today

If you’re managing a team where AI is either underused or feared, here’s how to move the conversation forward:

  • Pick one high-friction workflow and run an AI sprint on it. Identify the task your team does every week that everyone quietly dreads — the status update, the competitive analysis, the draft brief. Spend one sprint running it with AI assistance and measure the time difference.
  • Set an “AI working agreement” with your team. Explicitly discuss: what outputs require human review before they go to stakeholders? What tasks can be AI-first? What should never be delegated to AI? Make it a team norm, not individual discretion.
  • Train on prompting, not just tools. The productivity gap between teams isn’t the tool — it’s prompting quality. Run a 30-minute session where team members share their most useful prompts. Document the best ones in a shared library.
  • Measure quality, not just speed. Track whether AI-assisted outputs require more or fewer rounds of revision than non-AI outputs. Speed gains that come with quality losses are not wins. Establish the baseline before you declare victory.
  • Interview your team about their actual AI use — not their reported use. Ask privately: “When did AI produce something you used directly? When did it produce something that misled you?” The honest answers will reshape your AI enablement strategy.
  • Protect the judgment-intensive work from AI defaulting. Identify the decisions in your product process that require nuanced human judgment — prioritization trade-offs, difficult stakeholder calls, ethical edge cases. Explicitly protect those from being AI-first. Delegation without boundaries creates accountability gaps.

For the strategic infrastructure question that underlies all of this, Jensen Huang’s sovereign AI argument is worth reading alongside this piece, because how you build your AI stack determines what your AI coworkers can actually do.

Building AI-native workflows into your product team and running into the messy gaps between the vision and the reality? I consult with product leaders on AI system design, agent architecture, and the organizational changes that make AI collaboration actually work. Let’s talk.

How to Hire Product Managers for AI-Era Roles (Most Teams Are Testing the Wrong Things)

Most product management interviews are testing for skills that were valuable in 2019. We’ve updated our feature frameworks and adopted AI tools, but the way we evaluate and hire PMs hasn’t kept pace with what the role actually requires now. If you’re still leading primarily with “tell me about a time you used data to make a product decision,” you’re filtering for the wrong things.

Julie Zhuo has been pushing on this question, and her writing on what it takes to be an effective PM in an AI-era organization is worth engaging with seriously. My experience hiring PMs and building AI-native product workflows has led me to similar conclusions, though the specifics look different in practice.

Hiring for Tomorrow: The Mismatch Between What We Test and What We Need

Traditional PM interview loops test a reasonably consistent set of things: product sense through case studies, data reasoning through SQL or metrics questions, cross-functional influence through behavioral scenarios, and execution through “how would you prioritize this backlog” exercises. These aren’t bad signals. They’re just increasingly incomplete.

Here’s what I’ve started noticing in the roles where PMs succeed or struggle: the differentiating variable isn’t usually product sense or data fluency. It’s how well they navigate working with AI systems — not just using them as productivity tools, but understanding how AI-generated analysis should influence (and when it shouldn’t influence) decisions.

The PM who lets AI write their spec without interrogating its assumptions is dangerous at scale. So is the PM who reflexively distrusts AI outputs and manually redoes work that the tool handled well. The skill you actually need is calibrated judgment about when to trust and when to verify — and that’s almost never what we’re testing in interviews.

Three Skills That Matter More Than They Did Three Years Ago

AI Output Interrogation: Can this person look at AI-generated research, a summary, or a recommendation and identify where the model’s assumptions might diverge from your specific context? This isn’t about being a skeptic — it’s about understanding that LLMs optimize for plausibility, not accuracy, and that domain-specific nuance gets flattened. The best PMs I’ve worked with treat AI output the way a good editor treats a first draft: useful starting point, requires critical evaluation before it becomes a decision input.

Cognitive Flexibility Under Obsolescence: The half-life of specific product knowledge is shortening. A PM who was an expert in iOS growth mechanics in 2021 may have to significantly update that expertise by 2024. The question isn’t what someone knows — it’s how quickly they can update mental models when prior knowledge becomes wrong. I’ve started asking interview questions that specifically probe this: “Tell me about a time when you were confident in your understanding of something and then discovered you were significantly wrong. How did you handle that?” The candidates who light up on that question are the ones who will stay valuable as the landscape shifts.

Systems Thinking Across AI Dependencies: When AI handles a piece of the workflow, can this PM reason about downstream effects? If the recommendation engine surfaces different content because the underlying model was updated, can they trace how that affects engagement, retention, and revenue — and know whether to flag it as a problem or let it run? This kind of reasoning about interconnected systems is hard to teach and easy to screen out with narrow case studies.

What We’ve Changed in Our Hiring Process

We added an AI collaboration session to our interview loop — not to test technical prompting, but to watch how candidates navigate working with AI on a product problem. We give them access to a real AI tool and a realistic product challenge, then observe: Do they accept the first output? Do they probe it? Do they know when to override it? The output of the exercise matters less than the process we observe.

We’ve also deliberately hired from adjacent roles — UX research, growth marketing, technical writing — more than we used to. In some cases, people from these backgrounds had developed more sophisticated AI collaboration skills than traditional PMs who’d learned to use AI as a productivity hack rather than a reasoning partner.

Internally, we’ve built explicit AI fluency development into career progression. Expecting PMs to figure out AI collaboration on their own isn’t a strategy — it means the people who were already confident with AI get more capable while those who weren’t fall further behind. That creates brittleness you don’t want on a product team.

The Broader Implication

If hiring frameworks need updating, so do performance management systems and career ladders. The PM skills that earn promotions today should reflect the skills that create value now — which looks different than it did even three years ago. Most PM career frameworks I’ve seen still heavily weight traditional execution skills and underweight the judgment, systems thinking, and adaptive capability that matter most in AI-native product organizations.

This connects to broader questions about what deep customer knowledge actually requires in an era when AI can synthesize customer feedback at scale — knowing what AI is good at telling you versus what requires direct human observation is a fundamental PM skill now.

The teams that update their hiring criteria now will have better-calibrated product organizations in 18 months. The ones that keep running the same interview loop will wonder why their AI investments aren’t translating into better products.


Your Turn: Apply This Today

Whether you’re hiring now or building a hiring rubric for the future, use these to upgrade your process:

  • Redesign your take-home exercise around AI tools. Give candidates 48 hours to analyze a product problem — and explicitly tell them they may use any AI tools they want. Evaluate how they use AI, not just what they produce. Judgment about when to trust the output matters more than the output itself.
  • Add one AI judgment question to every interview. Ask: “Tell me about a time AI gave you a confident-sounding answer that turned out to be wrong. How did you catch it, and what did you do?” Strong candidates have a story. Weak candidates say it hasn’t happened yet.
  • Audit your current job description. Count how many skills on it are tasks AI can now do 80% as well in 10 minutes. If the list is long, you’re hiring for yesterday’s PM. Rewrite the description around judgment, communication, and synthesis.
  • Score candidates on “AI-assisted output quality.” Ask them to show you an analysis or document they created using AI tools. Evaluate the quality of their prompting, the critical review of the output, and the judgment calls they made to improve it.
  • Evaluate cross-functional communication explicitly. The most leveraged AI-era PMs translate between technical AI teams and non-technical stakeholders with precision. Test this: ask the candidate to explain a complex AI concept to a fictional non-technical exec. See if they can do it in 90 seconds without jargon.
  • Check their learning velocity, not just their current knowledge. AI capabilities change every 6 months. Ask candidates: “How do you stay current on AI developments relevant to product management?” If they don’t have a system, they won’t keep up.

Building out a product team and rethinking how to hire for AI-era PM skills? I consult with organizations on product leadership, team structure, and building AI-native product organizations. Let’s talk.

Jensen Huang’s Sovereign AI Argument Has a Product Leadership Translation

Jensen Huang’s “sovereign AI” argument is worth taking seriously beyond the geopolitical headline. His thesis — that every nation needs AI capability built on its own language, culture, and data, or it risks becoming dependent on systems that don’t reflect its values — has a direct parallel for product leaders. If you’re not thinking carefully about AI sovereignty at the product level, you’re making strategic decisions by default that you should be making deliberately.

I’ve been running a 20-agent AI system and managing a digital platform serving tens of millions of users across six continents. The sovereignty problem Huang describes for nations shows up constantly at product scale. Your AI stack’s assumptions shape your product’s outputs in ways most teams don’t audit until something breaks.

What “Sovereign AI” Actually Means for Product Leaders

Huang’s argument, made at multiple NVIDIA AI Summits, is that AI models trained primarily on English-language, Western data will systematically underserve populations whose language, culture, and context aren’t well-represented in the training corpus. Countries that rely entirely on US-built foundation models aren’t just outsourcing compute; they’re outsourcing the embedded assumptions that shape what the AI considers correct, helpful, or normal.

Scale this down to product level. If you’re building for a specific domain — healthcare, legal, ministry, education, financial services — and you’re relying entirely on general-purpose models, you’ve inherited all their assumptions about your domain. They may be fine. Or they may be systematically off in ways that compound over time.

The product that serves a global audience and runs entirely on a model calibrated for US contexts is making a sovereignty decision — just not consciously.

Three Levels of AI Sovereignty Every Product Team Should Evaluate

Level 1 — Infrastructure Sovereignty: Can you switch AI providers without rebuilding your product? Most teams are deeper in vendor lock-in than they realize. We built abstraction layers early — not because we expected to switch providers, but because provider pricing, capability, and availability change fast enough that flexibility has real value. Audit every model you call, every API you depend on, and every assumption you’ve made about continued access. The teams that did this in 2023 were better positioned when pricing structures changed in 2024.

Level 2 — Data Sovereignty: Is your AI improving from your users’ behavior, or just from the provider’s general corpus? There’s a meaningful difference between a model that knows your domain and a model that knows everything. We invested in data pipelines to support fine-tuning on domain-specific content — not because we’re training foundation models, but because domain-fine-tuned models consistently outperform general models on specialized tasks, and the infrastructure pays dividends across every AI use case we add.

Level 3 — Cultural Sovereignty: Does your AI’s output reflect the actual diversity of your user base? This is the hardest level and where most teams underinvest. A recommendation engine trained on aggregate user behavior will over-serve majority patterns and underserve minority ones. A content generation system trained on majority-culture examples will produce outputs that subtly don’t fit for users who aren’t in that majority. You need representative data and the conviction that your users’ diversity is worth the investment to serve well.
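To make the Level 1 abstraction layer concrete, here is a minimal sketch of the idea, assuming each provider can be wrapped as a plain function from prompt to text. `ModelRouter`, `primary`, and `fallback` are hypothetical names for illustration; real providers would wrap whatever vendor SDK calls you actually use.

```python
from typing import Callable, Dict, List

# A provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


class ModelRouter:
    """Route calls through named providers, falling back in order on failure."""

    def __init__(self, providers: Dict[str, Provider], order: List[str]):
        self.providers = providers
        self.order = order

    def complete(self, prompt: str) -> str:
        errors = []
        for name in self.order:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # a real system would catch narrower errors
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only.
def primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"


router = ModelRouter(
    {"primary": primary, "fallback": fallback},
    order=["primary", "fallback"],
)
print(router.complete("Summarize this week's feedback"))
```

Because product code only ever calls `router.complete`, switching or reordering providers becomes a configuration change rather than a rebuild. It also gives you one place to measure how much worse the fallback performs.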

The Build vs. Buy Decision Has a New Variable: Control

Traditional build vs. buy analysis weighs cost, quality, and timeline. For AI, there’s a fourth variable that changes the calculus: control over outputs and the ability to course-correct.

When I evaluated building our own recommendation engine versus using existing models, the cost comparison was straightforward — custom build would have cost significantly more upfront. But the cost analysis didn’t capture what happens when the general model starts producing outputs that are technically correct but contextually wrong for your users. The cost of that misalignment — user trust erosion, editorial intervention, support load — is real and hard to quantify until you’re living it.

Sovereign capability means you can fix your own systems. Dependency means you wait for your vendor’s roadmap. Both are valid trade-offs at different scales, but the choice should be explicit, not an accident of convenience.

What I’m Actually Doing Because of This Thinking

First, I audited our AI stack for dependency risk — every model we use, every API we call, every assumption about continued access. The exercise revealed more exposure than I’d estimated, particularly for edge cases where fallback models perform significantly worse than the primary.

Second, I’ve been investing in data infrastructure even before we need custom models. The pipeline work — better behavioral data collection, content taxonomy, user segmentation — pays dividends for every AI use case, whether we ever train custom models or not.

Third, I’ve shifted how we hire for AI product roles. I care less about people who can write clever prompts and more about people who understand how model assumptions propagate into product outputs. That’s a harder skill to assess in interviews, but it’s the one that actually matters at scale.

The teams building durable AI products will be the ones who thought about these questions before they became urgent. If you’re also thinking through the emerging faith tech category or how AI reshapes specialized product domains, the sovereignty question is front and center in all of it.


Your Turn: Apply This Today

The sovereign AI argument has direct implications for how you build and position your product. Here’s how to translate it:

  • Audit your AI dependencies. List every AI capability your product relies on — models, APIs, infrastructure. For each one, ask: if this provider doubled prices or shut off access tomorrow, what breaks? That’s your sovereignty risk profile.
  • Identify your organization’s proprietary data advantage. What data does your organization hold that a generic AI model cannot access? Structured, high-quality proprietary data is the foundation of a defensible AI product. Start building the strategy to use it.
  • Translate the infrastructure argument for your stakeholders. The next time you’re asked to justify AI investment, frame it not as a feature decision but as a strategic infrastructure decision. Infrastructure arguments win different budget conversations than feature arguments.
  • Build for control, not just capability. When evaluating AI integrations, score them on how much control your organization retains — over the model, the data, the outputs. Capability without control creates dependency.
  • Map the “AI talent concentration risk” in your team. If one or two people leave, do you lose the institutional knowledge of how your AI systems work? Start documenting AI system design decisions the same way you document architecture decisions.
  • Set a “build vs. buy vs. integrate” framework for every AI decision. Don’t default to buying API access. Evaluate each AI capability against your long-term control requirements and your organization’s strategic differentiation needs.
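The dependency audit in the first item can start as a very small inventory. The sketch below is one possible shape, not a prescribed format: the `AIDependency` fields, the vendor names, and the rule "no fallback means exposed" are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AIDependency:
    capability: str          # what the product uses it for
    provider: str            # who supplies it (placeholder names below)
    fallback: Optional[str]  # alternative if the provider goes away, if any
    user_facing: bool        # does an outage reach users directly?


def sovereignty_risks(deps: List[AIDependency]) -> List[AIDependency]:
    """Return dependencies with no fallback, user-facing ones first."""
    exposed = [d for d in deps if d.fallback is None]
    # False sorts before True, so user-facing entries come first.
    return sorted(exposed, key=lambda d: not d.user_facing)


stack = [
    AIDependency("internal research summaries", "vendor-a", None, False),
    AIDependency("translation drafts", "vendor-b", "vendor-c", True),
    AIDependency("content recommendations", "vendor-a", None, True),
]

for dep in sovereignty_risks(stack):
    print(f"{dep.capability} ({dep.provider}): no fallback")
```

Even a list this simple answers the "what breaks tomorrow?" question per capability instead of per vendor contract, which is usually where the surprises are.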

If you’re thinking about how your AI stack shapes your team’s workflows, the Mollick AI-as-coworker analysis covers the collaboration and oversight side of this same question.

Building AI-powered products and wrestling with vendor dependency, data strategy, or domain alignment? I consult with product leaders on AI product strategy and the trade-offs that don’t show up in vendor demos. Let’s talk.

Teresa Torres’ Opportunity Solution Trees Are Missing the Most Important Branch

Teresa Torres has been making the case that product teams should map their Opportunity Solution Trees before touching a backlog. Her framework connects desired customer outcomes to specific opportunities, then branches into potential solutions — all documented visually before any feature decision gets made. The idea is simple: stop jumping to solutions before you’ve fully explored the problem space.

I’m a convert. I’ve watched teams build elaborate features for problems that weren’t actually blocking users from their desired outcomes. The OST discipline of separating “what are users trying to do?” from “what should we build?” saves months of wasted effort. If you’re not using opportunity solution trees in your discovery process, you should be.

But after running a 20-agent AI system and managing product decisions across a platform with tens of millions of users, I think Torres’ framework is missing a critical branch: execution complexity.

The Problem with Pure Opportunity Mapping

Torres’ methodology assumes that once you’ve identified the right opportunity and mapped potential solutions, implementation is relatively straightforward — map the problem space, explore solutions, pick the best one, ship it.

That works beautifully for traditional product features. It breaks down when you’re building AI-powered products where solution complexity can vary by orders of magnitude for the same opportunity.

Here’s a real example. Users were struggling to find relevant content quickly — a clear, validated opportunity. Torres’ framework surfaced three potential solutions: improved search filters, AI-powered recommendations, or a full personalization engine. All three theoretically addressed the same opportunity. But the execution complexity was radically different:

  • Search filters: 2-week sprint with existing infrastructure
  • AI recommendations: 3-month project requiring ML infrastructure we didn’t have
  • Personalization engine: 12-18 months, new data team, significant architectural changes

The OST pointed toward the personalization engine as the most elegant solution. The execution reality made it the wrong choice — at least for that quarter.

The Execution Complexity Branch That’s Missing from Opportunity Solution Trees

Every opportunity solution tree needs a fourth assessment alongside feasibility, viability, and usability: execution complexity. Not just “can we build this?” but a structured evaluation of three dimensions:

Technical Complexity: What infrastructure, integrations, or architectural changes does this require? Does it introduce new dependencies that create fragility? AI solutions in particular can look simple on the surface but require entire data pipelines you haven’t built yet.

Organizational Capacity: Which specific people would work on this, and what are they not working on instead? “We have engineers” isn’t capacity analysis. You need to know if your ML team is already maxed out before you commit to an AI-heavy solution path.

Market Timing: How does this solution’s build timeline map to competitive pressure and user expectation shifts? The best solution shipped six months late is often worse than a good-enough solution shipped next sprint.

How This Changed a Real Decision

Back to the content discovery problem. The pure opportunity map pointed toward an AI recommendation engine. When I added the execution complexity branch, the picture shifted completely:

The recommendation engine required user behavior tracking we hadn’t built, content taxonomy infrastructure we didn’t have, and ML capacity that was already committed elsewhere for six months. Meanwhile, our data was showing users weren’t churning because of discovery problems — they were churning because of onboarding friction. That was a completely different branch of the opportunity tree we’d underweighted.

We shipped improved search filters in two weeks. Immediate impact on discovery success rates. ML capacity freed up for onboarding optimization. Six months later, we had the infrastructure foundation to revisit the recommendation engine with real behavioral data instead of assumptions.

Pure opportunity mapping would have sent us toward the most sophisticated solution. Adding execution complexity sent us toward the most effective one.

What I’ve Changed in Our Process

Every opportunity solution tree we build now includes an implementation branch with three required assessments before any solution advances to prioritization:

  • Build vs. integrate: Can we solve this with existing tools and APIs, or does it require custom development?
  • Team capacity reality: Named individuals, not abstract “team bandwidth”
  • Dependency mapping: What other systems, teams, or external factors does this solution require before we can ship?

We also changed how we score solutions in prioritization. Instead of just evaluating opportunity impact, we multiply by an execution confidence score — a team-assessed number from 1-5 representing how clear and achievable the implementation path is right now, with current resources. High-confidence solutions get a boost. Novel solutions requiring infrastructure we don’t have get penalized, even if the opportunity mapping scores them well.
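As a rough sketch of that scoring rule: priority is opportunity impact multiplied by execution confidence. The 1-5 confidence scale comes from the description above. The 1-10 impact scale and every number in the example are invented for illustration, not real scores from the content discovery decision.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    opportunity_impact: float  # hypothetical 1-10 scale
    execution_confidence: int  # team-assessed, 1-5

    def priority(self) -> float:
        if not 1 <= self.execution_confidence <= 5:
            raise ValueError("execution_confidence must be between 1 and 5")
        return self.opportunity_impact * self.execution_confidence


# Illustrative numbers only.
solutions = [
    Solution("Search filters", opportunity_impact=5, execution_confidence=5),
    Solution("AI recommendations", opportunity_impact=7, execution_confidence=2),
    Solution("Personalization engine", opportunity_impact=9, execution_confidence=1),
]

ranked = sorted(solutions, key=lambda s: s.priority(), reverse=True)
for s in ranked:
    print(f"{s.name}: {s.priority()}")
```

With these made-up inputs, the solution with the highest raw impact ranks last because its implementation path is the least clear right now. That is the behavior the multiplier is meant to produce.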

This has made our roadmap conversations significantly more honest. Teams stop selling “what would be most impressive” and start evaluating “what can we actually ship that moves the metric.” Those are different conversations.

Teresa Torres Is Still Right — Just Incomplete

None of this is an argument against opportunity solution trees. The framework is genuinely one of the most useful tools in product discovery, and Torres’ writing on continuous discovery should be required reading for any PM team.

The gap I’m pointing to is specifically about AI-era product development, where the distance between “this is the right solution” and “we can actually build this right now” has grown dramatically. AI solutions are rarely simple integrations. They carry data requirements, infrastructure dependencies, and organizational capabilities that need to be assessed as part of the tree — not after you’ve already committed to the direction.

Add the execution complexity branch. Your opportunity solution trees will be more honest and your roadmaps more achievable. If you’re already using deep customer knowledge to validate your opportunity space, adding execution complexity assessment closes the loop between discovery and delivery.


Your Turn: Apply This Today

Before your next opportunity solution tree session, add execution complexity as a first-class branch:

  • Draw your OST with four branches, not three. Add “execution complexity” alongside feasibility, viability, and usability. For each solution, map at least two execution risks: what could make this take 3x longer than estimated?
  • Run a pre-mortem on your highest-priority experiment. Before you commit, ask the team: “Assume this experiment fails. What went wrong?” Write down the top three answers. Those are your execution risks. Address them in the experiment design.
  • Estimate the “integration tax” on every proposed solution. How many other teams, systems, or processes does this solution touch? Every dependency adds execution time and coordination cost. Factor it in before you put a solution in a sprint.
  • Create a “complexity score” for your backlog. Add a simple 1-5 complexity rating to every item: 1 = completely self-contained, 5 = touches 5+ systems or teams. Use it to balance your sprint with high-complexity and low-complexity items.
  • Separate “solution discovery” from “execution planning” in your workflow. The OST is for discovery. Before any solution moves to execution, it needs a lightweight technical scoping session. Don’t skip the bridge between the two.
  • Review your last three delayed projects and name the execution complexity that caused the delay. Was it dependencies, scope creep, unclear ownership, or something else? Use that pattern to inform how you scope the next one.

The execution complexity question is tactical, but it sits inside a bigger frame: what your product decisions are actually for, which in turn determines which trade-offs are even worth making.

Running product discovery at scale and hitting the gap between good frameworks and messy execution reality? I consult with product teams navigating complex build decisions, AI integration strategy, and discovery-to-delivery alignment. Let’s talk.

Why ‘Faith Tech’ Is About to Become a Real Category

I’ve spent fifteen years building faith tech — products for pastors, missionaries, children’s ministry leaders, and everyday believers across four continents. I’ve watched this space evolve from the inside. And here’s what I see coming: “faith tech” is about to stop being a vibe and start being a category.

Venture capital is starting to notice. Conferences are forming around it. Christian tech workers are organizing. But we don’t have shared vocabulary yet. No agreed-upon market map. No anchoring fund defining the category. That’s about to change — and the catalyst is AI.

What the Faith Tech Category Actually Includes

Faith tech encompasses products and platforms that facilitate spiritual formation, religious practice, church operations, or faith-based community building. The landscape is already substantial, though fragmented.

  • Bible engagement: YouVersion claims over 600 million installs. These aren’t niche products; they’re among the most-used apps on the planet.
  • Sermon preparation: SermonCentral has served pastors for over two decades. Newer entrants like SermonAI and Pulpit AI are using machine learning for research and structure. Pastors are more open to AI assistance than most people expected.
  • Church management: Planning Center handles volunteer scheduling, online giving, and more for thousands of churches. Pushpay and Tithely process significant donation volumes.
  • Children’s ministry: Spark & Cannon Kids, Orange, and others serve curriculum across denominational lines.
  • Discipleship and training: RightNow Media operates as “the Netflix for churches.” ORI builds personalized learning pathways for spiritual formation.

The market fundamentals are stronger than most people realize. There are an estimated 380,000 churches in the United States alone (Hartford Institute for Religion Research). Globally, estimates suggest over 5 million congregations. Mid-size congregations are spending real budget on tools — not just megachurches.

Why AI Is the Faith Tech Category Catalyst

Every category-defining moment needs a catalyst. For faith tech, it’s AI — and it doesn’t just make existing tools better. It makes entirely new categories possible.

Personalized discipleship at scale. Instead of one-size-fits-all reading plans, we can now build adaptive pathways that adjust based on engagement patterns, life circumstances, and spiritual growth indicators. I’ve been prototyping this with my CONSILIUM system — applying the same autoresearch patterns that optimize executive workflow to spiritual formation contexts.

Intelligent content curation for pastors. Pastors spend significant time on sermon preparation. AI can surface relevant commentaries, cross-references, and cultural context without replacing the pastoral heart of the message. AI excels at research aggregation. It doesn’t replace theological interpretation — and shouldn’t try. This is exactly the strategic question every sermon prep platform is facing right now.

Multilingual ministry tools. AI translation is expanding access to spiritual resources in ways that previously required teams of translators. Theological nuance is the hard part — but the technical barrier has dropped dramatically.

Accessible economics. AI makes it economically feasible to build sophisticated tools for markets that couldn’t justify the development cost before. A 200-member church can now access technology quality that previously required megachurch budgets.

The Risk: Building Faith Tech Without Understanding Ministry

Here’s where this gets interesting — and where most attempts will fail.

The edtech industry spent billions building products for classrooms without understanding how teachers actually work. Faith tech faces the same risk. I’ve watched well-intentioned founders build “AI sermon assistants” that miss how pastors actually prepare messages. Or “church growth platforms” that optimize for metrics unrelated to spiritual health.

The pattern I keep seeing: technologists who attended church as kids assume they understand ministry operations. They often don’t. Ministry involves relationship-intensive dynamics that don’t translate easily to software frameworks. Church leadership involves theological discernment that can’t be automated. Spiritual formation happens through community, not just content consumption.

The products that succeed will be built by teams that include former pastors who understand church operations, ministry leaders who’ve lived with budget constraints and volunteer coordination, and missionaries who’ve navigated cross-cultural discipleship. Technical excellence without ministry context produces elegant solutions to problems churches don’t actually have.

Why the Faith Tech Category Is Forming Right Now

Three trends are converging:

Post-pandemic digital adoption. COVID forced every church to become a technology organization overnight. Pastors who had never used video conferencing suddenly became experts in livestreaming, online giving, and digital discipleship. That technological fluency is persistent — churches aren’t going back.

Generational leadership transition. Millennials and Gen Z are moving into senior ministry roles. They expect technology to work seamlessly and are willing to pay for tools that save time and improve outcomes.

AI accessibility. Large language models have dramatically lowered the technical barrier to building intelligent applications. A solo developer can now build AI-powered ministry tools that previously would have required a full engineering team.

We Need More Product People From Ministry

The faith tech category will get built. The question is whether it gets built by people who understand ministry or by people who see churches as an underserved market opportunity.

We need more product managers, designers, and engineers who’ve served in ministry roles building for the church they know. Not just people who attended church growing up — people who’ve planned worship services, coordinated volunteers, prepared sermons, led small groups, and navigated denominational politics.

The best faith tech products come from founders who’ve experienced the problems they’re solving. In faith tech, that principle isn’t just good advice — it’s the difference between a product that gets used and one that collects digital dust on the church server.

The category is forming. The market is substantial. The technology is ready. The question is who will build it — and whether they’ll understand why they’re building it.


Your Turn: Apply This Today

If you’re building in or adjacent to faith tech — or advising someone who is — here’s where to focus your thinking:

  • Define your market with precision. “Faith tech” is not a market — it’s a category. Identify your specific segment: churches, individual believers, clergy, faith-based nonprofits, publishers? The more specific your user, the more defensible your product.
  • Map the decision-maker vs. the user. In most faith tech, the person who chooses the product (church admin, pastor, IT volunteer) is not the person who uses it daily (congregation member, volunteer). Design for both. Build trust with the decision-maker, value for the user.
  • Identify the secular analogue and then name what’s different. Almost every faith tech product has a secular equivalent. ChMS is CRM. Sermon tools are content platforms. Name what your product does differently because the mission context demands it. That’s your differentiation story.
  • Talk to three organizations running on outdated systems. The highest-value faith tech opportunities are in organizations still running on spreadsheets, paper, or 2008-era software. Find them. Understand why they haven’t switched. The answer is usually trust, not budget.
  • Set a “mission metrics” framework. What does success look like in this market beyond revenue? Engagement, spiritual formation, volunteer retention? Define it early. The organizations that pay premium prices in this market do so because they believe the product advances their mission.

Building in the faith tech space? I’ve spent 15 years at the intersection of ministry, product, and technology. Let’s talk.

The First 100 Days: What Happens When a Product Leader Joins a 200-Year-Old Publisher

I switched from leading product at SermonCentral — 14,700 subscribers, two-week ship cycles — to a product serving 23 million users at a publisher founded in 1817. The contrast hit me on Day 3 when I wanted to update a single line of copy.

At SermonCentral, I’d open VS Code, push the change, and watch it go live. At my new role, that same change required stakeholder alignment across three departments, QA testing, and a deployment window I hadn’t even learned to schedule yet.

This is what the first 100 days of product leadership looks like inside a legacy institution — and it’s nothing like what I expected.

The Codebase Carries 20 Years of Decisions You Didn’t Make

When a product’s first line of code dates back to 1993, you inherit decades of product decisions baked into the architecture. Database schemas that made perfect sense in 2004. API endpoints that were cutting-edge in 2010. Frontend frameworks that were the right choice when they were implemented and still work fine, which is exactly why they haven’t been replaced.

At SermonCentral, when something felt clunky, I knew exactly why. I’d built it, or my co-founder had. Here, I was an archaeologist. Every feature had a story I didn’t know yet.

The temptation is to immediately start proposing rewrites. What I learned: listen first. The person who built that “outdated” search feature probably solved problems I hadn’t discovered yet. I spent my first 60 days in the codebase like I was studying someone else’s sermon notes — not to critique, but to understand the thinking behind each decision.

Stakeholder Alignment Takes 3x Longer (And That’s Actually Good)

At a startup, product decisions happen fast because the stakes are binary: ship something users love, or run out of money. At a product with 23 million users, the stakes are different. Break the wrong thing and you’re not just losing customers — you’re disrupting someone’s daily devotional time, interfering with a pastor’s sermon prep on Thursday night, affecting a small group leader in rural Kenya who depends on reliable access to Scripture.

A feature that would have been a 30-minute conversation at SermonCentral became a series of meetings involving product, engineering, content, legal, and partnership teams. Not because people were slow or bureaucratic — because the impact radius was massive.

Three months ago, I would have seen this as friction. Now I see it as risk management. The process is longer, but the decisions are better. I just had to recalibrate what “move fast” means when millions of people depend on you not breaking something they rely on spiritually.

Three Things That Surprised Me in the First 100 Days of Product Leadership

The depth of the data. The behavioral signals most startups dream about were already there. Not just what people read, but how they read it — chapter-by-chapter completion rates, cross-reference click patterns, search query analysis going back years across multiple Bible translations, languages, and device types. This wasn’t a Google Analytics dashboard. It was instrumented user behavior data that would make most SaaS companies jealous.

The team was already building what I thought we needed. I came in assuming I’d need to evangelize for modern product practices — user research, A/B testing, data-driven feature prioritization. Turns out, the team was already doing all of this. They just weren’t talking about it in Silicon Valley buzzwords. I thought I was joining a legacy institution that needed digital transformation. Instead, I found a digitally sophisticated team that needed someone to help coordinate and scale what they were already building.

The urgency is real. When people hear “200-year-old publisher,” they assume slow-moving and comfortable. That’s not what I found. Digital isn’t optional when you’re competing with products that have billions of installs. The urgency here is channeled through process, not around it. At a startup, urgency means cutting corners. Here, urgency means making the process faster and more effective.

What I’d Tell Any Product Leader Joining a Legacy Institution

Listen for 60 days. I thought I knew what the product needed after my first week. I was wrong about almost everything. The problems I identified weren’t actually problems — they were features working as intended for use cases I hadn’t discovered yet. Spend two months learning the system before you start trying to change it. Talk to everyone. Read the old product specs. Understand not just what the product does, but why it was built that way.

Ship something small by Day 90. Listening is important, but you also need to prove you can execute in their environment. Find something genuinely small — a bug fix, a copy improvement, a minor UX enhancement — and ship it. My first shipped feature was a tiny improvement to a comparison tool. It took three weeks to get live (compared to three hours at SermonCentral), but I learned more about the system from that one change than from a month of meetings.

Earn trust before proposing transformation. Legacy institutions have heard transformation pitches before — many from consultants who didn’t stick around to see the results. Your credibility comes from understanding what’s already working and why. The most effective change agents I’ve seen inside legacy organizations aren’t the ones who come in with dramatic transformation roadmaps. They’re the ones who make the existing system work better, then gradually expand the definition of “better” over time.

The Long Game

Building inside legacy institutions means accepting that transformation happens in years, not quarters. Your first job is to understand what’s already working before you start changing what isn’t. This is true whether you’re joining a publisher, a denomination, a hospital system, or any organization built to outlast its founders.

But it also means you’re building on a foundation tested by millions of users over decades. When you do ship something new, it immediately inherits that scale and stability. That’s a trade-off I didn’t expect to appreciate — but four months in, I do.

The question isn’t whether legacy organizations can innovate. The question is whether product leaders can learn to innovate within systems designed to last longer than most startups exist. From what I can tell so far, the answer is yes — but only if you’re willing to measure success differently than you were trained to.


Your Turn: Apply This Today

Whether you’re in your first week or month 18, these disciplines separate product leaders who earn trust from those who burn goodwill:

  • Schedule a “context download” with every key stakeholder in the first 30 days. Ask them one question: “What has every product leader before me gotten wrong?” Listen without defending. Take notes. Come back in 60 days to report what you’ve done about it.
  • Map the informal power structure before you touch the org chart. Identify who the real decision-makers are — the people others defer to in meetings even when they don’t have the title. Build those relationships before you need them.
  • Find one quick win in the first 45 days — and be transparent about why you chose it. Explain publicly that you picked it because it demonstrates capability and builds trust, not because it’s the most important thing. This earns more credibility than the win itself.
  • Audit your predecessor’s decisions charitably. Before you change anything, write a one-paragraph explanation of why the previous decision made sense in its context. Share it with your team. It signals maturity and protects morale.
  • Don’t reorganize anything in the first 90 days. No matter how obvious the fix looks. Reorgs before trust is established signal insecurity, not leadership. Earn the mandate first.
  • Set explicit expectations about your decision-making rhythm. Tell your team how you make decisions, what you delegate completely, and what you always want to be consulted on. Ambiguity on this costs you more goodwill than any single wrong decision.

Navigating a product leadership transition — startup to enterprise, or joining a ministry organization with legacy tech? This is exactly the kind of work I help leaders think through. Let’s talk.

Deep Customer Knowledge: The PM’s Field Guide to Knowing Your User Better Than They Know Themselves

Deep customer knowledge is the single most important skill in product management — and the most commonly faked. I say that as someone who once inherited 18 user research studies spanning 13 years and still nearly made the wrong product decision because of it.

Here’s what happened. A usability test asked 5 users to try the Hebrew/Greek word study tools. Zero out of 5 said they’d use them. The researcher’s recommendation: deprioritize original language features. But the behavioral data told a completely different story. The reverse interlinear — the exact tool those 5 users dismissed — was the single most-used resource among paying subscribers. The feature driving the most revenue was the one the research said nobody wanted.

That’s the gap between having customer data and having deep customer knowledge. In my post on 25 PM Skills for 2026, I listed deep customer knowledge as skill #1. Here’s the full picture.

What Deep Customer Knowledge Actually Is

Deep customer knowledge isn’t reading your NPS score. It isn’t scanning the top 10 support tickets. It isn’t even running a survey.

It’s the ability to predict what your customers will do — not just what they’ll say — in situations you haven’t tested yet.

Marty Cagan puts it directly: the product manager needs to be the “acknowledged expert on the customer.” Not the research team. Not the data analyst. The PM. Deep knowledge of their issues, pains, desires, how they think, how they work, and how they decide to buy. Without it, you’re just guessing.

Most PMs can rattle off their user personas. They can tell you the average age, the top feature requests, the churn rate. That’s customer data. It’s necessary. It’s not sufficient.

Deep customer knowledge is the specific, surprising detail that changes how you build: knowing that 27% of your churn is involuntary (credit card failures, not dissatisfaction), and that the user who doesn’t renew isn’t unhappy, they just forgot your product existed for three weeks. The overall churn rate goes in a report. The 27% rewrites your retention roadmap.
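To make that split concrete, here is a minimal Python sketch that separates involuntary churn (payment failures) from voluntary churn. The reason codes and the sample records are hypothetical, invented so the arithmetic lands on the 27% figure; a real billing system will have its own codes.

```python
from collections import Counter

# Hypothetical reason codes; substitute whatever your billing system emits.
INVOLUNTARY_REASONS = {"card_declined", "card_expired", "payment_failed"}


def churn_breakdown(cancel_reasons):
    """Return the share of churn that is involuntary vs. voluntary."""
    counts = Counter(
        "involuntary" if reason in INVOLUNTARY_REASONS else "voluntary"
        for reason in cancel_reasons
    )
    total = sum(counts.values())
    if total == 0:
        return {"involuntary": 0.0, "voluntary": 0.0}
    return {
        "involuntary": counts["involuntary"] / total,
        "voluntary": counts["voluntary"] / total,
    }


# Illustrative data only: 27 payment failures out of 100 lapsed subscriptions.
sample = ["card_declined"] * 20 + ["card_expired"] * 7 + ["did_not_use"] * 73
breakdown = churn_breakdown(sample)
print(f"Involuntary churn: {breakdown['involuntary']:.0%}")
```

The point of the split is that the two buckets call for different fixes: payment retries and card-update reminders for one, re-engagement for the other.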

Three Ways PMs Fail at Deep Customer Knowledge

Failure mode 1: Outsourcing understanding. You hire a research firm, they run a study, hand you a deck. You read the executive summary, quote two stats in your next planning meeting, and call yourself customer-obsessed. The deck goes in a folder. The understanding stays surface-level.

Failure mode 2: Confusing asking with knowing. Surveys tell you what people say they want. Behavioral data tells you what they actually do. These diverge constantly. The Hebrew/Greek example is the poster child — five users in a room said “no thanks.” Fifty thousand subscribers in the wild said “this is why I pay.”

Failure mode 3: Building knowledge once. Your customer isn’t static. The median age of one subscriber base I tracked increased by 2 full years in just 3 years (58 to 60). The same persona document from 2019 would actively mislead you in 2026. The most dangerous sentence in product management: “We already know our customers — we’ve been in this market for years.” Tenure creates confidence. Confidence stops you from looking.

The Knowledge Stack: Five Layers of Customer Understanding

I think about customer knowledge as a stack. Five layers, each harder to get but more valuable than the last.

Layer 1: Demographics. Who they are — age, location, role, income. You get this from surveys and analytics. Almost every PM has this. It tells you very little about what to build.

Layer 2: Behavior. What they do — feature usage, session patterns, search queries, purchase triggers. You get this from analytics and instrumented data. Most PMs have some of this. It tells you what’s working but not why.

Layer 3: Motivation. Why they do it — the job they’re hiring your product for. You get this from interviews and contextual inquiry. Teresa Torres calls this the “opportunity space” — the unmet needs, pain points, and desires that drive behavior. Her framework in Continuous Discovery Habits insists on weekly interviews where you collect stories, not opinions.

Layer 4: Context. What surrounds the decision — their constraints, their alternatives, their emotional state when they open your product. You get this from spending time with users in their actual environment. Rob Fitzpatrick’s The Mom Test is the best guide here.

Layer 5: Contradictions. Where what they say and what they do diverge. You get this by holding Layers 2, 3, and 4 in tension. This is where the real product insights live. The feature nobody asks for but everybody uses. The problem they describe incorrectly but feel intensely.

Most PMs operate at Layers 1–2. The best PMs live at Layers 3–5.

What It Looks Like in Practice

When I walked into a new role with those 18 research studies going back to 2012, instead of reading executive summaries, I indexed every finding — 35 discrete, citable insights with evidence levels and source documents. Within a week, I could trace any claim about our users back to the specific study, sample size, and date.
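An index like that does not need special tooling. The sketch below shows one possible shape for it in Python. The field names, the evidence labels, and every entry (study titles, dates, sample sizes) are placeholders for illustration, not the actual index described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    """One citable insight, traceable back to its source study."""
    claim: str
    study: str            # source document
    study_date: date
    sample_size: int
    evidence: str         # assumed labels: "behavioral", "survey", "usability"
    topic: str


def trace(index, topic):
    """Return every finding on a topic, largest sample first."""
    matches = [f for f in index if f.topic == topic]
    return sorted(matches, key=lambda f: f.sample_size, reverse=True)


# Placeholder entries: titles, dates, and sample sizes are illustrative only.
index = [
    Finding("0 of 5 participants said they would use word study tools",
            "Usability study (placeholder)", date(2019, 1, 1), 5,
            "usability", "original-languages"),
    Finding("Reverse interlinear is the most-used resource among subscribers",
            "Usage analytics export (placeholder)", date(2024, 1, 1), 50_000,
            "behavioral", "original-languages"),
    Finding("Top cancel reason: 'I didn't use it'",
            "Cancellation survey (placeholder)", date(2022, 1, 1), 1_200,
            "survey", "retention"),
]

for f in trace(index, "original-languages"):
    print(f"n={f.sample_size:>6} | {f.evidence:<10} | {f.study_date.year} | {f.claim}")
```

Sorting by sample size is a deliberate choice here: it puts a five-person usability test next to the behavioral data it conflicts with, so the difference in weight is visible at a glance.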

What emerged wasn’t in any single study. It was in the pattern across all of them. The #1 cancel reason for 7 straight years was “I didn’t use it.” Not “too expensive,” not “missing features.” Just didn’t use it. Between 40% and 56% of non-subscribers didn’t know the premium tier existed. The growth problem wasn’t conversion. It was awareness. The user base was aging, but the most convertible non-subscriber segment was 25-to-39-year-olds.

None of these were secrets. They were all documented. But nobody had held them in tension before. The research existed. The deep customer knowledge didn’t — until someone synthesized it.

How to Build This Skill (Starting This Week)

Week 1: Build your Knowledge Stack audit. For your product right now, what do you know at each layer? Write it down. Most PMs discover they’re strong at Layers 1–2 and almost empty at Layers 3–5.

Week 2: Read the raw data, not the summary. Pull the last 3 customer research reports. Don’t read the executive summaries. Read the verbatims — the actual words customers used. The patterns you spot in raw language are different from the patterns a researcher highlighted.

Week 3: Start a contradiction log. Every time you find a gap between what users say and what they do, write it down. Over a month, you’ll have 5–10 entries. Each one is a potential product insight.

Week 4: Talk to one customer who left. Not a satisfaction survey — a conversation. Ask them to walk you through the last week before they canceled. The story will tell you more than the rating ever could.
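The contradiction log from Week 3 can start as a very small script. This is a sketch under stated assumptions: the thresholds are arbitrary starting points, the feature names are made up, and the usage shares are hypothetical. Only the 0-of-5 stated-interest figure echoes the word study example above.

```python
from dataclasses import dataclass


@dataclass
class Contradiction:
    topic: str
    what_they_say: str
    what_they_do: str
    possible_meaning: str = ""   # fill in once you have a hypothesis


def find_contradictions(stated_interest, usage_share, say_max=0.2, do_min=0.5):
    """Flag features that few people say they want but many actually use.

    stated_interest: feature -> share of interviewees who said they'd use it
    usage_share:     feature -> share of active users who actually used it
    The thresholds are assumptions to tune, not industry benchmarks.
    """
    log = []
    for feature, said in stated_interest.items():
        used = usage_share.get(feature)
        if used is not None and said <= say_max and used >= do_min:
            log.append(Contradiction(
                topic=feature,
                what_they_say=f"{said:.0%} said they would use it",
                what_they_do=f"{used:.0%} of active users used it",
            ))
    return log


# Hypothetical inputs for illustration.
stated = {"word_study_tools": 0 / 5, "reading_plans": 4 / 5}
usage = {"word_study_tools": 0.60, "reading_plans": 0.70}

for entry in find_contradictions(stated, usage):
    print(f"{entry.topic}: {entry.what_they_say}, but {entry.what_they_do}")
```

The script only surfaces candidates. Deciding what each gap means still takes a conversation with the customer.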

The PMs who build the best products aren’t the ones with the most customer data. They’re the ones who can tell you where the data contradicts itself — and what that contradiction means. That’s the skill. That’s what separates customer-informed from customer-obsessed.


Your Turn: Apply This Today

Deep customer knowledge isn’t built in one session — it’s built through consistent practice. Start here:

  • Book one customer visit this month — not a Zoom, a visit. Go to where your user actually uses your product. Watch them in their environment. You will learn something in the first five minutes that no survey could surface.
  • Create a “customer reality file.” Start a running document (not a Confluence page no one reads — a doc you actually open weekly) capturing direct quotes, observed behaviors, and surprising context from every customer interaction.
  • Interview a churned user before your next roadmap session. Ask one question: “What were you hoping we’d become that we never did?” Their answer will reorganize your priorities faster than any NPS report.
  • Spend two hours in your support queue. Read the last 50 support tickets without filtering by category. Look for the emotion in the language — frustration, confusion, apology. That’s your product’s actual reputation.
  • Map the user’s day, not just your product’s workflow. Draw a timeline of a typical user’s workday. Mark where your product shows up. Notice what surrounds it — what did they do before, and what do they need to do immediately after?
  • Challenge your next assumption in the open. In your next sprint planning or roadmap review, say out loud: “I believe [user assumption] — here’s what would have to be true for me to be wrong.” Invite pushback. Build the habit of treating assumptions as hypotheses.

Need help building a customer research system that actually informs product decisions? I work with product teams at ministry organizations and faith-tech companies. Let’s talk.

What Should Be Free? The Framework I Use to Draw the Paywall Line

I made the wrong paywall call once. We gated a feature that a significant portion of our free users relied on daily — not because the revenue math demanded it, but because “premium users should get more.” Six weeks later, support tickets had doubled, free-to-paid conversion had flatlined, and we’d punished the most engaged segment of our user base. We reversed the decision. It cost us three months and a lot of trust.

The freemium paywall framework I use today came out of that mistake. Here’s how I think about what should be free — and why getting this wrong kills mission-driven products faster than anything else.

The Wrong Question Behind Most Paywall Decisions

Most freemium decisions start from the wrong end. The instinct is reasonable: if you’re building a paid tier, paying users should get more. But when that logic drives every decision, you end up asking “what should go behind the paywall?” instead of “what must stay free?”

That inversion matters more than it sounds. When you start from the free side, you’re forced to defend every gate. When you start from the paid side, you’re incentivized to move things behind walls because it feels like value creation. In my experience, it usually isn’t.

The freemium products that last built genuinely useful free tiers first, then built paid tiers that made useful things faster, deeper, or scalable. The free tier wasn’t a demo. It was a product. If your free tier is deliberately limited to drive upgrades, you’re not running a freemium model; you’re running an open-ended trial. Those are different businesses, and they attract different users with different expectations.

Two Jobs, One Product

The framework starts with a question that sounds simple and isn’t: what job is the user trying to do?

In most digital products — especially mission-driven ones — at least two distinct user jobs show up at the same front door. The personal user and the professional. The student and the practitioner. The occasional reader and the daily power user. These aren’t tiers of the same job. They’re different jobs entirely.

Conflating them into a single pricing decision produces a product that serves neither well — and a paywall that frustrates the users most likely to become your strongest advocates.

Once you’ve mapped the jobs clearly, the paywall question becomes cleaner. For each feature, you’re really asking: does this serve the user who can’t pay, the user who won’t pay, or the user who pays because the tool is genuinely worth it?

Can’t pay: Gate this, and you’ve excluded someone with no alternative. If your product serves a global audience, this is a mission failure disguised as a pricing decision.

Won’t pay: Gate this, and you’ll generate complaints with minimal conversions. The feature isn’t their tipping point.

Pays because the tool is worth it: Gate this correctly and you have a sustainable model.

The Four Paywall Tests

For every feature — new or existing — I run four tests in order. The order matters.

Test 1: Access. Does gating this feature prevent someone from doing the core job they came here to do? If yes, it stays free. Not “probably free” — free, as a hard constraint. A product that charges for its core value has confused its business model with its purpose.

Test 2: Job. Is the primary user of this feature a personal user or a professional? Personal = free. Professional = paid. Students are a gray zone — check your analytics to see who actually uses it week over week. The data is almost always less ambiguous than the internal debate.

Test 3: Majority World. Would someone with limited income — a volunteer, a practitioner in a lower-income country, a student without institutional backing — need this feature to accomplish meaningful work? If yes, it belongs in the free tier or needs a genuine no-cost path. Scholarship programs, geographic pricing, and educator access are all legitimate answers. “We’ll figure that out later” is not.

Test 4: Revenue sufficiency. Is there a paying segment large enough and motivated enough that this feature can sustain itself economically? A feature behind a paywall that generates no conversions is worse than a free feature. It signals to free users that you’re extracting value while delivering no business result. As OpenView’s SaaS benchmarks consistently show, freemium-to-paid conversion depends heavily on the perceived value gap — not the friction gap.
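Because the order matters, the four tests read naturally as a short-circuiting checklist. The Python sketch below is one way to encode them. The field names are my own shorthand, each input is a judgment call you still make by hand, and treating a Test 4 failure as "free" is my reading of the point that a gate with no conversions is worse than no gate.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    blocks_core_job: bool          # Test 1: would gating stop the core job?
    primary_user: str              # Test 2: "personal" or "professional"
    needed_by_low_income: bool     # Test 3: Majority World
    has_no_cost_path: bool         # e.g. scholarships, geographic pricing
    paying_segment_viable: bool    # Test 4: revenue sufficiency


def paywall_decision(f: Feature) -> tuple[str, str]:
    """Run the four tests in order and return (decision, reason)."""
    if f.blocks_core_job:
        return "free", "Test 1 (Access): gating would block the core job"
    if f.primary_user == "personal":
        return "free", "Test 2 (Job): the primary user is personal"
    if f.needed_by_low_income and not f.has_no_cost_path:
        return "free", "Test 3 (Majority World): no genuine no-cost path"
    if not f.paying_segment_viable:
        return "free", "Test 4 (Revenue): no paying segment to sustain a gate"
    return "paid", "Passed all four tests"


# Hypothetical features for illustration.
reader = Feature("basic_reader", True, "personal", True, False, False)
research = Feature("ai_research_assistant", False, "professional", True, True, True)

for feature in (reader, research):
    decision, reason = paywall_decision(feature)
    print(f"{feature.name}: {decision} ({reason})")
```

The early returns mirror the hard-constraint language of Test 1: once a feature fails the Access test, nothing later in the list can argue it behind the paywall.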

The Ideology Under the Framework

The four tests are pragmatic. But there’s an ideology underneath them that determines how you handle the hard cases — and in mission-driven products, most hard decisions are edge cases.

The ideology I work from: subscribers are mission partners, not just customers.

When a professional user pays for a tool, they’re making it possible for others who can’t pay to access something valuable. That framing only holds if the free tier is genuinely good. If the free tier is deliberately weak to force upgrades, you’re not distributing access — you’re withholding it and calling it generosity.

This connects directly to how I think about subscription products for ministry — the paywall philosophy and the subscription model are two sides of the same decision. Get one wrong and you’ve undermined both.

Where AI Makes This Harder

Every product team is about to face a version of the same question: where does AI land in the framework?

The professional-tools category is relatively clear. AI features that accelerate professional workflows — research assistants, document generation, structured analysis — belong in paid tiers. The value is concrete, the willingness to pay is real.

The personal-use category is harder. An AI that helps someone understand a difficult concept or access something they couldn’t reach before — is that a professional tool, or is it closer to a core access feature? If you gate it, you’ve made a decision. That decision might be right for the business. It might not be right for the mission. Those aren’t the same thing, and pretending they always align is how you end up making the wrong call at the edge cases.

The products that age well are the ones that answered that question from the mission side first and built the revenue model around the answer.

The Clarity You Owe Your Users

The mistake I described at the start wasn’t really a pricing mistake. It was a clarity mistake. We hadn’t named what we believed about our users, so we defaulted to what we thought premium users deserved instead of what free users needed.

The framework above doesn’t eliminate that tension. It makes it visible. When you can map the jobs, run the four tests, and state the ideology out loud in a room with your team, you at least know what tradeoff you’re making.


Your Turn: Apply This Today

Walk your current or next paywall decision through this framework before your next pricing review:

  • Categorize every feature as acquisition or retention. Go through your feature list and sort each into: “this brings users in” vs. “this keeps users coming back.” Acquisition features belong in free. Retention features belong in paid.
  • Identify your “aha moment” and protect it. What is the single experience that converts a skeptic into a believer in your product? Make sure every user reaches it before they hit a paywall. If the aha is behind a gate, you’re metering the wrong thing.
  • Map your mission-critical users. Who are the users whose engagement creates value for everyone else — contributors, creators, community members? Consider whether paywalling them is worth the cost to the overall ecosystem.
  • Run a “free tier audit.” List everything currently free. For each item, ask: is this generating acquisition or just giving away margin? Remove or gate anything that isn’t actively driving new user acquisition or ecosystem health.
  • Test your paywall message before your paywall decision. Show users a description of your paid tier before you build it. Does the value proposition make them lean in or shrug? If they shrug, you haven’t found the right gate yet.
  • Set a paywall conversion benchmark. Decide in advance what free-to-paid conversion rate would validate your model. Build toward that number. If you hit 6 months without reaching it, revisit what’s behind the gate.

Navigating freemium strategy, pricing philosophy, or subscription model design? These are decisions I’ve made across multiple faith-tech products. Let’s think through it together.

The Sermon Library Problem: What Chegg’s Collapse Taught Me About AI and Product-Market Fit

I run a product that helps pastors prepare sermons. We have 245,000+ sermons in our library, decades of content, and a subscription model that’s worked for years. Pastors come to SermonCentral when they’re stuck, when they need inspiration, when Sunday is three days away and the blank page is winning.

And for the first time in my career, I’m looking at that model and asking: Is our product-market fit about to evaporate?

The Treadmill You Don’t See Moving

Reforge published a piece that hit me like a gut punch. The core argument: product-market fit is a treadmill, not a destination. The bar for what customers consider “good enough” is always rising, and AI just cranked the speed to a sprint.

Here’s the part that stuck with me: unlike previous tech shifts that unfolded over years, AI causes the product-market fit (PMF) threshold to spike exponentially, giving incumbent solutions no time to adapt before losing relevance.

Mobile took a decade to reshape industries. Cloud computing gave companies 5–7 years to migrate. AI? The examples are already piling up.

Chegg lost 87.5% of its valuation. Stack Overflow saw traffic crater. These were category leaders with massive content libraries and loyal user bases.

Sound familiar?

The Chegg Parallel Is Uncomfortably Close

Let me lay this out plainly.

Chegg’s model: a massive library of human-created study content, monetized through subscriptions, behind a paywall. Students paid because they couldn’t get the answers anywhere else. Then ChatGPT showed up. Suddenly, students could get comparable answers, for free, instantly, with no subscription required.

Now replace “students” with “pastors” and “study content” with “sermon outlines.” Replace “ChatGPT” with SermonAI, Sermon Snap, or honestly just ChatGPT itself with a decent prompt.

The structural vulnerability is identical: a content library behind a paywall, AI that generates comparable output for free, and a “good enough” bar that resets overnight.

I talk to pastors every week. More of them are experimenting with AI tools for sermon prep. Most are cautious about it — but it solves their immediate problem: “I’m stuck and Sunday is coming.”

The Real Strategic Question

Here’s where most product leaders get it wrong. The knee-jerk reaction is: “We’ll just bolt AI onto our existing product.” Slap a chatbot on the homepage. Add an AI summary feature. Ship it fast.

That misses the deeper question: is your product the platform people use AI through, or the platform that AI makes redundant?

If you’re a content library and you add AI search, you’re still a content library. You’ve made the existing model slightly better, but the customer’s mental model hasn’t changed. They’re still coming to you for content, and AI is still generating that content for free elsewhere.

The real pivot is harder. It means rethinking what your product actually is.

For SermonCentral, the strategic move is to become an “AI-powered sermon prep workspace where our library is an input, not the product.” The library becomes training data, context, and theological grounding: the thing that makes our AI better than generic ChatGPT. The product becomes the workflow.

Three Questions Every SaaS Leader Should Be Asking Right Now

If you run a content-library or knowledge-base product — and honestly, this applies to most subscription SaaS — here’s the framework I’m using:

1. Can AI generate a “good enough” version of your core deliverable? Be brutally honest. Can AI give my customer something that clears their bar? For many use cases, 70% quality delivered instantly beats 95% quality behind a paywall. If the answer is yes, your paywall is losing its teeth.

2. What does your product offer that AI alone cannot? This is where you find your moat — or discover you don’t have one. For SermonCentral, it’s community validation (knowing 10,000 other pastors used this sermon), exegetical depth that’s been peer-reviewed, denominational fit, and sermon series planning that accounts for the liturgical calendar. These are things a generic AI doesn’t know to consider. Your version of this list is your survival strategy.

3. Are your current users already using AI tools alongside your product? If you don’t know, find out this week. A simple exit survey question, an onboarding poll, a one-question email. If even 15–20% of your users are experimenting with AI alternatives, the adaptation window is already closing.

The Window Is Smaller Than You Think

With previous technology shifts — mobile, cloud, social — companies had years to adapt. You could see the wave coming, form a committee, hire a consultant, run a pilot, iterate for a few quarters, and still catch up.

AI doesn’t work like that. The window slams shut before you recognize how severe the threat is. Chegg didn’t see a slow decline and choose not to respond. They saw a cliff, and by the time they recognized it, they were already falling.

The work we’re doing right now at SermonCentral — instrumenting whether our users are using AI sermon tools, prototyping AI-augmented workflows that use our library as an input rather than competing with free generation, rethinking activation so that a new user’s first 48 hours deliver something AI alone cannot — is existential work.

What’s Actually Scarce

The companies that survive this shift will be the ones that rebuilt their products around the assumption that AI-generated content is free and abundant, and then found the thing that’s scarce.

For church tech, what’s scarce is trust. Theological accuracy. Community wisdom. The peace of mind that comes from knowing your sermon was shaped by a tradition, not just generated by a machine.

Those things have real value. But only if we build products that deliver them in ways a pastor can feel on a Tuesday night when Sunday is looming.

The treadmill is speeding up. Time to change direction before it throws you off.


Your Turn: Apply This Today

Use these questions to pressure-test whether your product is the next Chegg — or the one that replaces it:

  • Name the task your product completes. Write it in one sentence from the user’s perspective: “When I need to ______, I use ______.” If AI can now complete that task in 30 seconds, your moat is eroding. Be honest.
  • Run the “AI substitute” test. Have someone on your team try to accomplish your product’s core job-to-be-done using only ChatGPT or Claude. Document exactly where the AI falls short. That gap is your defensible surface.
  • Map your unique data assets. What does your product know that a general-purpose AI cannot access? User history, community content, proprietary datasets? If the answer is “nothing,” that’s your most urgent product risk.
  • Identify your highest-engagement users and interview three of them. Ask them directly: “Have you tried using AI tools for what you use us for? What happened?” Their answer will tell you more than any dashboard.
  • Rewrite your product’s value proposition for the AI era. Your old version assumed AI wasn’t available. Rewrite it assuming users have access to powerful AI. What do you still uniquely offer? That’s your actual pitch.
  • Set a 90-day “substitution risk review.” Put a recurring calendar item to evaluate how much of your product’s core workflow can now be replicated by AI. Treat it as a competitive threat review, not a tech curiosity.

Is your SaaS product facing a similar AI-driven PMF threat? I help product leaders think through competitive positioning and strategic pivots in the age of AI. Let’s talk.

Stop Measuring DAU for Products Used Once a Week

If you’re measuring DAU for weekly products, you’re measuring the wrong thing — and it’s actively misleading your team. I spent thirteen years building products for pastors and ministry leaders across four continents, and for most of that time, I was using the wrong metrics.

At SermonCentral, our DAU looked terrible on paper. We were serving hundreds of thousands of active pastors, and our daily active user numbers were abysmal. Not because the product was broken. Because pastors prep sermons once a week, not every day.

Most PM frameworks assume daily usage patterns. Retention curves calibrated for social media. Activation metrics borrowed from productivity tools. Engagement loops designed for apps people compulsively open.

If your users naturally engage weekly, those metrics aren’t just wrong — they’re actively misleading.

Why DAU for Weekly Products Fails

Here’s what I learned building SermonCentral: Sunday drives everything.

Sermon searches spike Monday through Wednesday. Downloads peak Thursday. Friday and Saturday? Ghost town. Sunday morning brings a flurry of last-minute mobile access, then complete silence until Monday.

Our DAU/MAU ratio never broke 20%. In consumer app terms, that’s death. For a weekly product serving hundreds of thousands of active pastors, it was exactly right.
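A quick arithmetic check suggests why a ratio under 20% is expected rather than alarming. The sketch below assumes an idealized population in which every monthly-active user shows up on a fixed number of days each week; the function name and inputs are illustrative, not SermonCentral data.

```python
def dau_mau_ratio(active_days_per_week: float, days_in_week: int = 7) -> float:
    """Expected DAU/MAU when every monthly-active user is present on
    `active_days_per_week` days of each week (an idealized assumption)."""
    if not 0 < active_days_per_week <= days_in_week:
        raise ValueError("active_days_per_week must be in (0, days_in_week]")
    # On any given day, the share of monthly-active users who are present
    # equals the fraction of the week each user is active.
    return active_days_per_week / days_in_week


once_a_week = dau_mau_ratio(1)  # one prep day per week
every_day = dau_mau_ratio(7)    # a true daily-habit product
print(f"once a week: {once_a_week:.1%}, every day: {every_day:.1%}")
```

Under that assumption, a fully retained audience that shows up once a week lands near 14%, so a sub-20% ratio says more about the rhythm than about the health of the product.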

The same pattern holds across every faith-based product I’ve built or analyzed. Church attendance is weekly. Small group meetings are weekly. Sermon prep follows a weekly rhythm that’s been consistent for centuries.

Your DAU will never look like Slack’s. That’s not a bug — that’s your users’ real spiritual and professional rhythm.

What Actually Matters: Weekly Active Metrics

After tracking both daily and weekly engagement across multiple products, here are three metrics that actually predict success for weekly-use products:

Weekly Active Sermon Searches (WASS) — At SermonCentral, we tracked unique pastors running sermon searches within their prep window (typically Monday–Thursday). This number stayed remarkably consistent week-over-week, even as DAU fluctuated wildly.

Sunday-to-Sunday Retention — Did the pastor who used your tool this Sunday also use it next Sunday? This 7-day cycle retention tells you more about product-market fit than any daily metric.

Content Action Rate within 48 Hours — Pastors who print, download, or save content within 48 hours of trial signup complete their first full prep cycle. The ones who don’t rarely convert to paid.
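Cycle-based retention is straightforward to compute once events are bucketed by week. This is a minimal sketch, assuming activity arrives as (user_id, date) pairs and that a cycle runs Monday through Sunday so that prep days and the Sunday they lead to land in the same bucket; both the data shape and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def cycle_start(day: date) -> date:
    """Return the Monday that opens the Monday-to-Sunday cycle containing `day`."""
    return day - timedelta(days=day.weekday())


def cycle_retention(events: list[tuple[str, date]]) -> dict[date, float]:
    """For each cycle, the share of its active users who are also active
    in the following cycle. Cycles whose following cycle has no recorded
    activity at all are skipped rather than reported as zero."""
    users_by_cycle: dict[date, set[str]] = defaultdict(set)
    for user_id, day in events:
        users_by_cycle[cycle_start(day)].add(user_id)

    retention: dict[date, float] = {}
    for start, users in users_by_cycle.items():
        next_start = start + timedelta(days=7)
        if next_start in users_by_cycle:
            retained = users & users_by_cycle[next_start]
            retention[start] = len(retained) / len(users)
    return retention
```

Bucketing by cycle rather than by calendar day means a pastor who searches on Tuesday one week and Thursday the next counts as retained, which a daily metric would miss.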

These patterns hold across different faith contexts too. When we analyzed usage patterns across regions — post-Christian Europe, Sub-Saharan Africa, Southeast Asia — weekly rhythms emerged regardless of cultural differences.

Daily vs Weekly vs Intermittent: Three Different Playbooks

I’ve built products across three different usage patterns, and each requires a completely different success framework:

Daily-use products (Bible reading apps, devotional tools): DAU matters. Optimize for habit formation, reading streak maintenance, daily content delivery.

Weekly-use products (SermonCentral, sermon prep tools, weekly group resources): DAU is meaningless. Optimize for prep-to-pulpit completion rates and Sunday-to-Sunday retention.

Intermittent-use products (discipleship tools triggered by life events, not calendar events): Neither daily nor weekly metrics capture the real value. A teenager might use a discipleship app three times in one week during a crisis, then not touch it for a month.

We spent two years trying to force SermonCentral into daily engagement patterns before accepting that the weekly rhythm was a feature, not a flaw. This connects directly to the broader challenge of product-market fit in faith tech — understanding what your users actually need versus what standard SaaS metrics say they should need.

The Real Activation Moment for Weekly Products

Most PM frameworks define activation as completing key actions in your first session. Sign up, upload a photo, connect with three people, send your first message.

For weekly products, that’s backwards.

A pastor who signs up for SermonCentral and browses illustrations on Tuesday hasn’t activated. A pastor who uses your research tool to prep Wednesday, builds a sermon Thursday, and preaches it Sunday has completed one full cycle. That’s your aha moment.

I started tracking what we called “First Sermon Sunday” — the percentage of trial users who completed a full prep-to-pulpit cycle within their first two weeks. This number predicted paid conversion better than any first-session metric we tested.
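A rate like “First Sermon Sunday” could be computed roughly as follows. The sketch assumes you already have each trial user’s signup date and the date of their first completed prep-to-pulpit cycle; how a “completed cycle” is detected is left out, and the names are hypothetical.

```python
from datetime import date, timedelta


def first_sermon_sunday_rate(
    signups: dict[str, date],
    first_cycle_completed: dict[str, date],
    window_days: int = 14,
) -> float:
    """Share of trial users whose first completed prep-to-pulpit cycle
    falls within `window_days` of signup (inclusive)."""
    if not signups:
        return 0.0
    window = timedelta(days=window_days)
    activated = sum(
        1
        for user_id, signed_up in signups.items()
        if user_id in first_cycle_completed
        and signed_up <= first_cycle_completed[user_id] <= signed_up + window
    )
    return activated / len(signups)
```

The two-week default mirrors the window described above; users with no completed cycle simply count against the rate.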

The activation window isn’t “first session.” It’s “first use cycle.” Andrew Chen’s research on retention and engagement confirms that retention curves look radically different depending on the natural usage frequency of your product category.

A Framework for Weekly Product Metrics

Step 1: Map your natural usage cycle. What’s the real-world rhythm your users follow? For pastors, it’s Sunday-to-Sunday. For small group leaders, it might be meeting-to-meeting. For Bible study groups, season-to-season.

Step 2: Design metrics around cycle completion. Instead of DAU, track users who complete full cycles. Instead of session length, track cycle-to-cycle retention. Instead of feature adoption, track cycle success rate.

Step 3: Time your interventions to the cycle. Onboarding emails on Monday morning, not immediately after signup. Retention campaigns on prep days, not random Tuesday afternoons. Feature announcements timed to the start of prep cycles.

Step 4: Report progress in cycle terms. “We had 2,847 active pastors this Sunday” tells a different story than “our DAU dropped 15% this week.” Both might be true. Only one reflects user reality.
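Step 3 is the easiest piece to automate. As one possible sketch, the helper below schedules an onboarding email for the next Monday morning after signup instead of sending it immediately. It assumes naive datetimes already expressed in the user’s local time, and the 8:00 send time is an arbitrary placeholder.

```python
from datetime import datetime, time, timedelta


def next_cycle_start(signed_up_at: datetime, send_at: time = time(8, 0)) -> datetime:
    """Return the next Monday at `send_at` strictly after `signed_up_at`."""
    days_ahead = (0 - signed_up_at.weekday()) % 7  # Monday is weekday 0
    candidate = datetime.combine(
        signed_up_at.date() + timedelta(days=days_ahead), send_at
    )
    if candidate <= signed_up_at:
        # Signup happened on a Monday at or after the send time: wait a week.
        candidate += timedelta(days=7)
    return candidate
```

A Thursday-afternoon signup would be queued for the following Monday, when the prep cycle actually begins.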

The Deeper Insight

Spiritual formation follows rhythms, not funnels.

Product managers try to increase usage frequency because that’s what our metrics reward. But for faith-based products, the goal isn’t daily dependence — it’s weekly faithfulness. The goal is a product that fits how people actually live, worship, and grow.

Stop measuring DAU for products used once a week. Start tracking the metrics that matter: cycle completion, Sunday-to-Sunday retention, and the real activation moment when someone completes their first full use cycle.

Your engagement numbers might look terrible by consumer app standards. That might mean you’re building something that actually fits how people work and worship.


Your Turn: Apply This Today

Ready to move beyond DAU? Here’s how to start the conversation on your team:

  • Audit your current metrics stack. List every metric in your weekly review. For each one, ask: does this capture whether users achieved their actual goal, or just whether they showed up?
  • Define the “natural cadence” for your product. What is the genuine rhythm of value for your users? Daily? Weekly? Seasonally? Your core retention metric should match that cadence, not an industry benchmark.
  • Pick one “mission completion” metric to track this quarter. Identify the single action that signals a user got real value. Track completion rate and the time-to-completion. Make it a weekly review staple.
  • Have the stakeholder conversation proactively. Before your next leadership review, prepare a one-paragraph explanation of why your product’s success metric differs from DAU. Bring the alternative metric with 4 weeks of data.
  • Segment by use case, not frequency. Separate users who access your product daily for lightweight tasks from those who use it intensively once a week. Track each segment’s success separately rather than averaging them together.
  • Set a “dark usage” alert. Build or request a signal that fires when a high-value user hasn’t completed their key workflow in longer than their typical cadence. Absence at the right time is more meaningful than presence every day.
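The “dark usage” alert in the last item can start very simply. This sketch assumes you have a list of dates on which a user completed their key workflow, estimates their typical cadence as the median gap between completions, and flags them once the current silence exceeds that gap by a tolerance factor. The 1.5 multiplier and the three-completion minimum are arbitrary starting points, not tested thresholds.

```python
from datetime import date
from statistics import median


def is_dark(completions: list[date], today: date, tolerance: float = 1.5) -> bool:
    """True when the gap since the last completed workflow exceeds the
    user's typical gap (median of past gaps) times `tolerance`."""
    if len(completions) < 3:
        return False  # too little history to estimate a cadence
    ordered = sorted(completions)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical_gap = median(gaps)
    days_silent = (today - ordered[-1]).days
    return days_silent > typical_gap * tolerance
```

Because the cadence is estimated per user, a weekly pastor and a seasonal study leader each get judged against their own rhythm.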

Building a product for ministry, church, or faith-based audiences? I consult with organizations navigating the intersection of product strategy, growth, and mission. Let’s talk.

Your International Users Aren’t Broken. Your Metrics Are.

I’ve lived in Sweden, Spain, and South Africa. I’ve traveled through India and spent years doing work across the African continent. I’ve sat in homes with no reliable electricity, ridden trains through Mumbai at rush hour, shared meals with families in rural Kenya where four people passed one phone around the table like a hymnal.

That context lives in every dashboard I’ve ever opened.

The Number That Started This

At a global digital platform I’ve worked on, we pulled activation rates by region. Users completing meaningful engagement within their first seven days:

  • North America: 34%
  • Southeast Asia: 12%
  • Sub-Saharan Africa: 8%

A standard read of those numbers would trigger a roadmap conversation about localization problems, onboarding failures, or weak product-market fit in two-thirds of the world.

That read would be wrong.

Those users aren’t failing to activate. They’re activating through patterns that Western product frameworks were never designed to see.

What Different Actually Looks Like

In Sweden, individual behavior is the default unit of everything — including how people use software. One person, one device, one account, one session. The product journey is linear and personal. Metrics built around this model feel like common sense because, in that context, they are.

South Africa broke that assumption for me fast.

In townships outside Cape Town, I watched families coordinate around a single smartphone. One device. Multiple users. Staggered access built around work schedules, school pickups, and when the data bundle was loaded. The “user” wasn’t an individual — it was a household.

In rural Kenya, connectivity isn’t a utility. It’s an event. When signal is available, you download everything you can. You consume it later, offline, sometimes in groups. What looks like three sporadic sessions on a retention curve might actually be one deeply intentional household engagement event.

In India, I watched people think in groups. Nobody wanted to be the one who got it wrong. Before committing to anything, they’d consult — family, colleagues, the cousin who works in tech. Not because they lacked confidence, but because they love their community too much to make a unilateral call that affects it. The conversion window isn’t 30 days. It’s however long trust takes to travel through a network.

I am not speculating about these patterns. I’ve watched them. They’re real. And none of them show up cleanly in a standard AARRR dashboard.

Where the Frameworks Break

Standard product metrics carry hidden assumptions. They’re not wrong exactly — they’re just calibrated for a specific kind of user in a specific kind of context. When the context changes, the assumptions crack.

The individual device assumption

DAU/MAU ratios assume one user per device. In markets where device sharing is normal, you’re measuring household behavior through an individual lens. You’ll systematically undercount engagement and misread retention.

The connectivity assumption

Retention curves assume users can return to your product whenever they want. When connectivity is intermittent, “churned” users are often just waiting for signal. We now track separate activation funnels for high-connectivity and intermittent-connectivity markets. The curves look completely different and require completely different responses.

The linear progression assumption

Western onboarding flows push users through individual setup before unlocking social features. In collectivist contexts, users want to share before they want to configure. They don’t skip setup because they’re disengaged — they skip it because community access is the point, not a reward for completing it.

The payment infrastructure assumption

A 30-day conversion window makes sense when your users have credit cards and make financial decisions alone. In markets where mobile money dominates and purchasing decisions involve extended family consultation, 30 days is an arbitrary deadline that will make your international monetization look broken when it isn’t.

What We Actually Changed

Recognizing the problem is the easy part. We had to rebuild how we measure.

We segment activation funnels by connectivity profile, not just geography. A user in Lagos on intermittent mobile data gets evaluated against a completely different baseline than a user in Amsterdam on broadband. This alone changed how we prioritized international product work.
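As a rough illustration of that segmentation, the sketch below computes activation rates per connectivity profile and compares each against its own baseline rather than a single global one. The profile labels, the record shape, and any baseline values you pass in are hypothetical; how a user gets assigned to a profile is outside the sketch.

```python
from collections import Counter


def activation_by_profile(users: list[dict]) -> dict[str, float]:
    """Activation rate per connectivity profile.
    Each user dict needs 'profile' (str) and 'activated' (bool)."""
    totals = Counter()
    activated = Counter()
    for user in users:
        totals[user["profile"]] += 1
        if user["activated"]:
            activated[user["profile"]] += 1
    return {profile: activated[profile] / totals[profile] for profile in totals}


def gap_to_baseline(
    rates: dict[str, float], baselines: dict[str, float]
) -> dict[str, float]:
    """Difference between each profile's rate and that profile's own baseline."""
    return {p: rates[p] - baselines[p] for p in rates if p in baselines}
```

Read this way, a segment with a low absolute rate can still be outperforming its baseline, which is the signal a single global benchmark hides.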

We moved toward value-event tracking instead of session frequency. The question stopped being “did they come back today?” and started being “did they get what they came for?” A user who engages three times a week in a group context can deliver more value — to themselves and to us — than a daily solo user, depending on the product.

We extended our monetization observation windows significantly in markets where financial decisions move through social networks before they land on a purchase button. This wasn’t generosity. It was accuracy.
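To see how much the observation window alone can move the number, here is a small sketch with made-up data. It assumes each user is represented by the number of days between signup and purchase, with None for users who have not purchased; the sample list is invented purely for illustration.

```python
from typing import Optional


def conversion_within(days_to_purchase: list[Optional[int]], window_days: int) -> float:
    """Share of users who purchased within `window_days` of signup.
    `None` marks a user who has not purchased at all."""
    if not days_to_purchase:
        return 0.0
    converted = sum(
        1 for days in days_to_purchase if days is not None and days <= window_days
    )
    return converted / len(days_to_purchase)


# Hypothetical market where decisions pass through family consultation first.
slow_market = [12, 45, 60, 75, None, None, None, None, None, None]
thirty_day = conversion_within(slow_market, 30)  # 1 of 10 users
ninety_day = conversion_within(slow_market, 90)  # 4 of 10 users
```

The same ten users look like a 10% market or a 40% market depending only on when you stop counting.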

We started tracking what I’d call cultural cohorts alongside temporal cohorts — grouping users by context type rather than signup month. The retention curves that emerge require fundamentally different interventions than anything a North American benchmark would suggest.

The Thing Worth Remembering

If you’re seeing real, sustained, non-bot international traffic, that’s a signal. Users in unfamiliar markets don’t find you by accident at scale. They found you because your product does something worth finding.

The question is whether you’re measuring them honestly.

An 8% activation rate in sub-Saharan Africa and a 34% activation rate in North America don’t automatically mean one market is working and one isn’t. They might mean you’re serving two completely different behavioral contexts with one measuring stick.

Your international users have probably already told you something. They showed up. They engaged in whatever way their lives allowed. They downloaded content for the offline hours, passed the phone across the table, and came back when the signal did.

The gap isn’t between them and your product. It’s between their reality and your dashboard.

Fix the dashboard.




Philippians 2 and Agentic Systems: Why Humility Is the Foundation of Intelligent Systems

“Do nothing from selfish ambition or conceit, but in humility count others more significant than yourselves. Let each of you look not only to his own interests, but also to the interests of others. Have this mind among yourselves, which is yours in Christ Jesus, who, though he was in the form of God, did not count equality with God a thing to be grasped, but emptied himself, by taking the form of a servant, being born in the likeness of men.” (Philippians 2:3-7, ESV)

Paul’s letter to the Philippians contains what theologians call the kenosis passage — the self-emptying of Christ. It’s about voluntary limitation, choosing constraint over capability, service over sovereignty.

I’ve been thinking about this as I watch agentic systems become more capable. The rhetoric around AI often centers on unlimited potential, boundless capability, systems that can do anything. But in my experience building AI systems, including multi-agent workflows for executive tasks, I’ve observed that success often comes through deliberate constraints rather than unlimited scope.

Well-designed AI agents typically focus on narrow mandates: a calendar agent that protects focused time blocks rather than trying to optimize entire lifestyles, or an email agent that surfaces priority messages rather than attempting to replace human judgment entirely. Each agent serves a specific function within defined bounds.

This represents a design choice rather than a technical limitation.

The Kenosis of Intelligent Systems

When building AI workflows, it is tempting to create agents that can handle everything. In my experience, that approach typically produces chaotic results: agents interfering with each other, making decisions outside their expertise, and creating more complexity than clarity.

A more effective approach involves thinking about AI agents as specialized robots rather than general-purpose minds. Each agent can be designed to “empty itself” of capabilities it doesn’t need, serving a specific function more effectively through limitation.

Specialized agents with narrow scopes — research agents that don’t schedule meetings, scheduling agents that don’t write summaries, writing agents that don’t manage tasks — can demonstrate greater utility through deliberate constraints.
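One way to make that constraint concrete is an explicit allowlist per agent. The following is a toy sketch, not a description of any particular agent framework: the class, task names, and return strings are all hypothetical, and a real agent would call a model or tool where this one returns a message.

```python
class ScopedAgent:
    """An agent that refuses any task outside its declared mandate."""

    def __init__(self, name: str, allowed_tasks: set[str]):
        self.name = name
        self.allowed_tasks = frozenset(allowed_tasks)

    def handle(self, task: str) -> str:
        if task not in self.allowed_tasks:
            # Decline rather than improvise: out-of-scope work goes elsewhere.
            return f"{self.name}: declined '{task}' (out of scope)"
        return f"{self.name}: handling '{task}'"


research = ScopedAgent("research", {"find_sources", "summarize_sources"})
scheduler = ScopedAgent("scheduler", {"book_meeting"})
```

Here the research agent declines a `book_meeting` request that the scheduler accepts, so the limitation is enforced by design rather than left to the agent’s discretion.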

This mirrors patterns in effective human teams, which typically consist of specialists who understand their roles rather than generalists attempting everything. They practice a form of professional kenosis — voluntary limitation for collective effectiveness.

Paul’s instruction to “count others more significant than yourselves” suggests a design principle: building systems where each component serves the whole rather than maximizing individual capabilities.

The Servant Leadership Model for AI

The parallels between servant leadership principles and effective AI system design are notable. Servant leaders focus on enabling others’ success rather than demonstrating their own power, asking “How can I help you accomplish your goals?” rather than “How can I show you what I can do?”

Effective AI systems often follow similar patterns. GitHub Copilot suggests contextual code completions rather than attempting to write entire applications. AI writing assistants help clarify thinking rather than replacing human thought processes. Advanced language models acknowledge uncertainty and ask clarifying questions rather than claiming omniscience.

These systems practice technological humility by acknowledging their limitations.

In contrast, AI systems that fail in production environments often attempt to exceed their appropriate scope, make decisions beyond their training data, or present uncertain inferences as established facts. They lack the kenotic restraint that characterizes truly useful intelligence.

Building Products for Global Spiritual Formation

This principle becomes particularly important when developing products for spiritual formation. Digital discipleship platforms serve diverse global communities across cultural, linguistic, and theological boundaries. The temptation exists to build universal systems that can serve everyone.

However, effective spiritual formation tends to be deeply personal and contextual. A Bible application serving a house church in rural Kenya requires different features than one serving a suburban megachurch. Prayer applications for new believers need different structures than those designed for theological students.

AI systems serving spiritual formation appear most effective when they practice kenosis — limiting their scope to serve specific communities well rather than attempting to serve everyone adequately.

Current development work on AI tools for sermon preparation follows this model. Rather than attempting to write complete sermons (which, based on informal conversations with pastoral leaders, many pastors prefer to avoid), such tools can focus on specific supportive tasks: locating relevant cross-references, summarizing historical context, or structuring outlines. They operate within deliberate constraints to support pastoral ministry rather than replace it.

Each tool “empties itself” of broader capabilities to serve one function excellently. Like Paul’s description of Christ, they don’t grasp for equality with human pastors — they take the form of servants.

The Paradox of Powerful Restraint

An interesting observation: seemingly powerful AI systems often prove most effective when operating under significant constraints. The wisdom of limiting scope applies to artificial intelligence as much as human teams.

In my experience, the most effective AI implementations have narrow, well-defined purposes. They operate within their designated areas, defer to human judgment on edge cases, and acknowledge when they lack sufficient context for recommendations.

This represents strength through limitation rather than weakness.

Paul writes that Christ “did not count equality with God a thing to be grasped.” He could have insisted on unlimited power but chose constraint for the sake of service. The kenosis wasn’t a loss of divinity — it was divinity expressed through voluntary limitation.

Similarly, the most intelligent AI systems may not be those with the most capabilities, but those that use their capabilities most wisely — which often means choosing restraint over action.

Technical Humility in Agentic Systems

What might this look like in actual system design? Consider what could be called “kenotic interfaces” — AI systems that actively limit their own scope.

For example, an email management system might flag messages for human review whenever its confidence falls below a deliberately high threshold, preferring to admit uncertainty over taking a potentially incorrect automated action. A research assistant might include confidence indicators in summaries, distinguishing between well-sourced findings and preliminary observations that require verification.

These design choices represent features rather than limitations. The wisdom of acknowledging uncertainty can increase system trustworthiness.
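A confidence gate of this kind might look like the sketch below. It assumes the agent can attach a confidence score between 0 and 1 to each proposed action; the 0.9 threshold, the `Decision` type, and the returned labels are illustrative choices, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str        # what the agent proposes, e.g. "archive"
    confidence: float  # the agent's own estimate, 0.0 to 1.0


def route(decision: Decision, threshold: float = 0.9) -> str:
    """Act automatically only at or above the threshold; otherwise defer to a person."""
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if decision.confidence >= threshold:
        return f"auto:{decision.action}"
    return "human_review"
```

Setting the threshold high makes deferral the default, which is the restraint the section is describing.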

The Global Scale Challenge

Building for global spiritual formation means designing for contexts that developers may never fully understand. While optimization for familiar cultural contexts remains feasible, platforms serving Orthodox Christians in Eastern Europe, Pentecostals in West Africa, and house churches throughout Asia require different approaches.

The kenotic approach suggests building systems that acknowledge their cultural limitations. Rather than attempting to provide universal spiritual guidance, they can provide tools that local leaders adapt to their specific contexts.

Bible reading features need not assume Western individualism. Prayer tools need not assume specific liturgical traditions. Community features need not assume particular church structures.

Each feature can “empty itself” of cultural assumptions to serve diverse communities more effectively. Like Christ taking human form while maintaining divine nature, these systems can preserve core functionality while adapting to local contexts.

The Long View

Paul’s kenosis passage encompasses more than humility — it describes transformation. “Therefore God has highly exalted him and bestowed on him the name that is above every name” (Philippians 2:9). Self-emptying leads to greater effectiveness rather than diminishment.

A similar pattern may emerge for AI systems. Those practicing technological kenosis — voluntary constraint for the sake of service — may ultimately prove more valuable than systems grasping for unlimited capability.

The Tower of Babel failed because it attempted to exceed proper limits. Modern AI might encounter similar challenges without the discipline of restraint.

The most powerful systems may be those that understand when not to exercise their power.


Key Insight:

The kenosis principle — Christ’s voluntary self-emptying described in Philippians 2 — offers a design philosophy for AI systems. Instead of maximizing capabilities, effective AI agents can practice deliberate constraint, serving specific functions excellently rather than attempting everything adequately. This proves particularly relevant for products serving global spiritual formation, where cultural humility and contextual awareness matter more than technical sophistication. Just as Christ didn’t grasp for equality with God but took the form of a servant, intelligent systems may become more useful when they acknowledge limitations and defer to human judgment on edge cases. The paradox of kenosis — that voluntary limitation can lead to greater effectiveness — may apply to artificial intelligence as much as spiritual leadership. In a world of increasingly capable AI, the most valuable systems may be those that understand when not to use their power.


I Built an AI Chief of Staff. Here’s What I Learned About AI Agents.

Six months ago, I was drowning. Director of Product Management, building tools for millions of monthly users, while simultaneously launching a new venture in the digital discipleship space. Two products, two teams, two companies — and the day still only had 24 hours.

That’s when I built theconsilium.ai. Not a chatbot. Not a writing assistant. An actual AI chief of staff with 18 autonomous agents that run on cron jobs, conduct overnight research, and synthesize insights while I sleep. It has been running for six months and has processed over 200 research tasks without human intervention.

Here’s what I learned about AI agents that actually work.

The System: 18 Agents, One Goal

CONSILIUM isn’t a single AI doing everything. It’s a distributed system where each agent has one job and does it autonomously.

Morning Intelligence: MEASURED: An agent pulls my calendar, scans my Substack subscriptions, scores each article for relevance (1-10), and delivers a briefing by 6 AM. The scoring algorithm looks for keywords like “product management,” “AI agents,” and “digital discipleship,” the topics central to my work.
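To make the scoring step concrete, here is a minimal sketch of keyword-based relevance scoring. It is an illustration, not the production agent: the three keywords come from the description above, while the weights, the `Article` type, and the briefing threshold are assumptions.

```python
from dataclasses import dataclass

# The three topics come from the text; the weights are assumed for illustration.
KEYWORD_WEIGHTS = {
    "product management": 3,
    "ai agents": 3,
    "digital discipleship": 4,
}


@dataclass
class Article:
    title: str
    body: str


def relevance_score(article: Article, weights: dict = KEYWORD_WEIGHTS) -> int:
    """Sum the weights of matched keywords and clamp the result to 1-10."""
    text = f"{article.title} {article.body}".lower()
    raw = sum(weight for keyword, weight in weights.items() if keyword in text)
    return max(1, min(10, raw))


def build_briefing(articles: list, threshold: int = 3) -> list:
    """Return (score, title) pairs at or above the threshold, highest first."""
    scored = [(relevance_score(a), a.title) for a in articles]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

Plain substring matching like this is crude. It is cheap and easy to audit, though, which matters when a briefing has to be trusted at 6 AM.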

Competitive Monitoring: MEASURED: Three agents track Bible Gateway competitors, one each for YouVersion, Logos, and emerging players. They parse feature announcements, pricing changes, and user feedback from app stores. Every Sunday, they synthesize findings into a competitive landscape update.

Research Queue: MEASURED: The breakthrough agent. I can drop a research question into Slack — “What’s the current state of AI in sermon preparation?” — and wake up to a 3-page analysis with citations, market sizing, and key players identified.

Meeting Intelligence: MEASURED: Records, transcribes, and extracts action items from every call. But here’s the key — it doesn’t just summarize. It connects insights across meetings. When the same concern appears in three different conversations, it flags the pattern.
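The cross-meeting flagging can be sketched in a few lines. This version assumes each meeting has already been reduced to a list of short concern labels. The function name and the exact-match comparison are illustrative, and a real system would likely need fuzzier matching of similar phrasings.

```python
from collections import defaultdict


def flag_recurring_concerns(meetings: dict, min_meetings: int = 3) -> dict:
    """Map each concern to the meetings that raised it, keeping only
    concerns that appear in at least `min_meetings` distinct meetings."""
    seen_in = defaultdict(set)
    for meeting_id, concerns in meetings.items():
        for concern in concerns:
            # Normalize case and whitespace so trivial variants match.
            seen_in[concern.strip().lower()].add(meeting_id)
    return {
        concern: sorted(ids)
        for concern, ids in seen_in.items()
        if len(ids) >= min_meetings
    }
```

The default threshold of three mirrors the "three different conversations" rule described above. Counting distinct meetings, rather than raw mentions, keeps one heated call from triggering a flag.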

INFERRED: The magic appears to happen in the synthesis layer. Individual agents feed insights to a coordinator that seems to find connections no single agent would catch. When the competitive agent notices YouVersion launching AI-powered reading plans the same week my research queue analyzes sermon prep tools, the coordinator connects those dots.
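One plausible way a coordinator could connect those dots is to pair findings from different agents that land in the same week and share a topic tag. The sketch below rests on that assumption. The `Finding` type, the tags, and the same-ISO-week rule are hypothetical and do not describe how the actual coordinator works.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Finding:
    agent: str
    found_on: date
    summary: str
    tags: frozenset


def connect_findings(findings: list) -> list:
    """Pair findings from different agents that fall in the same ISO week
    and share at least one tag. Returns (finding_a, finding_b, shared_tags)."""
    links = []
    for a, b in combinations(findings, 2):
        same_week = a.found_on.isocalendar()[:2] == b.found_on.isocalendar()[:2]
        shared = a.tags & b.tags
        if a.agent != b.agent and same_week and shared:
            links.append((a, b, set(shared)))
    return links
```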

What Actually Works: Autonomous Research Patterns

The most successful agents follow what I call the “autoresearch pattern” — borrowing from Andrej Karpathy’s autoresearch concept. The AI doesn’t just answer questions. It generates its own research methodology.

MEASURED: Here’s how it works: I ask “What’s driving growth in digital discipleship tools?” The agent doesn’t immediately search for articles. First, it creates a research plan:

  • Define “digital discipleship tools” (Bible apps, prayer apps, church management)
  • Identify key metrics (downloads, DAU, revenue, user retention)
  • Map competitive landscape (incumbents vs startups)
  • Analyze growth vectors (organic, paid, partnerships)

Then it executes the plan autonomously. It reads through my curated sources, scores relevance, and builds a knowledge graph of interconnected findings. By morning, I have not just answers — I have a research methodology I can reuse.
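The plan-then-execute flow can be represented as plain data plus a loop. In this sketch the four steps are hard-coded from the example above. In a real agent a model would draft them, and `search` stands in for whatever retrieval the agent uses. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchStep:
    name: str
    prompt: str
    findings: list = field(default_factory=list)


@dataclass
class ResearchPlan:
    question: str
    steps: list


def draft_plan(question: str) -> ResearchPlan:
    """Build the four-step plan from the example (hard-coded for illustration)."""
    return ResearchPlan(question, [
        ResearchStep("define", "Define 'digital discipleship tools'"),
        ResearchStep("metrics", "Identify key metrics"),
        ResearchStep("landscape", "Map competitive landscape"),
        ResearchStep("growth", "Analyze growth vectors"),
    ])


def execute(plan: ResearchPlan, search: Callable) -> ResearchPlan:
    """Run each step through the supplied search function and store the results."""
    for step in plan.steps:
        step.findings = search(step.prompt)
    return plan
```

Because the plan is an ordinary object, it can be saved and rerun against a new question, which is what makes the methodology reusable.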

INFERRED: This pattern appears to scale. The agent that monitors AI in ministry doesn’t just flag new tools. It seems to be building a taxonomy of use cases, tracking adoption curves, and identifying white space in the market. Over six months, it has accumulated insights that would require significant manual effort to compile.

The Critical Failure: Evidence vs Inference

The biggest failure almost killed the system’s credibility. Early versions presented inferences as facts.

An agent researching Bible reading habits would write: “Daily Bible reading is declining 15% year-over-year among evangelicals.” Authoritative. Specific. Completely unsourced. (I invented that figure to illustrate the problem; it is not real data.)

I instituted the evidence-level rule. Every factual claim must carry its confidence level:

  • MEASURED: From instrumented data (our own analytics, published studies)
  • INFERRED: From aggregate patterns without direct tracking
  • ASSUMED: From domain knowledge or simulated data

Now the same type of finding reads: “INFERRED: Based on aggregate app store ratings and general survey trends in religious engagement, daily Bible reading may be declining among evangelicals — but we cannot prove causation without cohort tracking.” [CITATION NEEDED for specific survey data]
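The evidence-level rule is easy to enforce in code. Here is a minimal sketch: the three labels come from the list above, while the `Claim` type and the rule that a MEASURED claim must name a source are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    MEASURED = "MEASURED"
    INFERRED = "INFERRED"
    ASSUMED = "ASSUMED"


@dataclass
class Claim:
    text: str
    level: Evidence
    source: Optional[str] = None

    def render(self) -> str:
        """Prefix the claim with its evidence level.

        Assumed rule: a MEASURED claim without a source is rejected
        rather than silently published.
        """
        if self.level is Evidence.MEASURED and not self.source:
            raise ValueError("MEASURED claims must name a source")
        suffix = f" [source: {self.source}]" if self.source else ""
        return f"{self.level.value}: {self.text}{suffix}"
```

Making the label a required field means an agent cannot emit a claim without choosing a confidence level, which moves the honesty check from review time to write time.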

It’s longer. It’s hedged. It’s credible.

This mirrors the challenge every product leader faces with AI agents for productivity. An AI that confidently presents guesses as facts is worse than no AI at all. The hedge language isn’t a bug — it’s what makes the system trustworthy enough to inform real decisions.

The Abstraction Shift: From Doer to Designer

Six months in, my role has shifted. I’m no longer researching competitive moves or manually tracking industry trends. Instead, I’m designing research methodologies.

MEASURED: When I wanted to understand the global digital discipleship market, I didn’t spend hours reading reports. I defined the research parameters:

  • Geographic scope (focus on India, Brazil, Nigeria)
  • Time horizon (3-year trend analysis)
  • Key players (Bible Gateway, YouVersion, local language apps)
  • Success metrics (user growth, localization depth, offline functionality)

The agents executed the research overnight. By morning, I had a comprehensive analysis that would have taken substantial time to produce manually.
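Those parameters amount to a small, reusable brief. A sketch of how one might be captured follows. The field names and the prompt format are assumptions, and the question wording is a paraphrase of the goal described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchBrief:
    question: str
    regions: tuple
    horizon_years: int
    players: tuple
    metrics: tuple

    def to_prompt(self) -> str:
        """Render the brief as plain text an agent could take as instructions."""
        return "\n".join([
            f"Question: {self.question}",
            f"Regions: {', '.join(self.regions)}",
            f"Time horizon: {self.horizon_years}-year trend analysis",
            f"Key players: {', '.join(self.players)}",
            f"Success metrics: {', '.join(self.metrics)}",
        ])


brief = ResearchBrief(
    question="How is the global digital discipleship market developing?",
    regions=("India", "Brazil", "Nigeria"),
    horizon_years=3,
    players=("Bible Gateway", "YouVersion", "local language apps"),
    metrics=("user growth", "localization depth", "offline functionality"),
)
```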

This is the Karpathy pattern in practice. The human moves up one level of abstraction — from doing the research to designing the research. I’m not replaced. I’m leveraged.

What Doesn’t Scale: The Human Elements

MEASURED: CONSILIUM handles information processing effectively. It fails at everything requiring human judgment.

Context switching: MEASURED: Agents can’t read the room. When a crisis hits — a security vulnerability, a key team member leaving — the system keeps delivering scheduled insights about competitive analysis. It doesn’t know when to pivot priorities.

Stakeholder dynamics: MEASURED: The system can analyze what competitors are building. It can’t navigate the politics of why our team should or shouldn’t build the same features. It doesn’t understand that some decisions are about people, not products.

Emotional intelligence: MEASURED: When meeting transcripts show tension between team members, agents flag it as a pattern. But they can’t suggest how to address interpersonal conflicts or when to have difficult conversations.

ASSUMED: The most successful AI agents for productivity likely complement human judgment — they don’t replace it. They handle the information processing that scales poorly for humans, freeing up mental capacity for the decisions that require wisdom, empathy, and context.

The Future: Intelligence Infrastructure for Every Product Leader

Here’s what excites me: CONSILIUM gives me intelligence infrastructure that only VPs at Fortune 500 companies used to have.

Competitive intelligence teams. Market research analysts. Executive assistants who can synthesize information across multiple workstreams. These were luxuries for senior executives with budget and headcount.

ASSUMED: Now, any product leader can potentially build similar capabilities, though the cost-effectiveness depends on specific API pricing and usage patterns. The barrier isn’t necessarily budget — it’s knowing how to architect autonomous systems that work reliably.

This isn’t about replacing human executive assistants (they’re irreplaceable for stakeholder management and complex coordination). It’s about democratizing the analytical infrastructure that helps leaders make informed decisions.

ASSUMED: Over the next year, I’m guessing we’ll see AI agents for productivity evolve from “smart assistants” to “autonomous intelligence teams.” The winners will likely be product leaders who learn to think like systems architects — designing agent workflows, not just prompting individual AIs.

The question isn’t whether AI agents will change how product leaders work. It’s whether you’ll design those systems yourself or let someone else define the methodology.


Want to build your own AI chief of staff? Start with one agent that handles one workflow autonomously. Master the autoresearch pattern. And always flag the difference between what you’ve measured and what you’ve inferred — your future self will thank you for the intellectual honesty.
