Darwin’s Dangerous Idea and the Feature Factory Problem: What Evolution Teaches AI Product Managers

Most product managers approach AI like intelligent designers. We map the problem space, specify the solution, define the success metrics, and ship. We assume complex, useful behavior can be deliberately engineered from the top down.

Darwin’s core insight, that complex, purposeful-seeming systems can emerge without a central designer making deliberate choices, turns out to be more relevant to AI product development than most product leaders recognize. The products that have surprised their creators with unexpected value often got there through variation and selection, not specification. The ones that failed were often specified so precisely that they had no room to adapt to how people actually used them.

The Feature Factory Is an Intelligent Design Problem

The feature factory pattern — shipping features faster than users can absorb them, treating the roadmap as a delivery queue rather than a discovery process — is fundamentally an intelligent design failure. Teams act as if they can predict exactly what users need, specify it precisely, build it faithfully, and ship it to universal acclaim. The unpredictability of real users keeps surprising them.

Darwinian product development looks different: create variation, apply selection pressure, amplify what survives. Ship something with enough structure to be useful but enough flexibility to adapt. Watch what happens. Kill what fails fast, invest in what unexpectedly thrives. Repeat.

This isn’t a new idea — Eric Ries built Lean Startup around it. But AI makes it both more powerful and more necessary. More powerful because AI systems can generate variation at a scale humans can’t. More necessary because AI capabilities evolve faster than we can predict their applications, which means top-down specification misses most of the opportunity.

What Happened When I Let My AI System Evolve

I started building my multi-agent AI workflow the way most PMs approach feature development: specify what each agent should do, build it, deploy it. Email triage agent: sorts and prioritizes. Meeting summary agent: extracts decisions and action items. Research synthesis agent: connects relevant findings to current questions. Clean scoping, clear outputs.

That worked. But the interesting things happened when I stopped controlling the outputs so tightly.

The research synthesis agent started connecting ideas across domains I hadn’t linked — product insights from one industry informing decisions in another. The meeting summary agent began surfacing action items I hadn’t explicitly identified. The email triage agent started flagging opportunities I would have missed in a busy inbox.

None of that was specified. It emerged from iteration, from giving the agents broader parameters and selecting for what actually proved useful. The most valuable behaviors came from what looked like “mistakes” in the initial specification — outputs that weren’t what I asked for, but turned out to be better than what I asked for.

The Selection Pressure Problem in Product Organizations

Here’s where most organizations get stuck: they apply the wrong selection pressures to AI features.

Traditional product metrics — engagement, retention, feature adoption in the first 30 days — optimize for predictable behavior. They reward features that do exactly what users expect. In a stable environment, that’s fine. But AI capabilities evolve faster than user expectations, which means genuinely innovative AI features often fail traditional adoption metrics in their early stages.

I’ve watched product teams kill promising AI features because initial adoption was low, use cases were unclear, or users couldn’t articulate the value. Those are normal early-stage signals for genuinely novel capability — not kill signals. The teams that applied narrow selection pressure too early eliminated features that would have compounded in value as users developed new mental models for how to use them.

The right selection pressures for AI features look different: Are users who discover this feature coming back to it? Are they finding use cases we didn’t anticipate? Does it surface unexpected value even in early, imperfect form? These are survival signals, not lagging adoption metrics.
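The first of those questions is easy to turn into a number. Here’s a minimal Python sketch, assuming you can export feature usage as (user_id, day) pairs. The event shape and the name return_rate are my own illustration, not any particular analytics tool’s API.

```python
from collections import defaultdict


def return_rate(events):
    """Fraction of users who tried the feature and came back on a later day.

    `events` is an iterable of (user_id, day) pairs, one per feature use.
    This shape is a hypothetical stand-in for an analytics export.
    """
    days_by_user = defaultdict(set)
    for user_id, day in events:
        days_by_user[user_id].add(day)

    if not days_by_user:
        return 0.0

    # A user "returns" if they used the feature on more than one distinct day.
    returning = sum(1 for days in days_by_user.values() if len(days) > 1)
    return returning / len(days_by_user)


# Toy data: ana and cy come back on a later day, ben tries it once.
events = [("ana", 1), ("ana", 8), ("ben", 2), ("cy", 3), ("cy", 3), ("cy", 20)]
rate = return_rate(events)
```

On the toy data, two of the three users who discovered the feature came back. Note what the denominator is: people who found the feature, not your whole user base. That’s the difference between a survival signal and an adoption metric.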

Practical Implications for AI Product Teams

Design for variation, not specification. The most useful AI features often emerge from systems that can adapt to individual users, not systems that deliver uniform experiences. Build in the variability deliberately. Let the system learn what works for different users and contexts rather than forcing one behavior on everyone.

Apply selection pressure at the right timescale. AI features that teach users new ways of working need longer evaluation windows than features that automate familiar workflows. Build in explicit “evolution periods” before you make kill-or-invest decisions.
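One way to make an evolution period explicit is to write it down as a rule that blocks the decision until the window has passed. This is a rough Python sketch, not a recommendation: the two feature categories come from the paragraph above, but the 30-day and 120-day windows and the function name are assumptions you should replace with your own.

```python
from datetime import date, timedelta

# Assumed window lengths, for illustration only.
EVOLUTION_PERIOD = {
    "automates_familiar_workflow": timedelta(days=30),
    "teaches_new_way_of_working": timedelta(days=120),
}


def ready_for_decision(launched, kind, today):
    """Return True once the feature's evolution period has fully elapsed."""
    return today >= launched + EVOLUTION_PERIOD[kind]


launched = date(2024, 1, 1)
today = date(2024, 3, 1)  # 60 days after launch

familiar_ready = ready_for_decision(launched, "automates_familiar_workflow", today)
novel_ready = ready_for_decision(launched, "teaches_new_way_of_working", today)
```

Sixty days in, the automation feature is fair game for a kill-or-invest call. The feature that asks users to work differently is not, no matter how thin its early numbers look.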

Watch for emergent use cases. Your roadmap won’t predict the most valuable use cases for a genuinely novel AI feature. Set up the observation infrastructure to see what users do with it — not just whether they use it. Teresa Torres’ continuous discovery framework applies here: you need ongoing user contact to see the emergent behaviors, not just launch metrics.

Kill the feature factory framing entirely. If your product org treats the roadmap as a delivery queue, AI won’t change that pattern — it’ll accelerate it. You’ll ship more features faster and learn less from each one. The opportunity solution tree approach matters more, not less, when AI is involved.


Your Turn: Apply This Today

Break the feature factory pattern with these concrete interventions:

  • Audit your last ten shipped features for survival rate. Pull your release history, going back about six months or as far as it takes to find ten. For each feature, ask: is it still being used at the rate we hoped? If fewer than 30% are performing to expectation (that is, fewer than three of the ten), you’re running a feature factory. Name it.
  • Introduce “feature deprecation” as a quarterly ritual. Every quarter, identify two to three features with low engagement and make a decision: improve them, deprecate them, or document why they’re worth keeping despite low usage. Add this to your roadmap review cadence.
  • Slow down before the next AI feature request. The next time a stakeholder asks for an AI feature, apply one filter before it goes on the roadmap: “What user outcome does this advance, and how will we know if it worked?” If no one can answer it cleanly, it’s not ready.
  • Measure “feature discovery” separately from “feature usage.” A feature that exists but isn’t found is a design problem. A feature that’s found but not used is a value problem. Distinguish them. They have different fixes.
  • Run a “natural selection” exercise on your backlog. Stack-rank your backlog by one question instead of by business request priority: if we could only ship the features that directly advance our top user outcome, which ones survive? Cut everything below the line for the next sprint.
  • Establish an “outcome, not output” norm in sprint reviews. Require every feature to be presented alongside its success metric before it enters development — not after. If the team can’t define success in advance, the feature isn’t ready to build.
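The audit and the discovery-versus-usage split from the list above can be sketched in a few lines of Python. The 30% line is the one from the audit step. Everything else here is an assumption for illustration: the Feature fields, the sample numbers, and the 20% cutoffs for “low” discovery and “low” usage.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    hoped_weekly_users: int    # the usage you expected at launch
    actual_weekly_users: int   # the usage you see today
    users_who_found_it: int    # saw or opened the feature at least once
    eligible_users: int        # users who could have found it


FACTORY_THRESHOLD = 0.30  # from the audit step above
LOW_DISCOVERY = 0.20      # assumed cutoff
LOW_USAGE = 0.20          # assumed cutoff


def survival_rate(features):
    """Share of features still used at or above the rate you hoped for."""
    if not features:
        return 0.0
    surviving = sum(
        1 for f in features if f.actual_weekly_users >= f.hoped_weekly_users
    )
    return surviving / len(features)


def diagnose(f):
    """Separate 'not found' (design) from 'found but not used' (value)."""
    if f.eligible_users == 0:
        return "no data"
    if f.users_who_found_it / f.eligible_users < LOW_DISCOVERY:
        return "design problem"
    if f.actual_weekly_users / f.users_who_found_it < LOW_USAGE:
        return "value problem"
    return "healthy"


# Made-up sample data.
features = [
    Feature("smart reply", 500, 620, 900, 1000),
    Feature("auto tags", 400, 40, 100, 1000),
    Feature("digest", 300, 30, 800, 1000),
    Feature("summaries", 200, 90, 600, 1000),
]

rate = survival_rate(features)
is_feature_factory = rate < FACTORY_THRESHOLD
diagnoses = {f.name: diagnose(f) for f in features}
```

In this made-up release history only one feature in four survives, which puts the team under the 30% line. The diagnosis is what makes the audit actionable: “auto tags” was barely found, so it needs a design fix, while “digest” and “summaries” were found and ignored, which is a value problem.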

Running an AI product team and finding that standard product frameworks are breaking down? I consult with product organizations on AI product strategy, discovery processes, and building the organizational muscle to learn faster from AI features. Let’s talk.
