Six months ago, I was drowning. Director of Product Management, building tools for millions of monthly users, while simultaneously launching a new venture in the digital discipleship space. Two products, two teams, two companies — and the day still only had 24 hours.
That’s when I built theconsilium.ai. Not a chatbot. Not a writing assistant. An actual AI chief of staff with 18 autonomous agents that run on cron jobs, conduct overnight research, and synthesize insights while I sleep. MEASURED: It has been running for six months and has processed over 200 research tasks without human intervention.
Here’s what I learned about AI agents that actually work.
The System: 18 Agents, One Goal
CONSILIUM isn’t a single AI doing everything. It’s a distributed system where each agent has one job and does it autonomously.
Morning Intelligence: MEASURED: An agent pulls my calendar, scans my Substack subscriptions, scores articles for relevance (1-10), and delivers a briefing by 6 AM. The scoring algorithm looks for keywords like “product management,” “AI agents,” and “digital discipleship,” topics central to my work.
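In simplified form, the scoring looks something like this. The keywords, weights, and clamping are illustrative, not the production list:

```python
# Minimal keyword-based relevance scorer; weights and keywords are illustrative.
TOPIC_KEYWORDS = {
    "product management": 3,
    "ai agents": 3,
    "digital discipleship": 3,
    "competitive analysis": 2,
}

def score_article(title: str, body: str) -> int:
    """Score an article 1-10 by weighted keyword hits in title and body."""
    text = f"{title} {body}".lower()
    raw = sum(w * text.count(kw) for kw, w in TOPIC_KEYWORDS.items())
    return max(1, min(10, raw))  # clamp to the 1-10 briefing scale

print(score_article("AI agents for product management", ""))  # -> 6
```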
Competitive Monitoring: MEASURED: Three agents track Bible Gateway competitors, one each for YouVersion, Logos, and emerging players. They parse feature announcements, pricing changes, and user feedback from app stores. Every Sunday, they synthesize findings into a competitive landscape update.
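The scheduling is plain cron. A simplified version of the agent roster, where the names, times, and sources are placeholders rather than the production config:

```python
# Illustrative agent schedule in cron syntax; names, times, and sources
# are placeholders, not the production config.
COMPETITIVE_AGENTS = {
    "youversion_watch": {"cron": "0 2 * * *", "sources": ["app_store", "blog"]},
    "logos_watch":      {"cron": "0 2 * * *", "sources": ["app_store", "pricing_page"]},
    "emerging_watch":   {"cron": "0 3 * * *", "sources": ["product_hunt", "app_store"]},
    # Sunday 5 AM: merge the week's findings into one landscape update.
    "weekly_synthesis": {"cron": "0 5 * * 0",
                         "inputs": ["youversion_watch", "logos_watch", "emerging_watch"]},
}
```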
Research Queue: MEASURED: The breakthrough agent. I can drop a research question into Slack — “What’s the current state of AI in sermon preparation?” — and wake up to a 3-page analysis with citations, market sizing, and key players identified.
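The plumbing is simpler than it sounds. A stripped-down sketch of the Slack-to-queue bridge; the real version hangs off a Slack Events webhook and calls the research pipeline instead of a stub:

```python
import queue

# Hypothetical bridge from Slack to the overnight queue; in production this
# hangs off a Slack Events API webhook, not a direct function call.
research_queue: queue.Queue = queue.Queue()

def on_slack_message(text: str) -> None:
    """Enqueue any message that starts with 'research:' as an overnight task."""
    if text.lower().startswith("research:"):
        research_queue.put(text.split(":", 1)[1].strip())

def overnight_worker(run_research) -> list[str]:
    """Drain the queue, producing one report per question."""
    reports = []
    while not research_queue.empty():
        reports.append(run_research(research_queue.get()))
    return reports

on_slack_message("research: What's the current state of AI in sermon preparation?")
print(overnight_worker(lambda q: f"[3-page analysis of: {q}]"))
```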
Meeting Intelligence: MEASURED: Records, transcribes, and extracts action items from every call. But here’s the key — it doesn’t just summarize. It connects insights across meetings. When the same concern appears in three different conversations, it flags the pattern.
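The cross-meeting flagging reduces to counting which distinct meetings a concern appears in. The concern extraction itself is an LLM call, omitted here:

```python
from collections import defaultdict

def flag_recurring_concerns(meetings: dict[str, list[str]], threshold: int = 3) -> list[str]:
    """Flag any concern that appears in `threshold` or more distinct meetings.

    `meetings` maps a meeting id to the concern tags pulled from its
    transcript (that extraction is itself an LLM call, omitted here).
    """
    seen_in: dict[str, set[str]] = defaultdict(set)
    for meeting_id, concerns in meetings.items():
        for concern in concerns:
            seen_in[concern].add(meeting_id)
    return [c for c, ids in seen_in.items() if len(ids) >= threshold]

meetings = {
    "mon-standup":  ["onboarding friction", "api latency"],
    "wed-1on1":     ["onboarding friction"],
    "thu-planning": ["onboarding friction", "hiring"],
}
print(flag_recurring_concerns(meetings))  # ['onboarding friction']
```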
INFERRED: The magic appears to happen in the synthesis layer. Individual agents feed insights to a coordinator that seems to find connections no single agent would catch. When the competitive agent notices YouVersion launching AI-powered reading plans the same week my research queue analyzes sermon prep tools, the coordinator connects those dots.
What Actually Works: Autonomous Research Patterns
The most successful agents follow what I call the “autoresearch pattern,” a term borrowed from Andrej Karpathy. The AI doesn’t just answer questions. It generates its own research methodology.
MEASURED: Here’s how it works: I ask “What’s driving growth in digital discipleship tools?” The agent doesn’t immediately search for articles. First, it creates a research plan:
- Define “digital discipleship tools” (Bible apps, prayer apps, church management)
- Identify key metrics (downloads, DAU, revenue, user retention)
- Map competitive landscape (incumbents vs startups)
- Analyze growth vectors (organic, paid, partnerships)
Then it executes the plan autonomously. It reads through my curated sources, scores relevance, and builds a knowledge graph of interconnected findings. By morning, I have not just answers — I have a research methodology I can reuse.
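Stripped to its skeleton, the loop is plan, then execute, then synthesize. Here `llm` and `search` stand in for the model call and the curated-source retrieval, not a specific API:

```python
def run_autoresearch(question: str, llm, search) -> str:
    """Plan-then-execute loop; `llm` and `search` are assumed callables
    (model call and curated-source retrieval), not a specific API."""
    # Phase 1: the model writes its own methodology before touching any source.
    plan = llm(
        "Write a numbered research plan (definitions, key metrics, competitive "
        f"landscape, growth vectors) for: {question}"
    )
    # Phase 2: execute each step against curated sources, one finding per step.
    findings = []
    for step in plan.splitlines():
        if not step.strip():
            continue
        evidence = search(step)
        findings.append(llm(f"Step: {step}\nEvidence: {evidence}\nState the finding."))
    # Phase 3: synthesize into the morning report, citations included.
    return llm("Synthesize into a cited analysis:\n" + "\n".join(findings))
```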
INFERRED: This pattern appears to scale. The agent that monitors AI in ministry doesn’t just flag new tools. It seems to be building a taxonomy of use cases, tracking adoption curves, and identifying white space in the market. Over six months, it has accumulated insights that would require significant manual effort to compile.
The Critical Failure: Evidence vs Inference
The biggest failure almost killed the system’s credibility. Early versions presented inferences as facts.
An agent researching Bible reading habits would write: “Daily Bible reading is declining 15% year-over-year among evangelicals.” Authoritative. Specific. Completely unsourced. [This was a fabricated example showing the problem — not actual data]
I instituted the evidence-level rule. Every factual claim must carry its confidence level:
- MEASURED: From instrumented data (our own analytics, published studies)
- INFERRED: From aggregate patterns without direct tracking
- ASSUMED: From domain knowledge or simulated data
Now the same type of finding reads: “INFERRED: Based on aggregate app store ratings and general survey trends in religious engagement, daily Bible reading may be declining among evangelicals — but we cannot prove causation without cohort tracking.” [CITATION NEEDED for specific survey data]
It’s longer. It’s hedged. It’s credible.
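Enforcing the rule is mechanical. A simplified validator that rejects any agent output missing an evidence tag:

```python
from enum import Enum

class Evidence(Enum):
    MEASURED = "instrumented data (our analytics, published studies)"
    INFERRED = "aggregate patterns without direct tracking"
    ASSUMED = "domain knowledge or simulated data"

def validate_claim(claim: str) -> str:
    """Reject any agent output that doesn't lead with an evidence tag."""
    if not any(claim.startswith(level.name + ":") for level in Evidence):
        raise ValueError(f"Unlabeled claim rejected: {claim!r}")
    return claim

validate_claim("INFERRED: daily Bible reading may be declining.")        # passes
validate_claim("Daily Bible reading is declining 15% year-over-year.")  # raises
```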
This mirrors the challenge every product leader faces with AI agents for productivity. An AI that confidently presents guesses as facts is worse than no AI at all. The hedge language isn’t a bug — it’s what makes the system trustworthy enough to inform real decisions.
The Abstraction Shift: From Doer to Designer
Six months in, my role has shifted. I’m no longer researching competitive moves or manually tracking industry trends. Instead, I’m designing research methodologies.
MEASURED: When I wanted to understand the global digital discipleship market, I didn’t spend hours reading reports. I defined the research parameters:
- Geographic scope (focus on India, Brazil, Nigeria)
- Time horizon (3-year trend analysis)
- Key players (Bible Gateway, YouVersion, local language apps)
- Success metrics (user growth, localization depth, offline functionality)
The agents executed the research overnight. By morning, I had a comprehensive analysis that would have required a substantial time investment to produce manually.
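The parameter definition itself is tiny. A sketch of the spec I hand to the agents, with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchSpec:
    """What I define; the agents handle the rest (field names illustrative)."""
    question: str
    geographies: list[str] = field(default_factory=list)
    horizon_years: int = 3
    key_players: list[str] = field(default_factory=list)
    success_metrics: list[str] = field(default_factory=list)

spec = ResearchSpec(
    question="Map the global digital discipleship market",
    geographies=["India", "Brazil", "Nigeria"],
    key_players=["Bible Gateway", "YouVersion", "local language apps"],
    success_metrics=["user growth", "localization depth", "offline functionality"],
)
```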
This is the Karpathy pattern in practice. The human moves up one level of abstraction — from doing the research to designing the research. I’m not replaced. I’m leveraged.
What Doesn’t Scale: The Human Elements
MEASURED: CONSILIUM handles information processing effectively. It fails at everything requiring human judgment.
Context switching: MEASURED: Agents can’t read the room. When a crisis hits — a security vulnerability, a key team member leaving — the system keeps delivering scheduled insights about competitive analysis. It doesn’t know when to pivot priorities.
Stakeholder dynamics: MEASURED: The system can analyze what competitors are building. It can’t navigate the politics of why our team should or shouldn’t build the same features. It doesn’t understand that some decisions are about people, not products.
Emotional intelligence: MEASURED: When meeting transcripts show tension between team members, agents flag it as a pattern. But they can’t suggest how to address interpersonal conflicts or when to have difficult conversations.
ASSUMED: The most successful AI agents for productivity likely complement human judgment — they don’t replace it. They handle the information processing that scales poorly for humans, freeing up mental capacity for the decisions that require wisdom, empathy, and context.
The Future: Intelligence Infrastructure for Every Product Leader
Here’s what excites me: CONSILIUM gives me intelligence infrastructure that only VPs at Fortune 500 companies used to have.
Competitive intelligence teams. Market research analysts. Executive assistants who can synthesize information across multiple workstreams. These were luxuries for senior executives with budget and headcount.
ASSUMED: Now, any product leader can potentially build similar capabilities, though the cost-effectiveness depends on specific API pricing and usage patterns. The barrier isn’t necessarily budget — it’s knowing how to architect autonomous systems that work reliably.
This isn’t about replacing human executive assistants (they’re irreplaceable for stakeholder management and complex coordination). It’s about democratizing the analytical infrastructure that helps leaders make informed decisions.
ASSUMED: Over the next year, I’m guessing we’ll see AI agents for productivity evolve from “smart assistants” to “autonomous intelligence teams.” The winners will likely be product leaders who learn to think like systems architects — designing agent workflows, not just prompting individual AIs.
The question isn’t whether AI agents will change how product leaders work. It’s whether you’ll design those systems yourself or let someone else define the methodology.
Want to build your own AI chief of staff? Start with one agent that handles one workflow autonomously. Master the autoresearch pattern. And always flag the difference between what you’ve measured and what you’ve inferred — your future self will thank you for the intellectual honesty.
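A skeleton to start from, where everything is a placeholder to swap for your own sources and model calls:

```python
# starter_agent.py -- one agent, one workflow, run nightly via cron:
#   0 5 * * * python starter_agent.py >> briefing.log
# Everything below is a placeholder skeleton; swap in your own sources and model.

def fetch_sources() -> list[str]:
    """Replace with your real feeds (RSS, newsletters, app store reviews)."""
    return ["first article text...", "second article text..."]

def summarize(text: str) -> str:
    """Replace with an actual LLM summarization call."""
    return text[:80]

def main() -> None:
    briefing = [summarize(article) for article in fetch_sources()]
    # Practice the evidence-level rule from day one.
    print("ASSUMED: summaries are model-generated; verify before acting on them.")
    print("\n".join(f"- {line}" for line in briefing))

if __name__ == "__main__":
    main()
```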