It’s 11:40 p.m. and the laptop fan is the only sound in the kitchen. She pastes the last batch of prayer requests into the new tool, adds a short prompt about warmth and brevity, and watches three drafted replies appear. They read clean enough to send.
She queues them for morning, shuts the screen, and heads to bed. The system handled the volume; the rest, she figures, is just logistics.
By Thursday the first reply has already reached someone whose request was resolved weeks ago, and another has landed with a small-group suggestion that makes no relational sense. The AI did exactly what it was asked. It just had no way to know what it was missing.
Solomon’s judgment supplies the missing lens. When two women claimed the same child, Solomon did not optimize for speed or volume. He forced the real criterion into view: which outcome would actually preserve life. The sword test revealed the difference between a claim that sounded right and the relationship that mattered. Faith-tech pilots skip this step when they treat the first generated message as the deliverable.
The Clean Output That Breaks on Contact
The first agent response always flatters the builder. Sentences land in the right tone. The structure looks complete. Yet the output contains no trace of the ministry’s actual decision rules about who owns which relationship or when a request moves from one leader to another.
I watched this play out with a mid-size church that fed six months of pastoral care notes into an early agent. The drafts looked pastoral. Two of the first five messages went to the wrong volunteer because the model had no way to know that one leader had stepped back for health reasons three weeks earlier. The data existed in a different system. The prompt never asked for it.
The surface match creates the illusion of progress. Real ministry work depends on context that lives outside the text the agent sees. Without an explicit rule for surfacing that context, the output routes people to the wrong place while still sounding correct.
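One way to make such a rule explicit is a pre-send check that compares each draft against the current state of the other systems. The sketch below is a minimal illustration in Python, not the setup that church used. The `Draft` fields, the `closed_request_ids` set, and the `inactive_leaders` set are all hypothetical stand-ins for whatever the care-notes and volunteer systems actually export.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    """A generated reply waiting to be sent (hypothetical shape)."""
    request_id: str
    assigned_leader: str
    body: str


def context_problems(draft, closed_request_ids, inactive_leaders):
    """Return the reasons this draft should be held for human review.

    Both lookups are assumed to come from systems the agent never saw.
    """
    problems = []
    if draft.request_id in closed_request_ids:
        problems.append("request already resolved")
    if draft.assigned_leader in inactive_leaders:
        problems.append("assigned leader is not currently active")
    return problems


def split_drafts(drafts, closed_request_ids, inactive_leaders):
    """Separate drafts with no known conflict from drafts that need a person."""
    clear, held = [], []
    for draft in drafts:
        problems = context_problems(draft, closed_request_ids, inactive_leaders)
        if problems:
            held.append((draft, problems))
        else:
            clear.append(draft)
    return clear, held
```

A check like this only catches what someone thought to export, so a draft in the `clear` list has passed the known rules and nothing more.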
The Pre-PMF Gate I Now Require
Before any pilot expands past the single volunteer who built it, I ask for one artifact: a single sentence that states the exact outcome the ministry owner will measure after the tool runs. Not accuracy. Not time saved. The concrete result for the person on the other end of the message.
Most teams resist this step because it feels slow. They have already seen the agent produce something usable in a demo. The sentence forces them to name what changes in the real workflow once the output leaves the screen. If the owner cannot edit the sentence into something they can verify in their own system, the pilot stops.
This requirement comes from watching too many promising workflows collapse at the first handoff. The sentence works the way Solomon's test did. It reveals whether the automation targets the relationship that actually needs preserving or merely the text that happens to be easy to generate.
Which Problems Become Tools and Which Stay Human
Once the judgment step is explicit, the list of candidate automations shrinks. Tasks that only require matching visible data to a template survive. Tasks that require knowing which relationships have changed in the last month or which requests were already closed do not.
In the prayer-request database that became a living record, the point of assignment stayed human. The agent now drafts language and surfaces possible matches, but the final owner still confirms the routing against the current small-group map. The judgment step sits with the ministry owner, not the model.
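A rough sketch of that division of labor, assuming a small Python service: the agent may only propose a routing, and nothing counts as sendable until a named owner confirms it against the current small-group map. The names here (`Proposal`, `confirm_routing`, `is_sendable`, the `group_map` dictionary) are illustrative and not taken from the actual database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """An agent-drafted reply plus its suggested small group (hypothetical)."""
    request_id: str
    draft_text: str
    suggested_group: str
    confirmed_by: Optional[str] = None  # set only when a human owner confirms


def confirm_routing(proposal, owner, group_map):
    """Record the owner's confirmation if the suggested group is still current.

    group_map maps group name -> current leader. It is assumed to be
    maintained by the ministry owner, not by the model.
    """
    if proposal.suggested_group not in group_map:
        raise ValueError(
            f"{proposal.suggested_group!r} is not in the current small-group map"
        )
    proposal.confirmed_by = owner
    return proposal


def is_sendable(proposal):
    """A draft can go out only after a named person has confirmed it."""
    return proposal.confirmed_by is not None
```

The design choice worth noticing is that the model never writes to `confirmed_by`. The draft and the suggestion are the agent's work; the confirmation is the owner's.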
This split changes how teams staff pilots. Instead of handing a volunteer a new workflow and measuring output volume, the owner first writes the one-sentence definition of success. Only then does prompt work begin. Fewer tools reach production, but the ones that get there survive first contact with real people.
Your Turn: Apply This Today
- Take the next AI pilot idea you have written down and reduce it to one sentence that names the exact outcome a ministry owner will verify after the tool runs.
- Send that sentence to the actual ministry owner who would receive the output and ask them to edit it in their own words before any prompt is written.
- If the owner cannot name a verifiable result in one sentence, archive the pilot idea for thirty days and pick a narrower problem.
- Document the edited sentence in the same place you store the prompt history so the success criterion stays visible when the first clean output appears.
- Run the pilot only with the single volunteer who owns that sentence, and stop expansion until they confirm the outcome matched the definition on their own system.
- Repeat the one-sentence definition step for every new workflow before any code or prompt work begins this quarter.
The same pattern showed up in how the living prayer-request database evolved and in the decision to embed agents rather than build separate tools. Both cases required the judgment step before any automation moved forward.
I consult with product leaders and ministry owners on defining success criteria for AI pilots and embedding agents into existing workflows. Let’s talk.

