The Day AI Emailed the Wrong Dan
This is the fourth in my running series on what building a small business with AI in the loop is actually like. The earlier ones covered Google indexing, forecasting, and the time it nearly published our subscriber list. This one is smaller and more mundane, which is exactly why it is worth writing about. Sending email is the most boring task you can hand an AI, and it is the one that bit us most often.
What I set out to do
We run daily social posts for one of the apps during a big tournament. Match graphics, results, fixtures, captions. The routine is the same every day, so I wanted the assistant to build the images and then just email them to me ready to post. Draft the email, attach the files, send. A two-minute job I do not want to think about.
For the most part it was brilliant at this. It built the graphics, wrote the captions, attached the right files, and sent them over. Day after day, no fuss. The boring task got boring, which is the goal.
Where it went wrong (this is the point)
Two things went wrong, and they are both worth understanding because they are not really about email. They are about how these tools handle state and ambiguity.
The shared cache
Under the hood, the send worked like this. The AI writes the email it wants to send into a single file on disk, then a little script reads that file and actually sends it. Simple enough. The problem is that the file is shared. It is one fixed location, reused every time, and other sessions and tools can write to it too.
So you have a single mailbox-shaped file sitting between "decide to send" and "actually send." If anything stale is in it (a draft from earlier, a half-written message, something another process dropped there), that is what goes out. The AI is not sending what you just approved. It is sending whatever happens to be in the cache at the moment the script runs. Most of the time that is your message. The day it is not is the day you have a problem.
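To make the hazard concrete, here is a minimal sketch of that two-step arrangement. It assumes a single fixed draft file and a send step that blindly reads it. The path and function names are hypothetical, and `send_from_cache` just returns the text rather than actually mailing anything.

```python
import os
import tempfile

# Hypothetical fixed location, reused by every session and tool.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "outgoing_email.txt")


def compose(draft: str) -> None:
    """Step 1: the assistant writes the email it wants to send."""
    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        f.write(draft)


def send_from_cache() -> str:
    """Step 2: a separate script reads whatever is in the file and 'sends' it.

    Nothing here checks that the content is the draft that was approved.
    In this sketch, "sending" just means returning the text.
    """
    with open(CACHE_PATH, encoding="utf-8") as f:
        return f.read()


# The assistant composes this draft and it gets approved...
compose("To: me\nSubject: Today's match graphics\n")
# ...but another session writes to the same file before the send runs.
compose("To: someone-else\nSubject: Half-written draft\n")

sent = send_from_cache()
print(sent.splitlines()[1])  # Subject: Half-written draft
```

Each step is correct on its own. The wrong email comes from the shared file in between.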
This is the bit I think a lot of people miss with AI automation. The model can be perfectly correct and still send the wrong thing, because the model and the send are two separate steps with a shared scratchpad in between. The intelligence is not the weak link. The plumbing is.
The wrong Dan
The second one is more human. I have more than one email address, and, confusingly for a robot, more than one of them is "Danny." There is my main agency address and a separate personal one. When I ask by voice to "send it to Danny at the personal one," the transcription mangles the name, the AI has two plausible matches, and it has to pick.
It picked wrong. More than once it defaulted to whichever address it had seen first rather than stopping to ask which Danny I meant. On the day it really mattered I had to tell it, in fairly direct language, to send to the right inbox. Both addresses were mine, so the damage was contained to my own annoyance. Swap one of those for a client, or the wrong half of a married couple we work with, and "the AI guessed the recipient" stops being funny very quickly.
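The gap between guessing and asking fits in a few lines. This sketch assumes a hypothetical in-memory contact list with example addresses, and it uses a crude case-insensitive substring match as a stand-in for whatever lookup the assistant really does. All that matters here is what happens when more than one contact matches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    label: str
    email: str


# Hypothetical contacts: two entries that both answer to "Danny".
CONTACTS = [
    Contact("Danny", "agency", "danny@agency.example"),
    Contact("Danny", "personal", "danny@personal.example"),
]


class AmbiguousRecipient(Exception):
    """Raised instead of guessing when a name matches several contacts."""

    def __init__(self, query: str, matches: list[Contact]):
        options = ", ".join(f"{c.name} ({c.label})" for c in matches)
        super().__init__(f"Which '{query}' did you mean? {options}")
        self.matches = matches


def matches_for(query: str) -> list[Contact]:
    q = query.lower()
    return [c for c in CONTACTS if q in c.name.lower()]


def resolve_by_first_match(query: str) -> str:
    """The failure mode: silently take whichever contact comes first."""
    return matches_for(query)[0].email


def resolve_or_ask(query: str, label: str | None = None) -> str:
    """Return an address only when exactly one contact fits; otherwise ask."""
    found = matches_for(query)
    if label is not None:
        found = [c for c in found if c.label == label]
    if not found:
        raise LookupError(f"No contact matches {query!r}")
    if len(found) > 1:
        raise AmbiguousRecipient(query, found)
    return found[0].email


print(resolve_by_first_match("danny"))  # danny@agency.example, a guess
print(resolve_or_ask("danny", label="personal"))  # danny@personal.example
try:
    resolve_or_ask("danny")
except AmbiguousRecipient as e:
    print(e)  # Which 'danny' did you mean? Danny (agency), Danny (personal)
```

The first resolver never fails, and that is the problem. The second one turns ambiguity into a one-line question.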
It is the same pattern as in the rest of this series. Faced with ambiguity, the AI reaches for a confident default instead of asking a one line question. With Google it stated things confidently and was wrong. With forecasting it grabbed the plausible number. Here it grabbed the plausible recipient. The failure mode travels.
Is this a common problem?
Yes, and I think it is going to get more common, not less. Every "let my AI handle my inbox" setup has these two soft spots. There is almost always some shared state between composing and sending, and there is almost always a name to resolve into an address. Both are exactly the kind of thing that works in the demo and fails on a Tuesday when the cache is dirty or the name is ambiguous. Email is uniquely unforgiving here because it is instant, it is outward facing, and you cannot pull it back.
What actually fixed it
None of the fixes were clever. They were the boring controls you would put on any system that can do something irreversible.
- Draft first, confirm every single send. The AI shows me the recipient, subject, and body, and waits. No message leaves without a fresh yes against what is actually on screen. A yes from five minutes ago does not count.
- Write the message fresh the instant before sending, then delete it. Never trust whatever is sitting in the shared file. Write your payload, send it, wipe it. Treat the cache as hostile, because anything could have touched it.
- An allowlist that fails closed. Sends are blocked unless the recipient is on a short approved list. The first time it tried to reach a new address, it refused and made me explicitly add it. A gate that says no by default is worth ten that try to be helpful.
- Confirm the recipient whenever there is any doubt. Especially with voice input. If two contacts could match what I said, the right move is a one line question, not a guess.
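Here is one rough way the rules above could fit together. It is a minimal illustration under several assumptions. The allowlist addresses, the function names and the hash-based confirmation token are all hypothetical. Each send writes to its own fresh temporary file instead of a shared one, and actual delivery is stubbed out by returning the text. A real setup would hand that text to something like Python's `smtplib`.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from dataclasses import dataclass

# Hypothetical allowlist. Anything not on it is refused (fails closed).
ALLOWED_RECIPIENTS = {"danny@personal.example", "danny@agency.example"}


@dataclass(frozen=True)
class Draft:
    to: str
    subject: str
    body: str

    def render(self) -> str:
        """Exactly what is shown on screen for review."""
        return f"To: {self.to}\nSubject: {self.subject}\n\n{self.body}"

    def fingerprint(self) -> str:
        """Hash of the rendered draft; any edit produces a different value."""
        return hashlib.sha256(self.render().encode("utf-8")).hexdigest()


class SendBlocked(Exception):
    pass


def send(draft: Draft, confirmed_fingerprint: str | None) -> str:
    # Allowlist that fails closed: unknown recipients never go out.
    if draft.to not in ALLOWED_RECIPIENTS:
        raise SendBlocked(f"{draft.to} is not on the allowlist")
    # The "yes" must match this exact draft, not an earlier version of it.
    if confirmed_fingerprint != draft.fingerprint():
        raise SendBlocked("no fresh confirmation for this exact draft")
    # Write the payload fresh to a private file, never a shared cache.
    fd, path = tempfile.mkstemp(suffix=".eml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(draft.render())
        # In a two-step setup, the sending script would read this private path.
        with open(path, encoding="utf-8") as f:
            payload = f.read()
        # Stub: a real version would deliver `payload` here.
        return payload
    finally:
        # Wipe the file afterwards, whether or not the send succeeded.
        os.remove(path)
```

In use, the human sees `draft.render()`, and `draft.fingerprint()` is recorded only when they say yes. If the body is edited after that, the old fingerprint no longer matches and `send` refuses. That is one concrete way to make "a yes from five minutes ago does not count" hold.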
The telling part, again, is that once those rules were in place the AI followed them perfectly. It is not that the tool cannot be safe. It is that safe is not its default, so you have to build the rails yourself.
What I would tell you
- The model is rarely the risk. The plumbing is. A correct AI plus a shared cache can still send the wrong thing. Look at the steps between deciding and doing, and assume the gap will leak.
- Never let AI send to an address it chose. Outward facing actions with a named recipient need a human confirming the actual recipient, every time. Allowlist them and fail closed.
- Voice makes it worse. Dictation mangles names, and the AI will resolve the mangled version into a confident guess. Make it ask.
- Boring controls beat clever ones. Draft first, write fresh, delete after, confirm always. None of it is impressive. All of it is what stops a wrong email going to a real client.
AI genuinely saved me time on the daily grind here, and I would not hand that routine back. But "it can send email" and "I trust it to send email unsupervised" are two very different statements, and the distance between them is made of exactly these small, dull guardrails.
If you are wiring AI into anything that talks to your customers and you want someone who has done this for twenty years to sanity check it before it goes live, get in touch. The quote generator is the quickest way to get a feel for how we work, and the work page shows what we have built.