Replaced a Bookkeeper With AI: 9-Month Production Log (2026)

May 5, 2026 · 9 min read · by Dmytro Negodiuk

Nine months ago I was paying a bookkeeper $650 a month to keep five small businesses reconciled. Good person, did the work, but every month I was paying to move numbers between systems that already had APIs.

I decided to test an AI replacement. This post is the honest 9-month log. Savings are real. Pain points are also real. I still have a CPA because I am not an idiot. Read to the end before you tell your bookkeeper anything.

What the bookkeeper was doing

Five-business bookkeeping across: Amazon Seller (Mozabrik), Shopify (OD Granite), Stripe (Negodiuk.ai consulting retainers), PayPal (miscellaneous international), and a Notion-based consulting tracker.

Monthly output:

Time it took her: about 8-10 hours a month. Fee: $650 a month. Equivalent hourly: roughly $65-$81, which is the going rate for a competent US bookkeeper on a small portfolio.

The AI replacement (what I actually built)

Not "a single AI tool." It is a small stack of automations that produce 90% of what the bookkeeper produced, plus a human (me) who handles the 10% that matters most.

Component 1: Transaction pull

A daily n8n cron pulls transactions from:

All raw transactions land in a single Notion database with a consistent schema.
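A minimal sketch of what a "consistent schema" could look like before rows are written to the database. The field names, the `normalize_stripe` helper, and the shape of the raw payload are all assumptions for illustration; the post does not list the actual columns.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One normalized row. Every field name here is an assumption."""
    source: str                      # e.g. "stripe", "paypal", "amazon", "shopify"
    business: str                    # which business the row belongs to
    posted_on: date
    amount: Decimal                  # signed: positive = money in, negative = money out
    currency: str
    merchant: str
    memo: str = ""
    category: Optional[str] = None   # filled in later by the categorizer


def normalize_stripe(raw: dict, business: str) -> Transaction:
    """Map a hypothetical Stripe-like payload onto the shared schema."""
    return Transaction(
        source="stripe",
        business=business,
        posted_on=date.fromisoformat(raw["date"]),
        amount=Decimal(raw["amount_cents"]) / 100,   # cents to currency units
        currency=raw["currency"].upper(),
        merchant=raw.get("description", ""),
    )
```

One normalizer per source keeps the source-specific quirks at the edge, so everything downstream only ever sees `Transaction`.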

Component 2: Categorization via Claude

Each night, Claude reads uncategorized rows and assigns a category based on merchant name, memo, and amount. It uses a 40-category taxonomy I built once and returns a confidence score per categorization. Anything under 85% confidence gets flagged to a "review" view.

Categorization accuracy after 3 months of prompt tuning: about 94% on recurring transactions, 72% on novel ones. Human review takes 5 minutes a week, not 5 hours a month.
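The routing step behind the "review" view can be sketched roughly as follows. The 85% threshold comes from the text above; the `Categorization` shape, the 0.0-1.0 confidence scale, and the extra rule that out-of-taxonomy categories always go to review are assumptions.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # under 85% confidence goes to the review view


@dataclass
class Categorization:
    row_id: str
    category: str
    confidence: float  # assumed to be on a 0.0-1.0 scale


def route(results: list[Categorization], taxonomy: set[str]):
    """Split model output into auto-accepted rows and rows for human review."""
    accepted, review = [], []
    for r in results:
        # A category outside the taxonomy is reviewed regardless of confidence.
        if r.category not in taxonomy or r.confidence < REVIEW_THRESHOLD:
            review.append(r)
        else:
            accepted.append(r)
    return accepted, review
```

Checking the category against the taxonomy is cheap insurance: a model can be very confident about a label that does not exist in the ledger.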

Component 3: Reconciliation

A simple Python script compares bank statement deposits to Stripe / Amazon / Shopify / PayPal settlement totals and flags any mismatch over $5. Most mismatches are timing differences (the deposit landed Nov 30, the Amazon settlement was dated Nov 28). Mismatches that stay unresolved for 14+ days get flagged as "investigate."
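A minimal sketch of that check, assuming deposits have already been attributed to a processor. The $5 tolerance and the 14-day cutoff are from the text; the `Settlement` type, the status strings other than "investigate", and the function name are illustrative, not the actual script.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

TOLERANCE = Decimal("5.00")    # mismatches over $5 get flagged
INVESTIGATE_AFTER_DAYS = 14    # unresolved this long becomes "investigate"


@dataclass
class Settlement:
    processor: str      # "amazon", "stripe", "shopify", "paypal"
    settled_on: date
    amount: Decimal


def reconcile(settlement: Settlement, deposits: list[Decimal], today: date) -> str:
    """Return "ok", "pending" or "investigate" for one settlement.

    `deposits` are the bank deposits attributed to this settlement so far.
    """
    gap = abs(settlement.amount - sum(deposits, Decimal("0")))
    if gap <= TOLERANCE:
        return "ok"
    # Timing differences usually resolve on their own, so only escalate old gaps.
    age_days = (today - settlement.settled_on).days
    return "investigate" if age_days >= INVESTIGATE_AFTER_DAYS else "pending"
```

Using `Decimal` instead of floats avoids cent-level rounding noise showing up as false mismatches.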

Component 4: Monthly report

End of month, n8n runs a summary and drops a Telegram message:

Mar 2026 consolidated:
• Revenue: $X (breakdown per business)
• Expenses: $Y (top 10 categories)
• Net: $Z
• Unusual: 2 transactions over $1K not recurring, review needed
• All accounts reconciled

That's it. Monthly close that used to take the bookkeeper 8-10 hours is now a 30-second Telegram notification plus maybe 15 minutes of my time reviewing flagged items.
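For illustration, the message body above could be assembled like this. It is only a sketch of the formatting step, with hypothetical parameter names; how the totals are computed and how the text is handed to Telegram are left out.

```python
from decimal import Decimal


def build_report(month, revenue, expenses, unusual, unreconciled):
    """Format the monthly summary text.

    revenue:  {business name: Decimal}, expenses: {category: Decimal}
    unusual:  count of non-recurring transactions over $1K
    unreconciled: number of accounts that failed reconciliation
    """
    total_rev = sum(revenue.values(), Decimal("0"))
    total_exp = sum(expenses.values(), Decimal("0"))
    # Keep only the ten largest expense categories.
    top = sorted(expenses.items(), key=lambda kv: kv[1], reverse=True)[:10]
    per_business = ", ".join(f"{name} ${amt:,.2f}" for name, amt in revenue.items())
    top_categories = ", ".join(f"{name} ${amt:,.2f}" for name, amt in top)
    if unreconciled == 0:
        status = "All accounts reconciled"
    else:
        status = f"{unreconciled} accounts NOT reconciled"
    return "\n".join([
        f"{month} consolidated:",
        f"• Revenue: ${total_rev:,.2f} ({per_business})",
        f"• Expenses: ${total_exp:,.2f} ({top_categories})",
        f"• Net: ${total_rev - total_exp:,.2f}",
        f"• Unusual: {unusual} transactions over $1K not recurring, review needed",
        f"• {status}",
    ])
```

Making the reconciliation line conditional matters: a report that always says "all accounts reconciled" stops being read.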

What it actually costs

Item | Monthly
Claude API for categorization | ~$15
Plaid (free for my transaction volume) | $0
n8n self-hosted on $8 VPS | $8 (shared across 50+ other automations)
Notion (paid plan for database) | $10
My time, 15-30 min/week on review | ~1-2 hours/mo
Out-of-pocket | ~$35

Before: $650/month bookkeeper.
After: $35/month stack + 1-2 hours of my time.
Monthly saving: $615. Over 9 months: $5,535.
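The arithmetic, spelled out as a quick check. The line items sum to $33, which the post rounds up to about $35; the savings figures use the rounded number. The variable names are just for illustration, and the 1-2 hours of owner time is not priced in.

```python
bookkeeper_fee = 650  # dollars per month

stack_items = {"claude_api": 15, "plaid": 0, "vps": 8, "notion": 10}
itemized_cost = sum(stack_items.values())   # 33
rounded_cost = 35                           # the "~$35" out-of-pocket figure

monthly_saving = bookkeeper_fee - rounded_cost   # 615
nine_month_saving = monthly_saving * 9           # 5535

print(itemized_cost, monthly_saving, nine_month_saving)
```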

What broke (and the fixes)

Break 1 (month 2): Amazon settlement complexity

Amazon settlements are not simple. They include sales, fees, reimbursements, FBA storage fees, tax withholdings, and chargebacks. My first categorization pass lumped them all together as "Amazon revenue," which inflated gross revenue by ~30%.

Fix: Built a parser that splits settlements by line item before categorization. Extra 2 hours of setup. Permanent fix.
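A rough sketch of the idea: split each settlement into its line items and total them per category before anything is booked as revenue. The line-type keys and the category names are hypothetical; a real settlement report has its own field names and many more line types.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping from settlement line types to ledger categories.
LINE_TYPE_TO_CATEGORY = {
    "sale": "Revenue: product sales",
    "fee": "Expense: marketplace fees",
    "reimbursement": "Other income: reimbursements",
    "storage": "Expense: FBA storage",
    "tax_withheld": "Liability: tax withheld",
    "chargeback": "Contra-revenue: chargebacks",
}


def split_settlement(lines: list[dict]) -> dict[str, Decimal]:
    """Total one settlement's line items per ledger category."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for line in lines:
        # Unknown line types are surfaced for review instead of guessed.
        category = LINE_TYPE_TO_CATEGORY.get(line["type"], "Review: unknown line type")
        totals[category] += Decimal(line["amount"])
    return dict(totals)
```

The per-category totals still sum to the settlement amount, so the reconciliation against the bank deposit keeps working unchanged.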

Break 2 (month 3): PayPal edge cases

Currency conversion on international PayPal payments showed up as two transactions (FX fee + actual transfer). Claude categorized the FX fee as "miscellaneous," so about $3 a month of real FX cost disappeared into a catch-all bucket.

Fix: Added a rule specifically for "PayPal FX fee" category. 10 minutes.
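The post does not say whether this rule lives in the prompt or in code. One way to do it is a deterministic pre-check that runs before the model sees the row, sketched below; the memo patterns and the function name are assumptions.

```python
import re
from typing import Optional

# Assumed memo wording; adjust to whatever the PayPal export actually says.
FX_PATTERN = re.compile(r"currency conversion|fx fee", re.IGNORECASE)


def pre_categorize(source: str, memo: str) -> Optional[str]:
    """Return a category for rows a fixed rule can settle, else None.

    None means "no rule matched, send the row to the model".
    """
    if source == "paypal" and FX_PATTERN.search(memo):
        return "PayPal FX fee"
    return None
```

Rules like this are cheap, never drift, and take recurring edge cases out of the model's hands entirely.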

Break 3 (month 5): Owner's personal card charged to business

This was the month I accidentally put a personal dentist charge on the business card. Claude categorized it as "Medical, employee benefit." My tax person noticed, and I had to reclassify it.

Fix: Added weekly human review step. Claude can't catch personal-vs-business when the merchant is ambiguous. Human skim takes 5 min/week.

Break 4 (month 7): Claude API format change

Anthropic rolled out a minor output format change. My parser broke for one day. Categorizations stopped.

Fix: Added a fallback to raw text parse, plus error alert via Telegram. Took an hour. Hasn't broken since.
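A minimal sketch of a structured-first, raw-text-second parser. It assumes the model was asked for JSON with `category` and `confidence` keys and that confidence is on a 0.0-1.0 scale; both the reply shape and the regex are illustrative. Returning `None` is the signal for the caller to send the alert.

```python
import json
import re
from typing import Optional, Tuple

# Loose pattern for replies like 'Category: Shipping, confidence: 0.71'
# or a truncated JSON object. Purely illustrative.
FALLBACK_PATTERN = re.compile(
    r'category"?\s*[:=]\s*"?(?P<category>[^\n,;"]+)'
    r'.*?confidence"?\s*[:=]\s*"?(?P<confidence>\d*\.?\d+)',
    re.IGNORECASE | re.DOTALL,
)


def parse_reply(text: str) -> Optional[Tuple[str, float]]:
    """Return (category, confidence), or None if nothing usable was found."""
    try:
        data = json.loads(text)
        return str(data["category"]), float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        pass  # not the expected JSON shape, fall through to raw-text parsing
    match = FALLBACK_PATTERN.search(text)
    if match:
        return match.group("category").strip(), float(match.group("confidence"))
    return None  # caller should alert and leave the row uncategorized
```

Leaving unparseable rows uncategorized means they get picked up again on the next nightly run instead of being silently mislabeled.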

What I still use humans for

Important: I replaced my bookkeeper. I did NOT replace my CPA. Those are different roles.

When AI bookkeeping will bite you

Who should actually try this

How to start (if you want to)

  1. Keep your current bookkeeper for 1 month while you set up the AI stack in parallel.
  2. Reconcile against their output. Goal: 95%+ match.
  3. If match is good, switch to the AI stack for 1 month WITH weekly human review. Keep the bookkeeper as consultant on speed dial.
  4. If month 2 still matches, cancel the bookkeeper. Keep the CPA.
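The "95%+ match" goal in step 2 can be measured with something as small as this. It assumes both sides can be exported as a mapping from transaction ID to category, which is an assumption about the exports, and the function names are hypothetical.

```python
MATCH_TARGET = 0.95  # the 95%+ goal from step 2


def match_rate(bookkeeper: dict[str, str], stack: dict[str, str]) -> float:
    """Share of the bookkeeper's rows that the stack categorized identically."""
    if not bookkeeper:
        raise ValueError("no bookkeeper rows to compare against")
    same = sum(1 for txn_id, cat in bookkeeper.items() if stack.get(txn_id) == cat)
    return same / len(bookkeeper)


def passes_parallel_run(bookkeeper: dict[str, str], stack: dict[str, str]) -> bool:
    return match_rate(bookkeeper, stack) >= MATCH_TARGET
```

Rows the stack missed entirely count as mismatches here, since `stack.get` returns `None` for them. The mismatched rows are also the most useful input for tuning the prompt, so it is worth listing them and not just counting them.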

Budget 20-30 hours of setup time. Or hire someone like me to build it for you. See below.

Want this built for your business?

I set up this exact stack for $1-10M SMBs as part of the Fractional AI Officer Sprint ($5,000-$8,000). Turnaround: 3-4 weeks. Includes the stack plus a 2-hour handoff training. Book a 30-min call to see if it fits.
