Automates invoice intake from Gmail, extracts data via OCR, verifies payment in Stripe, and creates reconciliation-ready accounting entries in Xero.
---
name: autonomous-bookkeeper
description: Meta-skill for pre-accounting automation by orchestrating gmail, deepread-ocr, stripe-api, and xero. Use when users need invoice intake from email, structured field extraction, payment verification, and accounting entry creation with reconciliation-ready status.
homepage: https://clawhub.ai
user-invocable: true
disable-model-invocation: false
metadata: {"openclaw":{"emoji":"ledger","requires":{"bins":["python3","npx"],"env":["MATON_API_KEY","DEEPREAD_API_KEY"],"config":[]},"note":"Requires local installation of gmail, deepread-ocr, stripe-api, and xero."}}
---
# Purpose
Automate preparatory bookkeeping from incoming email to accounting records.
Core objective:
1. detect invoice email,
2. extract structured invoice data,
3. verify payment event,
4. create accounting entry and reconciliation status.
This is orchestration logic across upstream tools; it is not a replacement for financial controls.
# Required Installed Skills
- `gmail` (inspected latest: `1.0.6`)
- `deepread-ocr` (inspected latest: `1.0.6`)
- `stripe-api` (inspected latest: `1.0.8`)
- `xero` (inspected latest: `1.0.4`)
Install/update:
```bash
npx -y clawhub@latest install gmail
npx -y clawhub@latest install deepread-ocr
npx -y clawhub@latest install stripe-api
npx -y clawhub@latest install xero
npx -y clawhub@latest update --all
```
# Required Credentials
- `MATON_API_KEY` (for Gmail, Stripe, Xero through Maton gateway)
- `DEEPREAD_API_KEY` (for OCR extraction)
Preflight:
```bash
echo "$MATON_API_KEY" | wc -c
echo "$DEEPREAD_API_KEY" | wc -c
```
If missing, stop before any bookkeeping action.
# Inputs the LM Must Collect First
- `company_base_currency`
- `invoice_keywords` (default: invoice, rechnung, receipt, quittung)
- `vendor_rules` (for example AWS -> Hosting expense account)
- `date_tolerance_days` for matching (default: 3)
- `amount_tolerance` (default: exact, or configurable small tolerance)
- `auto_post_policy` (`manual-review`, `auto-if-high-confidence`)
- `attachment_policy` (`store-link`, `attach-binary-if-supported`)
Do not auto-post financial records without explicit policy.
# Tool Responsibilities
## Gmail (`gmail`)
Use for intake and attachment discovery.
Relevant behavior:
- query messages with Gmail operators (for example `has:attachment`, `subject:invoice`, sender filters)
- fetch message metadata and full payload for parsing
- label/update messages after processing (for traceability)
## DeepRead OCR (`deepread-ocr`)
Use for extracting structured fields from invoice PDFs/images.
Relevant behavior:
- async processing (`queued` -> `completed`/`failed`)
- schema-driven extraction
- field-level `hil_flag` and reason for uncertainty
- webhook or polling modes
## Stripe (`stripe-api`)
Use for payment-side verification.
Relevant behavior:
- query charges/payment_intents/invoices/balance transactions
- verify amount, currency, status, and date proximity
## Xero (`xero`)
Use for accounting record creation and payment/reconciliation visibility.
Relevant behavior:
- create contacts if missing
- create invoices/bills (`ACCPAY` for payable bills)
- list payments and bank transactions
# Canonical Signal Chain
## Stage 1: Inbox detection
Scan Gmail for candidate invoice emails.
Recommended query pattern:
- `has:attachment (subject:invoice OR subject:rechnung OR subject:receipt OR subject:quittung)`
- optional sender constraint for known vendors (for example `from:aws`)
Output:
- message ID
- sender
- received date
- attachment candidates
## Stage 2: Attachment extraction
For each invoice candidate attachment:
1. send file to DeepRead OCR with invoice schema
2. wait for async completion (webhook preferred; polling fallback)
3. parse structured result
Minimum extracted fields:
- vendor
- invoice_date
- invoice_number
- total_amount
- tax_amount
- currency
Quality gate:
- if critical fields have `hil_flag=true`, route to review queue before posting.
## Stage 3: Payment verification
Use Stripe to check whether corresponding payment occurred.
Matching policy:
- amount equals invoice total (within tolerance)
- currency matches
- date within tolerance window
- status is successful/paid
If multiple candidates match, mark as `ambiguous_match` and require review.
## Stage 4: Accounting write
Use Xero for booking.
Default payable flow:
1. ensure vendor contact exists (create if needed)
2. create bill entry (`Type: ACCPAY`) with line item category (for example Hosting)
3. mark as paid/reconciled state only when Stripe verification is confident
4. include reference fields: invoice number, source message ID, payment reference
Attachment handling:
- if binary attachment endpoint/path is available in the active integration, attach file
- otherwise store durable file reference and include link/reference in description/metadata
## Stage 5: Traceability updates
After successful processing:
- apply Gmail processed label
- store processing log (source email, extraction confidence, matching evidence, xero IDs)
- keep idempotency key to avoid duplicate posting
# Scenario Mapping (AWS Invoice)
For the scenario "AWS invoice by email -> Xero + card match":
1. Gmail finds AWS email with PDF attachment.
2. DeepRead OCR extracts structured fields (vendor/date/total/tax/invoice number).
3. Stripe check confirms payment event around invoice date and amount.
4. Xero creates payable entry (`ACCPAY`) under Hosting category.
5. Record is marked paid only after confident match; source PDF linked/attached per policy.
# Data Contract
Normalize to one transaction record before posting:
```json
{
"source": {
"gmail_message_id": "...",
"sender": "billing@aws.amazon.com",
"attachment_name": "invoice.pdf"
},
"invoice": {
"vendor": "AWS",
"invoice_number": "INV-123",
"invoice_date": "2024-05-01",
"total": 53.20,
"tax": 0.00,
"currency": "USD",
"ocr_confidence_ok": true
},
"payment_match": {
"provider": "stripe",
"matched": true,
"transaction_id": "ch_...",
"amount": 53.20,
"date": "2024-05-01"
},
"accounting": {
"system": "xero",
"entry_type": "ACCPAY",
"category": "Hosting",
"status": "Paid"
}
}
```
# Output Contract
Always return:
- `IntakeSummary`
- emails scanned, invoice candidates found
- `ExtractionSummary`
- extracted fields and `hil_flag` status
- `PaymentVerification`
- matched/not matched + evidence
- `AccountingAction`
- created/updated records and IDs
- `ReviewQueue`
- any records requiring manual validation
# Quality Gates
Before auto-posting:
- vendor identified
- invoice number/date/total present
- no critical `hil_flag` unresolved
- payment match confidence above policy threshold
- duplicate check passed (same vendor + invoice number + total)
If any gate fails, return `Needs Review` and do not auto-post.
# Guardrails
- Never mark invoice as paid without payment evidence.
- Never silently overwrite existing accounting records.
- Never drop uncertain OCR fields; surface them explicitly.
- Prefer manual review when amount/date ambiguity exists.
- Preserve source audit trail for every booking action.
# Failure Handling
- Gmail unavailable: stop intake and report connection issue.
- OCR job failed/timeout: keep email queued for retry.
- Stripe no match: post as unpaid bill or route to review per policy.
- Xero write failed: keep normalized record and retry safely with idempotency key.
# Known Limits from Inspected Upstream Skills
- DeepRead OCR is asynchronous and may require webhook/polling orchestration.
- The inspected Xero skill docs emphasize core accounting endpoints but do not fully document attachment upload flow; attachment behavior depends on supported endpoint path in active integration.
- Stripe/Xero matching is orchestration logic here, not a single native "auto-reconcile" endpoint in these inspected skill docs.
- QuickBooks is not part of this researched stack; this meta-skill is Xero-first.
Treat these limits as mandatory operator disclosures.
don't have the plugin yet? install it then click "run inline in claude" again.
added explicit decision trees for ocr uncertainties, payment matching, and auto-post policies; documented all external connections and edge cases (rate limits, timeouts, duplicates, attachment unsupported); clarified quality gates and guardrails as mandatory stops before posting; added structured output contract with normalized json schema and clear outcome signals for operator verification.
automates the full bookkeeping intake pipeline: finds invoice emails in gmail, extracts structured data via ocr, verifies the corresponding payment in stripe, and posts accounting entries to xero with reconciliation status. use this when you need to turn incoming invoices into auditable accounting records without manual data entry. this is orchestration across upstream tools, not a replacement for financial controls.
MATON_API_KEY (env var): gateway token for gmail, stripe, xero access. test with echo "$MATON_API_KEY" | wc -c before running. stop if missing.DEEPREAD_API_KEY (env var): ocr extraction service token. test with echo "$DEEPREAD_API_KEY" | wc -c. stop if missing.install via npm if not present:
npx -y clawhub@latest install gmail deepread-ocr stripe-api xero
npx -y clawhub@latest update --all
verified minimum versions: gmail 1.0.6, deepread-ocr 1.0.6, stripe-api 1.0.8, xero 1.0.4.
company_base_currency (string): e.g. "USD", "EUR". required.invoice_keywords (array): search terms for inbox detection. default: ["invoice", "rechnung", "receipt", "quittung"].vendor_rules (object): vendor name to expense account mapping. e.g. {"AWS": "Hosting", "Twilio": "Communications"}.date_tolerance_days (integer): invoice date to payment date window. default: 3.amount_tolerance (string or float): "exact" or e.g. 0.50 for +/- $0.50. default: "exact".auto_post_policy (enum): "manual-review" or "auto-if-high-confidence". required. do not default to auto-post.attachment_policy (enum): "store-link" or "attach-binary-if-supported". default: "store-link".inputs: invoice_keywords, optional sender constraints (e.g. "from:aws@amazon.com").
process:
has:attachment (subject:{keyword1} OR subject:{keyword2} ...) [optional: from:known_vendor]outputs: list of candidate messages with: message_id, sender, received_date, attachment_filename, attachment_id.
edge cases:
inputs: attachment payload (binary or uri), invoice schema (vendor, invoice_date, invoice_number, total_amount, tax_amount, currency).
process:
outputs: structured invoice record with: vendor, invoice_date, invoice_number, total_amount, tax_amount, currency, ocr_confidence_ok (boolean), hil_fields (list of uncertain fields with reasons).
edge cases:
inputs: invoice total_amount, currency, invoice_date, vendor name, amount_tolerance, date_tolerance_days.
process:
outputs: payment_match object with: matched (boolean), transaction_id (or null), amount, date, confidence_reason.
edge cases:
inputs: normalized transaction record (vendor, invoice_number, total_amount, company_base_currency).
process:
outputs: quality_gate_result (passed/failed), fail_reasons (list of gate failures).
edge cases:
inputs: normalized invoice record, vendor_rules, auto_post_policy, attachment_policy, xero idempotency_key.
process:
outputs: xero bill_id, contact_id, bill_status, attachment_reference.
edge cases:
inputs: message_id, xero_bill_id, processing metadata.
process:
outputs: label_applied (boolean), log_stored (boolean).
edge cases:
decision 1: critical ocr fields missing or high-uncertainty?
decision 2: payment found in stripe within tolerance and date window?
decision 3: duplicate bill already exists in xero?
decision 4: all quality gates passed?
decision 5: attachment binary available and attachment_policy="attach-binary-if-supported"?
always return a structured result object with the following sections:
all data normalized to the contract schema:
{
"source": {
"gmail_message_id": "string",
"sender": "string",
"attachment_name": "string",
"received_date": "iso8601"
},
"invoice": {
"vendor": "string",
"invoice_number": "string",
"invoice_date": "iso8601",
"total": "number",
"tax": "number",
"currency": "string",
"ocr_confidence_ok": "boolean",
"hil_fields": ["string"]
},
"payment_match": {
"provider": "stripe",
"matched": "boolean",
"transaction_id": "string or null",
"amount": "number or null",
"date": "iso8601 or null"
},
"accounting": {
"system": "xero",
"entry_type": "ACCPAY",
"category": "string",
"status": "Draft | Paid",
"bill_id": "string or null",
"contact_id": "string or null"
}
}
the skill succeeded if:
intake complete: emails_scanned > 0, invoice_candidates_found reflects expected result (zero if no invoices, >0 if invoices present).
extraction complete: extraction_summary.records_extracted matches expected count; hil_fields array is populated only for uncertain fields; no silent data loss.
payment verification complete: every extracted invoice has a verification_details entry (matched or unmatched); ambiguous_matches are flagged, not guessed.
accounting posted correctly:
no silent overwrites: if duplicate detected, existing bill_id is returned, not overwritten.
audit trail preserved: processing_log contains all idempotency_keys and source references; operator can trace any record back to original gmail message.
review queue populated correctly: any record with hil_flag=true, ocr_confidence_ok=false, payment_match=false, or quality gate failure is in review_queue. zero items in review_queue only if all records are high-confidence matches and auto-posted.
check logs for completion: if end_time is set and duration_seconds >= 0, the skill ran to completion (success or expected failure). if end_time is null or missing, the skill crashed or timed out.