# OP-DEXT-B: Invoice Backup & Recovery Verification
## Concept Specification — SharePoint Durability Hardening

---

## Goal

Verify, harden, and prove that every invoice processed through the AR Controls Finance Centre is durably backed up to SharePoint, and can be fully recovered from SharePoint alone after a server wipe — meeting the HMRC 6-year invoice retention obligation.

---

## Why It Matters

In a previous incident, a GitHub repository cleanup wiped locally-stored invoice files from the ERP server. The system recovered, but only through manual intervention. That fragility persists today: the backup chain has never been tested end-to-end, SharePoint upload failures are silently swallowed, and there is no UI feedback showing whether a given invoice is actually backed up. For a business targeting HMRC compliance and the eventual elimination of a manual admin-clerk role, silent data loss is an existential risk.

**Three concrete risks active today:**

1. **DigitalOcean Spaces bypass**: When `DO_SPACES_KEY/SECRET` are configured, invoice PDFs route to DO Spaces and the SharePoint per-ingest backup is not called at all. A DO Spaces outage makes those PDFs unavailable, and a billing lapse or accidental bucket deletion could lose them permanently, because no second copy exists.
2. **Fire-and-forget SharePoint upload**: When local storage is used, the SharePoint backup is an `asyncio.create_task()` with no retry, no status flag on the receipt document, and no alert path. A network blip silently skips the backup.
3. **MongoDB backup ≠ PDF backup**: The daily encrypted backup in `Shared Documents/ERP Backups/` covers MongoDB metadata only — not the actual PDF/image files. Restoring the DB without the files produces receipt records pointing at missing attachments.

---

## Target Users

| User | Role | Primary Concern |
|------|------|-----------------|
| Andrew Ramsey | Owner / Admin | Compliance, data durability, no surprise data loss |
| Accountants / Finance Staff | Receipts reviewers | Confidence that approved invoices are preserved |
| Future auditors / HMRC | External | 6-year retention provable on demand |

---

## Primary Use Cases

1. **Invoice ingested via email or manual upload** → backed up to SharePoint automatically, status visible on receipt record
2. **Admin checks backup health** → Finance Centre Settings shows last backup time, SharePoint connectivity status, any failed per-ingest backups
3. **Server disaster** → admin runs restore from SharePoint, invoices reappear with correct IDs, amounts, and PDF attachments
4. **HMRC audit** → admin can demonstrate that every invoice from the past 6 years exists in SharePoint with metadata trail

---

## Success Criteria (Testable)

- [ ] **SC-1 — Coverage proof**: Code audit confirms the SharePoint per-ingest backup fires for every ingest path: manual upload (`POST /api/receipts/upload-receipts`) AND email ingest (`email_ingest.py`), regardless of whether DO_SPACES is configured.
- [ ] **SC-2 — Status persisted**: After ingest, the receipt MongoDB document has a `sharepoint_backup_status` field with values `pending | success | failed` and `sharepoint_backup_at` timestamp.
- [ ] **SC-3 — Live trace**: A real test ingest on DEV produces a backend log entry confirming the SharePoint upload fired for that specific file, and the receipt record shows `sharepoint_backup_status: success`.
- [ ] **SC-4 — Wipe+recover drill**: On DEV, local invoice files in `backend/uploads/receipts/` and `storage/email_attachments/` are deleted; restore from SharePoint returns all files intact; Finance Centre renders the receipt with its PDF/image visible.
- [ ] **SC-5 — No silent failure**: If SharePoint upload fails during ingest, the receipt document is marked `sharepoint_backup_status: failed` and the failure is logged at ERROR level (not WARNING). A monitoring hook or admin panel surfacing failed backups is in scope.
- [ ] **SC-6 — DO_SPACES path covered**: When DigitalOcean Spaces is configured and the primary upload succeeds, a SharePoint shadow copy is still written (or an equivalent durable second copy exists and is documented as such in VERIFICATION.md).
- [ ] **SC-7 — DB backup integrity**: The daily MongoDB backup includes a manifest of all receipt document IDs with their `stored_filename` values, so a restore can identify which PDFs need to be pulled from SharePoint.
- [ ] **SC-8 — UI reflects state**: Finance Centre Settings page shows: SharePoint connectivity (green/red), count of receipts with `sharepoint_backup_status: failed`, last successful per-ingest backup time.
- [ ] **SC-9 — Restore drill documented**: `evidence/OP-DEXT-B/restore-drill-transcript.md` written with timestamps, exact commands, and results at every step.
- [ ] **SC-10 — Runbook complete**: `brain/runbooks/dext-recovery.md` exists with steps any admin can follow to recover from a full server wipe (separate task, but spec'd here).
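SC-7's manifest only has to tell a restore which file belongs to which receipt. A minimal sketch follows. It assumes receipt documents arrive as plain dicts carrying `_id` and `stored_filename`. `build_manifest` and `missing_files` are hypothetical helper names, and JSON is just one plausible serialisation to store alongside the encrypted backup.

```python
import json
from typing import Iterable


def build_manifest(receipts: Iterable[dict]) -> list:
    """One entry per receipt with a stored file, sorted by receipt ID for stable diffs."""
    manifest = [
        {"receipt_id": str(doc["_id"]), "stored_filename": doc["stored_filename"]}
        for doc in receipts
        if doc.get("stored_filename")  # skip receipts with no attachment
    ]
    manifest.sort(key=lambda entry: entry["receipt_id"])
    return manifest


def missing_files(manifest: list, present: set) -> list:
    """Filenames listed in the manifest but absent from `present` (a disk or SharePoint listing)."""
    return [e["stored_filename"] for e in manifest if e["stored_filename"] not in present]


if __name__ == "__main__":
    receipts = [
        {"_id": "r2", "stored_filename": "inv-002.pdf"},
        {"_id": "r1", "stored_filename": "inv-001.pdf"},
        {"_id": "r3"},  # metadata-only record, no file
    ]
    manifest = build_manifest(receipts)
    print(json.dumps(manifest, indent=2))
    print(missing_files(manifest, {"inv-001.pdf"}))  # ['inv-002.pdf']
```

Diffing the manifest against a post-restore directory listing would also give SC-4 a mechanical pass/fail check rather than a purely visual one.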

---

## Key User Flows

### Flow 1 — Normal Invoice Ingest (Email or Upload)

1. Invoice PDF arrives in `invoicing@ar-controls.co.uk` OR admin uploads via Finance Centre UI
2. Ingest service saves file locally (and to DO Spaces if configured)
3. **[NEW]** Storage service immediately schedules SharePoint upload for `Shared Documents/ERP Receipts/{YYYY}/{MM}/{filename}`
4. SharePoint upload completes; receipt document updated: `sharepoint_backup_status: success`, `sharepoint_backup_at: <timestamp>`
5. OCR runs, rules engine fires, receipt enters approval queue
6. Admin views receipt detail — backup badge shows "✓ SharePoint" with upload timestamp
7. Receipt is approved, published to Sage CSV
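The target folder in step 3 can be derived from the ingest timestamp. A minimal sketch follows. It assumes the `{YYYY}/{MM}` layout from step 3 (four-digit year, zero-padded month), UTC timestamps, and that only the base filename is kept. `sharepoint_receipt_path` is a hypothetical helper, not an existing `SharePointService` method.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

BASE_FOLDER = "Shared Documents/ERP Receipts"


def sharepoint_receipt_path(filename: str, ingested_at: datetime) -> str:
    """Return Shared Documents/ERP Receipts/{YYYY}/{MM}/{filename} for an ingested file."""
    if ingested_at.tzinfo is None:
        # Assumption: naive timestamps are UTC, so the month folder is deterministic.
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)
    ingested_at = ingested_at.astimezone(timezone.utc)
    # Keep only the final component so a local path such as
    # backend/uploads/receipts/inv.pdf cannot leak into the SharePoint tree.
    name = PurePosixPath(filename.replace("\\", "/")).name
    if name in ("", ".."):
        raise ValueError(f"invalid filename: {filename!r}")
    return f"{BASE_FOLDER}/{ingested_at:%Y}/{ingested_at:%m}/{name}"


if __name__ == "__main__":
    print(sharepoint_receipt_path("backend/uploads/receipts/inv-001.pdf",
                                  datetime(2024, 4, 5, 10, 30, tzinfo=timezone.utc)))
    # Shared Documents/ERP Receipts/2024/04/inv-001.pdf
```

If month folders should follow UK local time rather than UTC, `zoneinfo.ZoneInfo("Europe/London")` could replace `timezone.utc`. The choice only matters for invoices ingested around midnight at a month boundary.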

### Flow 2 — Backup Health Check (Admin)

1. Admin navigates to Finance Centre → Settings → Backup & Recovery tab
2. Page loads backup status panel:
   - SharePoint connectivity indicator (green/red, last checked)
   - Count of receipts pending backup / failed backup
   - Last successful per-ingest backup
   - Last daily MongoDB backup
3. Admin drills into "Failed backups" list, sees which receipts have `sharepoint_backup_status: failed`
4. Admin triggers re-upload for failed receipts (or system auto-retries on next poll)
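The counts and timestamps in step 2 can be derived from the new `sharepoint_backup_*` fields. The following is a minimal sketch over plain dicts. In the real endpoint this would more likely be a MongoDB query or aggregation via Motor. The sketch assumes that receipts with no status field (for example, those ingested before this change) should be reported separately as `unknown` rather than folded into `pending`. `backup_health` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional

KNOWN_STATUSES = ("pending", "success", "failed")


def backup_health(receipts: Iterable[dict]) -> dict:
    """Summarise per-ingest SharePoint backup state for the Backup & Recovery tab."""
    counts: Counter = Counter()
    failed_ids = []
    last_success: Optional[datetime] = None
    for doc in receipts:
        status = doc.get("sharepoint_backup_status")
        if status not in KNOWN_STATUSES:
            status = "unknown"  # e.g. receipts ingested before status tracking existed
        counts[status] += 1
        if status == "failed":
            failed_ids.append(doc["_id"])
        elif status == "success":
            at = doc.get("sharepoint_backup_at")
            if at is not None and (last_success is None or at > last_success):
                last_success = at
    return {
        "counts": {s: counts.get(s, 0) for s in (*KNOWN_STATUSES, "unknown")},
        "failed_receipt_ids": failed_ids,
        "last_successful_backup_at": last_success,
    }
```

The `failed_receipt_ids` list is what the drill-down in step 3 would display and what a re-upload trigger in step 4 would iterate over.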

### Flow 3 — Disaster Recovery (Server Wipe)

1. Admin notices server is clean (fresh VPS or accidental wipe)
2. Admin follows runbook: deploy ERP, point at MongoDB backup, run restore
3. **Step A — DB restore**: Admin uploads latest `.json.enc` backup via `POST /api/backup/restore`, selects affected collections, confirms password
4. System re-creates MongoDB documents; receipt records appear but PDF links are broken
5. **Step B — PDF restore**: Admin triggers `POST /api/receipts/restore-from-sharepoint` (new endpoint); service walks receipt collection, finds all `stored_filename` values, downloads each from `Shared Documents/ERP Receipts/`
6. Files are written back to their original local folders (`backend/uploads/receipts/` and `storage/email_attachments/`); Finance Centre renders invoices normally
7. Admin screenshots Finance Centre showing restored receipts — visual evidence for VERIFICATION.md
8. `.orch-done.json` written after both DB and PDF restore confirmed
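Step B comes down to this: for every receipt whose file is missing locally, fetch the bytes from SharePoint and write them back. A minimal sketch follows, with several assumptions. It assumes a flat local directory keyed by `stored_filename` and receipts supplied as an in-memory iterable (a Motor cursor would need `async for`). It also assumes an injected async `fetch(filename) -> bytes` callable standing in for `SharePointService.fetch_receipt()`, whose real signature is not confirmed here. `restore_missing_files` is hypothetical. Failures are collected rather than aborting the walk.

```python
import asyncio
import os
from pathlib import Path
from typing import Awaitable, Callable, Iterable

# Stand-in for SharePointService.fetch_receipt(); its real signature is an assumption.
Fetch = Callable[[str], Awaitable[bytes]]


async def restore_missing_files(receipts: Iterable[dict], local_dir: Path,
                                fetch: Fetch) -> dict:
    """Re-download every receipt file that is missing from local_dir."""
    report: dict = {"restored": [], "already_present": [], "failed": {}}
    local_dir.mkdir(parents=True, exist_ok=True)
    for doc in receipts:
        name = doc.get("stored_filename")
        if not name:
            continue  # metadata-only receipt: nothing to restore
        target = local_dir / Path(name).name
        if target.exists():
            report["already_present"].append(name)
            continue
        try:
            data = await fetch(name)
        except Exception as exc:
            # Record and keep walking so one bad file does not block the rest.
            report["failed"][name] = str(exc)
            continue
        # Write to a temporary name, then rename, so a crash never leaves a
        # half-written file that looks restored.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        os.replace(tmp, target)
        report["restored"].append(name)
    return report


if __name__ == "__main__":
    import tempfile

    async def fake_fetch(name: str) -> bytes:
        if name == "inv-404.pdf":
            raise FileNotFoundError(name)
        return b"%PDF-1.4 placeholder"

    with tempfile.TemporaryDirectory() as d:
        docs = [{"stored_filename": "inv-001.pdf"}, {"stored_filename": "inv-404.pdf"}]
        print(asyncio.run(restore_missing_files(docs, Path(d), fake_fetch)))
```

The returned report (restored, already present, failed with reasons) is the kind of per-file result the restore-drill transcript in SC-9 needs to record.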

### Flow 4 — HMRC Evidence Pull

1. Admin navigates to Finance Centre → Settings → Backup & Recovery → SharePoint Explorer
2. Selects date range (e.g. entire tax year)
3. System queries SharePoint `Shared Documents/ERP Receipts/` folder tree and returns manifest
4. Admin can download individual invoices or a ZIP for inspection
5. Each file is the original, unmodified PDF/image from ingest
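The ZIP in step 4 can carry its own manifest so an auditor can check what they received. A minimal sketch follows. It assumes the files for the selected range have already been downloaded from SharePoint as `(filename, ingested_at, bytes)` tuples. The `manifest.csv` layout and SHA-256 column are illustrative choices, not something the existing system produces. Proving step 5 end to end would also need a hash recorded at ingest time to compare against.

```python
import csv
import hashlib
import io
import zipfile
from datetime import date, datetime
from typing import Iterable, Tuple

# (filename, ingested_at, file bytes); assumed already downloaded from SharePoint
Invoice = Tuple[str, datetime, bytes]


def build_evidence_zip(invoices: Iterable[Invoice], start: date, end: date) -> bytes:
    """Package invoices ingested between start and end (inclusive) with a manifest.csv."""
    buf = io.BytesIO()
    manifest = io.StringIO()
    writer = csv.writer(manifest)
    writer.writerow(["path", "ingested_at", "size_bytes", "sha256"])
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for filename, ingested_at, data in sorted(invoices, key=lambda inv: inv[1]):
            if not start <= ingested_at.date() <= end:
                continue
            # Mirror the {YYYY}/{MM} folder layout so equal filenames from
            # different months do not collide inside the archive.
            arcname = f"{ingested_at:%Y}/{ingested_at:%m}/{filename}"
            zf.writestr(arcname, data)  # deflate is lossless: bytes come back unchanged
            writer.writerow([arcname, ingested_at.isoformat(), len(data),
                             hashlib.sha256(data).hexdigest()])
        zf.writestr("manifest.csv", manifest.getvalue())
    return buf.getvalue()
```

Passing the UK tax year as the range (6 April to 5 April) matches the "entire tax year" selection in step 2.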

---

## Technical Approach

### Storage Chain (as-built, gaps highlighted)

```
Invoice Arrives
      │
      ├─ [Path A: DO_SPACES configured]
      │    └─ upload_file() → boto3 → DigitalOcean Spaces ✓
      │         └─ ⚠ NO SharePoint shadow copy today
      │
      └─ [Path B: Local fallback]
           └─ _upload_local() → backend/uploads/receipts/ ✓
                └─ asyncio.create_task(_backup())
                     └─ SharePointService.upload_receipt() → ERP Receipts/{YY}/{MM}/ ✓
                          └─ ⚠ Fire-and-forget, no status written to DB
```

### Changes Required (this concept)

1. **`storage_service.py` — plug DO_SPACES gap**: After a successful Spaces upload, schedule the same `asyncio.create_task(_backup())` that the local path uses. SharePoint becomes a true shadow copy regardless of primary storage.

2. **`storage_service.py` / `receipts.py` — write backup status**: After `_backup()` resolves, update the receipt document with `sharepoint_backup_status` and `sharepoint_backup_at`. This requires the receipt's MongoDB `_id` to be passed into the backup coroutine.

3. **`sharepoint_service.py` — `download_receipt()`**: Already partially implemented as `fetch_receipt()` (line ~988). Confirm it works end to end and make it surface errors to the caller instead of failing silently.

4. **`receipts.py` — add `POST /api/receipts/restore-from-sharepoint`**: New admin endpoint. Walks the receipts collection and, for each receipt whose local file is missing, downloads the file from SharePoint and writes it back to disk.

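5. **`backup_scheduler.py` — add PDF manifest to daily backup**: Ensure the `receipts` collection is included in the daily backup, together with a manifest mapping each receipt `_id` to its `stored_filename` (SC-7), so a restore knows which files to pull from SharePoint.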

6. **`ReceiptsSettings.jsx` — Backup & Recovery tab**: New tab in Finance Centre Settings showing backup stats, failed-backup list, manual re-trigger button, and restore-from-SharePoint wizard.

### Data Model — Receipt Document (additions)

```js
{
  // ...existing fields...
  sharepoint_backup_status: "pending" | "success" | "failed",  // NEW
  sharepoint_backup_at: ISODate,                               // NEW
  sharepoint_backup_path: "Shared Documents/ERP Receipts/...", // NEW
  sharepoint_backup_error: "Error message if failed"           // NEW
}
```
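Routing every write to these fields through one helper keeps the status inside the three allowed values. It also clears a stale error once a retry succeeds. A minimal sketch follows, assuming updates are applied with MongoDB's `$set`/`$unset` operators. `BackupStatus` and `backup_status_update` are hypothetical names.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class BackupStatus(str, Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


def backup_status_update(status, path: Optional[str] = None,
                         error: Optional[str] = None,
                         now: Optional[datetime] = None) -> dict:
    """Build a MongoDB update document for the sharepoint_backup_* fields."""
    status = BackupStatus(status)  # raises ValueError for anything outside the three states
    fields = {"sharepoint_backup_status": status.value}
    if status is BackupStatus.SUCCESS:
        if not path:
            raise ValueError("a successful backup must record sharepoint_backup_path")
        fields["sharepoint_backup_at"] = now or datetime.now(timezone.utc)
        fields["sharepoint_backup_path"] = path
        # Drop any error left over from an earlier failed attempt.
        return {"$set": fields, "$unset": {"sharepoint_backup_error": ""}}
    if status is BackupStatus.FAILED:
        fields["sharepoint_backup_error"] = error or "unknown error"
    return {"$set": fields}
```

With Motor, the result would be passed as the update argument, e.g. `await db.receipts.update_one({"_id": receipt_id}, backup_status_update("failed", error=str(exc)))`. Here `db.receipts` is an assumed collection handle.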

### Frameworks & Components

| Layer | Technology |
|-------|-----------|
| Backend API | FastAPI (existing) |
| SharePoint client | `msal` + Microsoft Graph REST (existing `SharePointService`) |
| PDF storage | Local FS + DigitalOcean Spaces (S3) + SharePoint shadow |
| DB | MongoDB via Motor (async) |
| Frontend | React + existing ReceiptsSettings.jsx pattern |
| Backup encryption | Fernet (existing `utils/encryption.py`) |

---

## File Targets

### Modify (existing files)

| File | Change |
|------|--------|
| `backend/services/storage_service.py` | Plug DO_SPACES gap; write backup status back to receipt document |
| `backend/api/receipts.py` | Add `POST /api/receipts/restore-from-sharepoint`; surface `sharepoint_backup_status` in receipt list/detail responses |
| `backend/services/sharepoint_service.py` | Validate/fix `upload_receipt()` and `fetch_receipt()` error paths; add structured return including `sharepoint_path` |
| `frontend/src/pages/receipts/ReceiptsSettings.jsx` | Add "Backup & Recovery" tab with status panel and restore wizard |
| `frontend/src/components/receipts/ReceiptDetail.jsx` (or `ReceiptForm.jsx`) | Add SharePoint backup status badge to individual receipt view |

### Create (new files)

| File | Purpose |
|------|---------|
| `evidence/OP-DEXT-B/VERIFICATION.md` | Master evidence document for this slice |
| `evidence/OP-DEXT-B/restore-drill-transcript.md` | Step-by-step restore drill log |
| `evidence/OP-DEXT-B/screenshots/` | UI screenshots post-restore |
| `brain/runbooks/dext-recovery.md` | Tested recovery runbook (separate slice) |

---

## Open Questions for Andrew

1. **SharePoint tenant in DEV**: Are `SHAREPOINT_TENANT_ID`, `SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_SECRET` populated in the DEV `.env` file at `100.68.36.49:/root/AR-ERP-dev/.env`? If not, the live-trace and wipe+recover drill can only test the code path, not real cloud durability — and VERIFICATION.md will say so.

2. **DO_SPACES in DEV**: Is `DO_SPACES_KEY` configured on DEV? If yes, invoices route to DigitalOcean Spaces (not local), and the per-ingest SharePoint backup doesn't fire today — the gap in SC-6 is live in the DEV environment.

3. **Backup timing: ingest-time vs. nightly?** Should every invoice be backed up to SharePoint immediately on ingest (~200–500ms per SharePoint API call, which can be kept off the request path by running the upload asynchronously), or is a nightly batch upload acceptable? Note that the existing daily MongoDB backup does not cover the files themselves, so a nightly PDF upload would be a separate job. Andrew said *"we can't lose a lot of data"*, which suggests ingest-time is preferred, but confirm.

4. **Who can trigger restore?** The existing restore endpoint requires admin password. Should the `restore-from-sharepoint` endpoint follow the same pattern, or require a separate confirmation step (e.g. typed keyword + password)?

5. **DO_SPACES as primary vs. secondary**: Is DigitalOcean Spaces the intended primary file store (with SharePoint as DR backup), or should SharePoint be the primary with DO Spaces as secondary? This affects which one the restore procedure targets first.

6. **HMRC evidence format**: When HMRC asks for invoices, is a SharePoint folder link sufficient, or does Andrew need a packaged ZIP with a manifest CSV? (Affects scope of the SharePoint Explorer UI.)

7. **Retention policy**: Is any SharePoint retention policy applied to `Shared Documents/ERP Receipts/`? Six-year retention requires that files there are not auto-deleted by SharePoint policies or admin housekeeping.
