# MailForge — Email Service Platform
### Deep Brainstorm · Architecture Design · API Blueprint
> **Date:** 2026-05-08 · **Status:** Brainstorm / Pre-Design Phase

---

## Table of Contents
1. [Vision & Goals](#1-vision--goals)
2. [Key Decisions](#2-key-decisions)
3. [Complete Feature List](#3-complete-feature-list)
4. [System Architecture](#4-system-architecture)
5. [Data Models](#5-data-models)
6. [API Endpoints](#6-api-endpoints)
7. [Go Backend Architecture](#7-go-backend-architecture)
8. [Provider Abstraction Layer](#8-provider-abstraction-layer)
9. [NATS JetStream Design](#9-nats-jetstream-design)
10. [Template Engine](#10-template-engine)
11. [Analytics Pipeline](#11-analytics-pipeline)
12. [Compliance & Deliverability](#12-compliance--deliverability)
13. [Security Architecture](#13-security-architecture)
14. [Admin UI Architecture](#14-admin-ui-architecture)
15. [Deployment Architecture](#15-deployment-architecture)
16. [Performance: 10k+ Emails/sec](#16-performance-10k-emailssec)
17. [Phased Rollout Plan](#17-phased-rollout-plan)
18. [Open Questions & Future Scope](#18-open-questions--future-scope)

---

## 1. Vision & Goals

**MailForge** is a fully independent, multi-tenant email delivery platform that any project (SimplifyHiring, and future products) can integrate via API. It is **not** a marketing SaaS product — it is a reliable, programmable infrastructure layer for sending every class of email at scale.

### Core Principles
| Principle | Description |
|---|---|
| **API-first** | Everything is exposed via REST API. The UI is just a consumer of the same API |
| **Multi-tenant isolation** | Each project has complete data isolation; one project cannot see or affect another |
| **Provider-agnostic** | Swap/add email providers without changing any integration code |
| **Observable by default** | Every email has a full lifecycle event trail; nothing is fire-and-forget |
| **Compliance-built-in** | GDPR, CAN-SPAM, CASL, DKIM/SPF/DMARC are first-class, not bolt-ons |
| **Horizontally scalable** | Workers scale independently; target 10,000+ emails/sec at steady state |

---

## 2. Key Decisions

| Concern | Decision |
|---|---|
| Language | **Go** (modular monolith, split into microservices at scale boundary) |
| HTTP Framework | **Chi** (stdlib-compatible, lightweight, composable middleware) |
| Message Queue | **NATS JetStream** (durable, at-least-once delivery with publish deduplication, built-in redelivery) |
| Primary DB | **PostgreSQL 16** (pgx/v5 driver, pgx pool) |
| Cache / Rate-limit | **Redis 7** (token bucket, key lookup cache, idempotency store) |
| Object Storage | **MinIO** (attachments, template assets, export archives) |
| Template Engine | **Go `html/template`** (safe auto-escaping, zero dependencies) |
| Auth (UI) | **Zitadel** (existing infrastructure, OIDC/JWT) |
| Auth (API) | Rotating API keys with scopes + expiry |
| Providers | SES, SendGrid, Mailgun, Postmark, Resend, SMTP2GO, Custom SMTP |
| Deployment | Docker Compose (dev) → Coolify (production) |
| Multi-region | Active-passive DR (primary + hot-standby) |
| Tracing | OpenTelemetry → Jaeger |
| Metrics | Prometheus + Grafana |
| Logging | Structured JSON → Loki |

---

## 3. Complete Feature List

### 3.1 Project Management
- **CRUD Projects** — Create isolated namespaces for each product/team. Each project has a slug, display name, default From address, Reply-To, bounce address, and timezone.
- **Project Settings** — Per-project defaults: daily send limits, rate limits, max attachment size, allowed MIME types.
- **Project Suspension** — SuperAdmin can suspend a project (all sends blocked) with an audited reason.
- **Project Stats Dashboard** — Aggregate send volume, delivery rate, bounce rate, complaint rate, per project.
- **Project-level Audit Log** — Immutable append-only log of all changes to project settings, key operations, SMTP changes.

### 3.2 Template Management
- **CRUD Templates** — Each template belongs to a project; has a human-readable name, slug, description, category tag, and variable schema (JSON Schema).
- **Template Versioning** — Every edit creates a new immutable version. A template has one `published` version (used in sends), an optional `draft`, and unlimited `archived` versions.
- **Version Diff** — API returns a diff between any two versions (unified diff of HTML body).
- **Template Preview** — Render a template with sample variable values before publishing. Returns fully rendered HTML + plain-text.
- **Template Clone** — Copy a template (and its published version) to the same or a different project.
- **Default Templates** — Platform-level templates provided by SuperAdmin (e.g., OTP, welcome email). Projects can inherit or override them.
- **Plain-Text Fallback** — Each version stores both `html_body` and `text_body`. Plain-text is auto-generated from HTML via a stripping pipeline if not explicitly provided.
- **Variable Schema Validation** — At send time, payload variables are validated against the template's JSON Schema before the message is queued.
- **Template Categories** — Tags like `transactional`, `marketing`, `system`, `onboarding` to support filtering and policy enforcement.

### 3.3 API Key Management
- **Rotating Keys with Scopes** — Keys carry a scope list: `email:send`, `email:read`, `template:read`, `template:write`, `analytics:read`, `suppression:manage`, `webhook:manage`, `project:read`, `project:write`.
- **Key Lifecycle** — Create → Active → Rotated (new key issued, old key given a 24h grace period) → Revoked.
- **Expiry** — Configurable TTL per key (7d, 30d, 90d, 1y, or never).
- **Key Metadata** — Name/description, created_by, last_used_at, last_used_ip for auditing.
- **Key Prefix Display** — Only `sha256(key)` is stored in the DB; the full key is shown **once** at creation time. Keys start with `mf_proj_`, and a short leading portion of each key is kept in plaintext for display.
- **Rate Limiting per Key** — Each key can have an optional override on the project-level rate limit.
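
The key storage rule above can be sketched as follows. This is a minimal illustration, not the final implementation: the `IssuedKey` type, the function names, and the choice of 16 random bytes and a 12-character display prefix are assumptions made for the example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

const keyPrefix = "mf_proj_"

// IssuedKey separates what the caller sees once (Plaintext) from what is
// persisted (Prefix and Hash). The type is hypothetical.
type IssuedKey struct {
	Plaintext string
	Prefix    string
	Hash      string
}

// hashKey returns the hex-encoded SHA-256 digest of a key.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// NewAPIKey generates a random key with the mf_proj_ prefix.
func NewAPIKey() (IssuedKey, error) {
	buf := make([]byte, 16) // assumption: 128 bits of randomness
	if _, err := rand.Read(buf); err != nil {
		return IssuedKey{}, err
	}
	plain := keyPrefix + hex.EncodeToString(buf)
	return IssuedKey{
		Plaintext: plain,
		Prefix:    plain[:len(keyPrefix)+4],
		Hash:      hashKey(plain),
	}, nil
}

// Verify hashes the presented key and compares it to the stored hash
// in constant time.
func Verify(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashKey(presented)), []byte(storedHash)) == 1
}

func main() {
	k, err := NewAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("show once:", k.Plaintext)
	fmt.Println("persist:  ", k.Prefix, k.Hash)
	fmt.Println("valid:    ", Verify(k.Plaintext, k.Hash))
}
```

Because only the hash is persisted, a lost key cannot be recovered and has to be rotated.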

### 3.4 SMTP / Provider Configuration
- **Multi-provider per Project** — A project can configure multiple providers (e.g., Postmark for transactional, SES for bulk).
- **Provider Priority & Fallback** — Assign a primary and fallback provider. On transient failure of primary, automatically route to fallback.
- **Supported Providers:**
  - Amazon SES (via AWS SDK)
  - SendGrid (REST API)
  - Mailgun (REST API)
  - Postmark (REST API)
  - Resend (REST API)
  - SMTP2GO (REST API)
  - Custom SMTP (any host:port with TLS, STARTTLS, or plain)
- **Credential Encryption** — All SMTP credentials and provider API keys are encrypted at rest with AES-256-GCM under a per-project DEK (Data Encryption Key), which is itself wrapped by a KEK (Key Encryption Key) stored separately.
- **BYOK (Bring Your Own Key)** — Each project can supply its own Key Encryption Key (KEK) via the API instead of using the platform-managed master KEK. The project-supplied key is never stored; it is used only to wrap/unwrap the per-project DEK at the moment it is provided. Projects without a BYOK fall back to the platform master KEK automatically.
- **Test Connection** — Trigger a test send via a configured provider and report success/failure with SMTP diagnostics.
- **Provider Health** — Automatic circuit-breaker per provider. If failure rate exceeds threshold, traffic is shifted to fallback automatically.
- **Routing Rules** — Route by template category (e.g., `transactional` → Postmark, `marketing` → SES) or by domain pattern of recipient.
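
The credential encryption scheme above (a data key wrapped by a key-encryption key) might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the nonce is simply prepended to the ciphertext, and key storage and BYOK handling are left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce to the result.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the data was tampered with or the key is wrong.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// EncryptCredentials encrypts creds under a fresh DEK and wraps the DEK with the KEK.
func EncryptCredentials(kek, creds []byte) (wrappedDEK, ciphertext []byte, err error) {
	dek := make([]byte, 32)
	if _, err = rand.Read(dek); err != nil {
		return nil, nil, err
	}
	if ciphertext, err = seal(dek, creds); err != nil {
		return nil, nil, err
	}
	if wrappedDEK, err = seal(kek, dek); err != nil {
		return nil, nil, err
	}
	return wrappedDEK, ciphertext, nil
}

// DecryptCredentials unwraps the DEK with the KEK, then decrypts the credentials.
func DecryptCredentials(kek, wrappedDEK, ciphertext []byte) ([]byte, error) {
	dek, err := open(kek, wrappedDEK)
	if err != nil {
		return nil, err
	}
	return open(dek, ciphertext)
}

func main() {
	kek := make([]byte, 32)
	if _, err := rand.Read(kek); err != nil {
		panic(err)
	}
	wrapped, ct, err := EncryptCredentials(kek, []byte(`{"api_key":"example"}`))
	if err != nil {
		panic(err)
	}
	plain, err := DecryptCredentials(kek, wrapped, ct)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Rotating the KEK in this layout only requires re-wrapping the DEK; the stored credentials themselves do not need to be re-encrypted.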

### 3.5 Email Sending
- **Single Send** — Send to one recipient with raw subject/body or template+variables.
- **Batch Send** — Send to up to 10,000 recipients in a single API call. Each recipient can have personalised variables. Server fans out internally.
- **Template Send** — Reference template by slug + version (or `latest-published`).
- **Per-Send SMTP Override** — Every send (single or batch) can optionally override the project-level SMTP provider using either: (a) a `smtp_config_id` referencing a saved config, or (b) inline SMTP credentials (`smtp_override` object in the request body) for truly one-off/dynamic senders. Inline credentials are validated, used for that send only, and never persisted.
- **CC / BCC** — Supported on single sends. Not recommended on batch sends (privacy); platform warns.
- **Attachments** — PDFs and images (JPEG, PNG, GIF, WebP). Stored in MinIO; referenced by upload URL or inline base64 (≤4MB).
- **Inline Images** — Embedded as CID attachments for HTML body.
- **Reply-To Override** — Per-send override of default Reply-To.
- **Custom Headers** — Pass arbitrary `X-*` headers.
- **Idempotency Key** — The `Idempotency-Key` header makes retries safe at the API layer: a repeated request with the same key is not processed a second time (keys are stored in Redis for 24h).
- **Priority Levels** — `critical` (bypass rate limits), `high`, `normal`, `low`. Low-priority messages are batched and rate-paced.
- **From Name/Address Override** — Per-send override (subject to project policy whitelist).
- **Tags** — Attach string tags to a send for later filtering in analytics.

### 3.6 Email Scheduling
- **Exact Datetime** — Schedule a send for a specific UTC timestamp.
- **Timezone-Aware** — "Send at 09:00 in the recipient's timezone" — platform resolves offset per recipient using their stored timezone or IP geo-fallback.
- **Recurring / Cron** — Define a cron expression + timezone for repeating sends (e.g., weekly digest).
- **Cancel Scheduled** — Cancel any pending scheduled send before it fires.
- **Schedule Preview** — List next N fire times for a recurring schedule.
- **Drip Sequences** — A named sequence of steps (email + delay). Enrol a contact; the scheduler fires each step at the configured interval.

### 3.7 Event Tracking & Analytics
- **Lifecycle Events:** `queued`, `dispatched`, `delivered`, `soft_bounce`, `hard_bounce`, `complaint`, `opened`, `clicked`, `unsubscribed`, `failed`.
- **Open Tracking** — 1×1 pixel injected at the end of the HTML body. Respects the `DNT` (Do Not Track) request header: the event is flagged and the IP address is not recorded.
- **Click Tracking** — All links in HTML body rewritten to pass through `/t/c/:id`. Original URL and click metadata stored.
- **Per-Recipient Journey** — Full event timeline per individual email address per send.
- **Aggregate Analytics** — Per project, per template, per time bucket (hourly, daily, weekly, monthly): send volume, delivery rate, open rate, click rate, bounce rate, unsubscribe rate, complaint rate.
- **Heatmaps** — Hour-of-day × day-of-week send volume and engagement heatmaps.
- **A/B Testing** — Create up to 4 variants of a template for a single send. Platform randomly distributes recipients across variants. Winner determination by open-rate or click-rate after a configurable evaluation window.
- **Provider Comparison** — Side-by-side delivery metrics when multiple providers are used.
- **Export** — CSV / JSON export of analytics data, stored in MinIO and served as signed download URL.
- **Webhooks for Events** — All events can be forwarded to project-configured webhook endpoints.
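
Click tracking as described above amounts to replacing each link with a `/t/c/:id` URL and remembering the original target. The following is a deliberately simplified sketch: it uses a regular expression instead of an HTML parser, and the link ID format (`<emailID>-<n>`) and the `RewriteLinks` name are assumptions.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// hrefRe matches double-quoted absolute http(s) links only; mailto: and
// relative links are left untouched.
var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// RewriteLinks points every matched link at the tracking endpoint and returns
// the rewritten body plus a map from link ID to the original URL.
func RewriteLinks(body, trackingBase, emailID string) (string, map[string]string) {
	links := make(map[string]string)
	n := 0
	out := hrefRe.ReplaceAllStringFunc(body, func(m string) string {
		orig := hrefRe.FindStringSubmatch(m)[1]
		n++
		id := fmt.Sprintf("%s-%d", emailID, n)
		links[id] = html.UnescapeString(orig) // store the real URL, not the HTML-escaped form
		return `href="` + trackingBase + "/t/c/" + id + `"`
	})
	return out, links
}

func main() {
	body := `<a href="https://example.com/a?x=1&amp;y=2">Open</a>`
	out, links := RewriteLinks(body, "https://track.example", "e1")
	fmt.Println(out)
	fmt.Println(links)
}
```

A real implementation would likely exclude unsubscribe links from rewriting and parse the HTML properly rather than pattern-match it.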

### 3.8 Suppression & Unsubscribe Management
- **Global Suppression List per Project** — Emails on this list are silently skipped at dispatch time (no provider call, no cost).
- **Reason Tagging** — `hard_bounce`, `soft_bounce_threshold`, `complaint`, `manual`, `unsubscribe`.
- **Automatic Suppression** — Hard bounces and spam complaints from providers auto-add to suppression list.
- **Soft Bounce Threshold** — Configurable: after N soft bounces within M days, auto-suppress.
- **One-Click Unsubscribe** — `List-Unsubscribe` and `List-Unsubscribe-Post` RFC 8058 headers injected automatically.
- **Preference Center** — Hosted page (`/pref/:token`) where recipients manage their subscription preferences per project. Project can define subscription categories (e.g., "Product Updates", "Security Alerts").
- **Bulk Import/Export** — Import suppression list via CSV; export current list.
- **GDPR Erasure** — `DELETE /suppressions/:email` erases or anonymises all personal data for that email address in logs and analytics.

### 3.9 Inbound Webhook Processing
- **Provider Event Ingestion** — Dedicated inbound endpoints for each provider's event webhooks (SES SNS, SendGrid Event, Mailgun Webhook, Postmark, Resend, SMTP2GO).
- **Signature Verification** — Each provider's webhook payload is authenticated with that provider's own signing scheme (an HMAC signature where the provider offers one) before processing.
- **Deduplication** — Provider may re-send events; deduplication using event ID stored in Redis (24h TTL).
- **Event Normalisation** — All provider-specific event formats normalised to internal `EmailEvent` schema.
- **SMTP Inbound (Bounce parsing)** — Mail delivered to the project's bounce address (routed via an MX record) is parsed as DSN (Delivery Status Notification) messages to record bounces.

### 3.10 Outbound Webhooks
- **Project Webhooks** — Projects register HTTP endpoints. MailForge delivers events to them with retry logic.
- **Event Filtering** — Subscribe to specific event types only.
- **HMAC Signature** — Each delivery signed with `HMAC-SHA256(payload, secret)` for verification by the receiver.
- **Retry Policy** — Retries with increasing backoff: 1m, 5m, 30m, 2h, 12h, 24h. If the final retry also fails, the event is marked `failed` and delivery stops.
- **Webhook Logs** — Per-webhook delivery log with request/response body (truncated to 4KB) and latency.
- **Webhook Test** — Fire a synthetic event to a webhook URL for integration testing.
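
Signing and verifying an outbound webhook with `HMAC-SHA256(payload, secret)` can be sketched as below. The hex encoding of the signature is an assumption, as is how it would be transported (for example in a request header); neither is fixed by this document.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the hex-encoded HMAC-SHA256 of payload under secret.
func Sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify recomputes the MAC and compares it in constant time.
// This is what a webhook receiver would run on the raw request body.
func Verify(secret, payload []byte, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	secret := []byte("whsec_example")
	payload := []byte(`{"event_type":"delivered","email_id":"uuid"}`)
	sig := Sign(secret, payload)
	fmt.Println(sig, Verify(secret, payload, sig))
}
```

Receivers should verify against the exact bytes received, since re-serialising the JSON can change the payload and invalidate the signature.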

### 3.11 DKIM / Deliverability
- **Per-Domain DKIM Keys** — Generate RSA-2048 DKIM key pair per project's From domain. Store private key encrypted in DB; expose DNS TXT record for user to configure.
- **DKIM Signing** — All outgoing emails signed with the project's domain DKIM key (or provider-managed DKIM if provider supports it).
- **DNS Verification** — Check DKIM, SPF, and DMARC DNS records for a domain and report pass/fail with remediation guidance.
- **Bounce Domain Setup** — Guide to configure custom bounce address MX record.
- **Warm-up Mode** — Automatic IP/domain warmup: ramp from small volume to full volume over a configurable schedule.

### 3.12 SuperAdmin Features
- **Project Directory** — List, search, inspect all projects.
- **Platform-wide Metrics** — Total emails/sec, queue depth, worker health, provider error rates.
- **Default Templates** — Create/manage platform-default templates available to all projects.
- **System Configuration** — Global rate limits, max attachment sizes, allowed MIME types, feature flags.
- **Project Suspension** — Suspend/unsuspend any project with audit reason.
- **API Key Revocation** — Revoke any key globally (for security incidents).
- **Event Log** — Platform-wide event search by email, project, timestamp range.
- **Provider Health Board** — Real-time provider circuit-breaker status across all projects.
- **Worker Management** — View running worker instances, queue depths per stream.

---

## 4. System Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                          External Callers                                │
│   SimplifyHiring BFF  │  Other Projects  │  Admin UI (Zitadel OIDC)     │
└────────────┬─────────────────┬──────────────────────┬───────────────────┘
             │                 │                      │
             ▼                 ▼                      ▼
┌────────────────────────────────────────────────────────────────────────┐
│                     Load Balancer / TLS Termination                    │
│                      (Nginx / Traefik / Coolify)                       │
└───────────┬─────────────────────────────────────────────┬─────────────┘
            │                                             │
     ┌──────▼──────┐                             ┌───────▼──────┐
     │  API Server │                             │  Tracking &  │
     │  (Go/Chi)   │                             │  Inbound     │
     │  :8080      │                             │  Server      │
     └──────┬──────┘                             │  :8081       │
            │                                    └───────┬──────┘
            │  Publish                                   │ Publish
            ▼                                            ▼
┌───────────────────────────────────────────────────────────────────────┐
│                         NATS JetStream                                │
│                                                                       │
│  Streams:                                                             │
│  • email.send.critical    (MaxAge: 1h,  Replicas: 3)                 │
│  • email.send.high        (MaxAge: 4h,  Replicas: 3)                 │
│  • email.send.normal      (MaxAge: 24h, Replicas: 3)                 │
│  • email.send.low         (MaxAge: 72h, Replicas: 3)                 │
│  • email.events           (MaxAge: 7d,  Replicas: 3)                 │
│  • webhook.outbound       (MaxAge: 7d,  Replicas: 3)                 │
│  • scheduler.ticks        (MaxAge: 1h,  Replicas: 3)                 │
└────┬────────┬──────────────────────┬──────────────────────┬──────────┘
     │        │                      │                      │
     ▼        ▼                      ▼                      ▼
┌─────────┐ ┌──────────┐     ┌──────────────┐      ┌──────────────┐
│ Send    │ │ Send     │     │ Event        │      │ Webhook      │
│ Workers │ │ Workers  │     │ Workers      │      │ Workers      │
│(critical│ │(normal/  │     │(analytics,   │      │(outbound     │
│ /high)  │ │  low)    │     │ suppression  │      │ HTTP POST)   │
│  x8     │ │  x16     │     │ auto-add)    │      │  x4          │
└────┬────┘ └────┬─────┘     └──────┬───────┘      └──────┬───────┘
     │           │                  │                      │
     ▼           ▼                  ▼                      ▼
┌────────────────────────────────────────────────────────────────────┐
│                         Provider Adapters                          │
│  SES │ SendGrid │ Mailgun │ Postmark │ Resend │ SMTP2GO │ SMTP    │
└────────────────────────────────────────────────────────────────────┘
             │
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                     Data Layer                                     │
│                                                                    │
│  PostgreSQL 16                  Redis 7           MinIO            │
│  ┌───────────────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ projects              │  │ api_key cache│  │ attachments/  │  │
│  │ templates             │  │ rate limits  │  │ exports/      │  │
│  │ template_versions     │  │ idempotency  │  │ template_     │  │
│  │ api_keys              │  │ dedup store  │  │ assets/       │  │
│  │ smtp_configs          │  │ circuit brkr │  └───────────────┘  │
│  │ emails                │  └──────────────┘                     │
│  │ email_events          │                                        │
│  │ suppressions          │                                        │
│  │ webhooks              │                                        │
│  │ scheduled_emails      │                                        │
│  │ ab_tests              │                                        │
│  │ drip_sequences        │                                        │
│  └───────────────────────┘                                        │
└────────────────────────────────────────────────────────────────────┘
             │
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                     Observability Stack                            │
│   OpenTelemetry Collector → Jaeger (traces)                       │
│   Prometheus scrape → Grafana (metrics)                           │
│   Structured JSON logs → Loki → Grafana                           │
└────────────────────────────────────────────────────────────────────┘
```

---

## 5. Data Models

### `projects`
```sql
CREATE TABLE projects (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    slug            TEXT UNIQUE NOT NULL,          -- url-safe identifier
    name            TEXT NOT NULL,
    description     TEXT,
    from_name       TEXT NOT NULL,
    from_email      TEXT NOT NULL,
    reply_to        TEXT,
    bounce_address  TEXT,
    timezone        TEXT NOT NULL DEFAULT 'UTC',
    daily_send_limit BIGINT NOT NULL DEFAULT 1000000,
    rate_limit_per_sec INT NOT NULL DEFAULT 1000,
    max_attachment_bytes BIGINT DEFAULT 4194304,   -- 4MB
    status          TEXT NOT NULL DEFAULT 'active', -- active | suspended
    suspended_at    TIMESTAMPTZ,
    suspended_reason TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### `template_collections` (groups of templates)
```sql
CREATE TABLE template_collections (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID REFERENCES projects(id) ON DELETE CASCADE,
    slug        TEXT NOT NULL,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL,                     -- transactional | marketing | system
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(project_id, slug)
);
```

### `template_versions`
```sql
CREATE TABLE template_versions (
    id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    template_id       UUID REFERENCES template_collections(id) ON DELETE CASCADE,
    version           INT NOT NULL,                -- auto-increment per template
    subject           TEXT NOT NULL,
    html_body         TEXT NOT NULL,
    text_body         TEXT,                        -- auto-generated if null
    variables_schema  JSONB,                       -- JSON Schema for validation
    status            TEXT NOT NULL DEFAULT 'draft', -- draft | published | archived
    published_at      TIMESTAMPTZ,
    created_by        TEXT,                        -- api key id or user id
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(template_id, version),
    CONSTRAINT one_published_per_template EXCLUDE USING btree (template_id WITH =) 
        WHERE (status = 'published')               -- only one published at a time
);
```

### `api_keys`
```sql
CREATE TABLE api_keys (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID REFERENCES projects(id) ON DELETE CASCADE,
    name            TEXT NOT NULL,
    key_prefix      TEXT NOT NULL,                 -- leading chars for display, e.g. mf_proj_a1b2
    key_hash        TEXT NOT NULL UNIQUE,          -- sha256(full_key)
    scopes          TEXT[] NOT NULL DEFAULT '{}',
    expires_at      TIMESTAMPTZ,
    rate_limit_override INT,                       -- override project rate limit
    last_used_at    TIMESTAMPTZ,
    last_used_ip    INET,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    revoked_at      TIMESTAMPTZ,
    revoked_reason  TEXT
);
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE revoked_at IS NULL;
```

### `smtp_configs`
```sql
CREATE TABLE smtp_configs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID REFERENCES projects(id) ON DELETE CASCADE,
    name            TEXT NOT NULL,
    provider_type   TEXT NOT NULL,                 -- ses|sendgrid|mailgun|postmark|resend|smtp2go|smtp
    credentials     BYTEA NOT NULL,                -- AES-256-GCM encrypted JSON
    kek_id          TEXT NOT NULL,                 -- which key was used to encrypt
    priority        INT NOT NULL DEFAULT 0,        -- lower = higher priority
    is_fallback     BOOLEAN NOT NULL DEFAULT FALSE,
    routing_rules   JSONB,                         -- {"category": ["transactional"], "domain_pattern": "*.edu"}
    is_active       BOOLEAN NOT NULL DEFAULT TRUE,
    circuit_state   TEXT NOT NULL DEFAULT 'closed', -- closed | open | half-open
    circuit_opened_at TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
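
One way the `priority`, `is_active`, `circuit_state`, and `routing_rules` columns could combine at dispatch time is sketched below. The struct and function names are hypothetical, half-open circuits are simply skipped, and `domain_pattern` is matched with `path.Match` glob semantics, which is an assumption about how patterns like `*.edu` are meant to be interpreted.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// RoutingRules mirrors the routing_rules JSONB column.
type RoutingRules struct {
	Categories    []string
	DomainPattern string
}

// SMTPConfig carries only the columns needed for provider selection.
type SMTPConfig struct {
	Name         string
	Priority     int // lower = higher priority
	IsActive     bool
	CircuitState string
	Rules        *RoutingRules // nil matches every send
}

func (c SMTPConfig) matches(category, recipient string) bool {
	if c.Rules == nil {
		return true
	}
	if len(c.Rules.Categories) > 0 {
		found := false
		for _, cat := range c.Rules.Categories {
			if cat == category {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	if c.Rules.DomainPattern != "" {
		domain := strings.ToLower(recipient[strings.LastIndex(recipient, "@")+1:])
		if ok, err := path.Match(c.Rules.DomainPattern, domain); err != nil || !ok {
			return false
		}
	}
	return true
}

// SelectProvider returns the highest-priority active config whose circuit is
// closed and whose routing rules match the send.
func SelectProvider(configs []SMTPConfig, category, recipient string) (SMTPConfig, bool) {
	sorted := append([]SMTPConfig(nil), configs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	for _, c := range sorted {
		if c.IsActive && c.CircuitState == "closed" && c.matches(category, recipient) {
			return c, true
		}
	}
	return SMTPConfig{}, false
}

func main() {
	configs := []SMTPConfig{
		{Name: "postmark", Priority: 0, IsActive: true, CircuitState: "closed",
			Rules: &RoutingRules{Categories: []string{"transactional"}}},
		{Name: "ses", Priority: 1, IsActive: true, CircuitState: "closed"},
	}
	c, ok := SelectProvider(configs, "marketing", "user@example.com")
	fmt.Println(c.Name, ok)
}
```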

### `emails` (send log)
```sql
CREATE TABLE emails (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id          UUID REFERENCES projects(id),
    template_version_id UUID REFERENCES template_versions(id),
    idempotency_key     TEXT,                      -- unique per project (see index below)
    message_id          TEXT,                      -- provider message ID
    from_email          TEXT NOT NULL,
    from_name           TEXT,
    to_emails           TEXT[] NOT NULL,
    cc_emails           TEXT[],
    bcc_emails          TEXT[],
    reply_to            TEXT,
    subject             TEXT NOT NULL,
    priority            TEXT NOT NULL DEFAULT 'normal',
    tags                TEXT[],
    variables           JSONB,
    provider_id         UUID REFERENCES smtp_configs(id),
    provider_message_id TEXT,
    status              TEXT NOT NULL DEFAULT 'queued',
    scheduled_at        TIMESTAMPTZ,
    dispatched_at       TIMESTAMPTZ,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_emails_project_created ON emails(project_id, created_at DESC);
CREATE UNIQUE INDEX idx_emails_idempotency ON emails(project_id, idempotency_key) WHERE idempotency_key IS NOT NULL;
```

### `email_events`
```sql
CREATE TABLE email_events (
    id          UUID NOT NULL DEFAULT gen_random_uuid(),
    email_id    UUID REFERENCES emails(id) ON DELETE CASCADE,
    project_id  UUID REFERENCES projects(id),      -- denormalised for per-project index scans
    event_type  TEXT NOT NULL,                     -- queued|dispatched|delivered|soft_bounce|hard_bounce|complaint|opened|clicked|unsubscribed|failed
    recipient   TEXT,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ip_address  INET,
    user_agent  TEXT,
    metadata    JSONB,                             -- click URL, bounce code, etc.
    PRIMARY KEY (id, occurred_at)                  -- a PK on a partitioned table must include the partition key
) PARTITION BY RANGE (occurred_at);
-- Monthly partitions: email_events_2026_05, email_events_2026_06, etc.
CREATE INDEX idx_events_email_id ON email_events(email_id);
CREATE INDEX idx_events_project_type ON email_events(project_id, event_type, occurred_at DESC);
```

### `suppressions`
```sql
CREATE TABLE suppressions (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID REFERENCES projects(id) ON DELETE CASCADE,
    email_hash  TEXT NOT NULL,                     -- sha256(lowercase(email)) for GDPR
    email       TEXT,                              -- nullable: set to NULL on erasure
    reason      TEXT NOT NULL,                     -- hard_bounce|soft_bounce_threshold|complaint|manual|unsubscribe
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(project_id, email_hash)
);
```
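
Keeping `email_hash` alongside a nullable `email` lets an address stay suppressed after its plaintext has been erased. The sketch below illustrates that with an in-memory map; the `Suppressions` type is hypothetical, and trimming whitespace before hashing is an added assumption beyond the lowercase rule in the schema comment.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// EmailHash returns sha256(lowercase(email)) as hex.
func EmailHash(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email))
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

// Suppressions maps projectID -> email hash -> stored plaintext ("" once erased).
type Suppressions map[string]map[string]string

// Add puts an address on the project's suppression list.
func (s Suppressions) Add(projectID, email string) {
	if s[projectID] == nil {
		s[projectID] = make(map[string]string)
	}
	s[projectID][EmailHash(email)] = email
}

// Erase drops the plaintext address but keeps the hash entry.
func (s Suppressions) Erase(projectID, email string) {
	h := EmailHash(email)
	if _, ok := s[projectID][h]; ok {
		s[projectID][h] = ""
	}
}

// IsSuppressed is the dispatch-time check; it only needs the hash.
func (s Suppressions) IsSuppressed(projectID, email string) bool {
	_, ok := s[projectID][EmailHash(email)]
	return ok
}

func main() {
	list := Suppressions{}
	list.Add("proj-1", "Alice@Example.com")
	list.Erase("proj-1", "alice@example.com")
	fmt.Println(list.IsSuppressed("proj-1", "ALICE@example.com")) // still suppressed after erasure
}
```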

### `webhooks`
```sql
CREATE TABLE webhooks (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID REFERENCES projects(id) ON DELETE CASCADE,
    url         TEXT NOT NULL,
    secret      TEXT NOT NULL,                     -- HMAC signing secret
    event_types TEXT[] NOT NULL,
    is_active   BOOLEAN NOT NULL DEFAULT TRUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### `webhook_deliveries`
```sql
CREATE TABLE webhook_deliveries (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    webhook_id      UUID REFERENCES webhooks(id) ON DELETE CASCADE,
    email_event_id  UUID,
    event_type      TEXT NOT NULL,
    payload         JSONB NOT NULL,
    attempt         INT NOT NULL DEFAULT 1,
    status          TEXT NOT NULL DEFAULT 'pending', -- pending|success|failed
    response_status INT,
    response_body   TEXT,
    next_retry_at   TIMESTAMPTZ,
    delivered_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### `scheduled_emails`
```sql
CREATE TABLE scheduled_emails (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID REFERENCES projects(id) ON DELETE CASCADE,
    name            TEXT,
    schedule_type   TEXT NOT NULL,                 -- once | cron
    run_at          TIMESTAMPTZ,                   -- for once
    cron_expression TEXT,                          -- for cron
    timezone        TEXT NOT NULL DEFAULT 'UTC',
    template_version_id UUID REFERENCES template_versions(id),
    payload         JSONB NOT NULL,                -- to, variables, tags, etc.
    status          TEXT NOT NULL DEFAULT 'active', -- active | paused | cancelled | completed
    next_run_at     TIMESTAMPTZ,
    last_run_at     TIMESTAMPTZ,
    run_count       INT NOT NULL DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_scheduled_next_run ON scheduled_emails(next_run_at) WHERE status = 'active';
```

### `ab_tests`
```sql
CREATE TABLE ab_tests (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID REFERENCES projects(id),
    name            TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'running', -- running | completed | cancelled
    metric          TEXT NOT NULL DEFAULT 'open_rate', -- open_rate | click_rate
    evaluate_after  INTERVAL NOT NULL DEFAULT '24 hours',
    winner_variant  TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE ab_test_variants (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ab_test_id          UUID REFERENCES ab_tests(id) ON DELETE CASCADE,
    name                TEXT NOT NULL,             -- A, B, C, D
    template_version_id UUID REFERENCES template_versions(id),
    traffic_percentage  INT NOT NULL,              -- must sum to 100 across variants
    send_count          INT NOT NULL DEFAULT 0,
    open_count          INT NOT NULL DEFAULT 0,
    click_count         INT NOT NULL DEFAULT 0
);
```
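
Distributing recipients across variants according to `traffic_percentage` can be sketched as below. Note the substitution: instead of a random draw, this example hashes the test ID and recipient (FNV-1a) into a bucket from 0 to 99, so the same recipient always lands in the same variant. The `Assign` function and `Variant` type are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// Variant mirrors the name and traffic_percentage columns of ab_test_variants.
type Variant struct {
	Name              string
	TrafficPercentage int
}

// Assign maps a recipient to a variant in proportion to the configured percentages.
func Assign(testID, recipient string, variants []Variant) (string, error) {
	if len(variants) == 0 || len(variants) > 4 {
		return "", errors.New("an A/B test needs between 1 and 4 variants")
	}
	total := 0
	for _, v := range variants {
		total += v.TrafficPercentage
	}
	if total != 100 {
		return "", errors.New("traffic percentages must sum to 100")
	}
	h := fnv.New32a()
	h.Write([]byte(testID + "\x00" + strings.ToLower(recipient)))
	bucket := int(h.Sum32() % 100)
	cumulative := 0
	for _, v := range variants {
		cumulative += v.TrafficPercentage
		if bucket < cumulative {
			return v.Name, nil
		}
	}
	return "", errors.New("no variant selected") // unreachable when percentages sum to 100
}

func main() {
	variants := []Variant{{Name: "A", TrafficPercentage: 50}, {Name: "B", TrafficPercentage: 50}}
	name, err := Assign("test-1", "user@example.com", variants)
	if err != nil {
		panic(err)
	}
	fmt.Println("assigned variant:", name)
}
```

Deterministic assignment makes retries and re-sends land in the same variant; a purely random draw would need the chosen variant to be persisted per recipient instead.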

### `dkim_configs`
```sql
CREATE TABLE dkim_configs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID REFERENCES projects(id) ON DELETE CASCADE,
    domain          TEXT NOT NULL,
    selector        TEXT NOT NULL DEFAULT 'mailforge',
    private_key     BYTEA NOT NULL,               -- AES-256-GCM encrypted
    public_key_pem  TEXT NOT NULL,
    dns_txt_record  TEXT NOT NULL,               -- pre-formatted TXT record value
    verified        BOOLEAN NOT NULL DEFAULT FALSE,
    verified_at     TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(project_id, domain)
);
```

---

## 6. API Endpoints

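### Base URL
```
https://mail.yourdomain.com
```
Paths in the tables below are relative to this host; project-scoped routes carry the `/v1` version prefix.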

### Authentication Headers
```
X-API-Key: mf_proj_<key>          # Project-level access
Authorization: Bearer <zitadel_jwt> # SuperAdmin UI access
Idempotency-Key: <uuid>            # Optional, makes retries safe (duplicates are not re-processed)
```

---

### 6.1 Projects (SuperAdmin only)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/admin/projects` | Create a new project |
| `GET` | `/admin/projects` | List all projects (paginated) |
| `GET` | `/admin/projects/:id` | Get project details |
| `PUT` | `/admin/projects/:id` | Update project settings |
| `DELETE` | `/admin/projects/:id` | Soft-delete a project |
| `POST` | `/admin/projects/:id/suspend` | Suspend project (body: `{reason}`) |
| `POST` | `/admin/projects/:id/activate` | Reactivate suspended project |
| `GET` | `/admin/projects/:id/stats` | Aggregate stats for project |
| `GET` | `/admin/projects/:id/audit-log` | Immutable audit log |

**Create Project Request:**
```json
{
  "slug": "simplify-hiring",
  "name": "SimplifyHiring",
  "from_name": "SimplifyHiring",
  "from_email": "noreply@simplifyhr.com",
  "reply_to": "support@simplifyhr.com",
  "timezone": "Asia/Kolkata",
  "daily_send_limit": 500000,
  "rate_limit_per_sec": 2000
}
```

---

### 6.2 Templates
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/projects/:pid/templates` | Create a template |
| `GET` | `/v1/projects/:pid/templates` | List templates (filter: `?category=&status=&q=`) |
| `GET` | `/v1/projects/:pid/templates/:id` | Get template |
| `PUT` | `/v1/projects/:pid/templates/:id` | Update template metadata |
| `DELETE` | `/v1/projects/:pid/templates/:id` | Delete template (and all versions) |
| `POST` | `/v1/projects/:pid/templates/:id/clone` | Clone to same/other project |
| `POST` | `/v1/projects/:pid/templates/:id/versions` | Create a new version (draft) |
| `GET` | `/v1/projects/:pid/templates/:id/versions` | List all versions |
| `GET` | `/v1/projects/:pid/templates/:id/versions/:vid` | Get specific version |
| `POST` | `/v1/projects/:pid/templates/:id/versions/:vid/publish` | Publish a version |
| `POST` | `/v1/projects/:pid/templates/:id/versions/:vid/archive` | Archive a version |
| `GET` | `/v1/projects/:pid/templates/:id/versions/diff?from=:v1&to=:v2` | Diff two versions |
| `POST` | `/v1/projects/:pid/templates/:id/preview` | Render preview with sample data |

**Create Version Request:**
```json
{
  "subject": "Welcome to {{.CompanyName}}, {{.FirstName}}!",
  "html_body": "<html><body>Hi {{.FirstName}}, ...</body></html>",
  "text_body": "Hi {{.FirstName}}, ...",
  "variables_schema": {
    "type": "object",
    "required": ["FirstName", "CompanyName"],
    "properties": {
      "FirstName": { "type": "string" },
      "CompanyName": { "type": "string" },
      "LoginURL": { "type": "string", "format": "uri" }
    }
  }
}
```

---

### 6.3 API Keys
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/projects/:pid/api-keys` | Create a new key (full key shown once) |
| `GET` | `/v1/projects/:pid/api-keys` | List keys (prefix only, no hash) |
| `GET` | `/v1/projects/:pid/api-keys/:id` | Get key metadata |
| `PUT` | `/v1/projects/:pid/api-keys/:id` | Update name / rate limit override |
| `POST` | `/v1/projects/:pid/api-keys/:id/rotate` | Issue new key, old gets 24h grace |
| `DELETE` | `/v1/projects/:pid/api-keys/:id` | Immediately revoke key |

**Create Key Request:**
```json
{
  "name": "Production Send Key",
  "scopes": ["email:send", "analytics:read"],
  "expires_in_days": 365,
  "rate_limit_per_sec": 500
}
```

**Create Key Response (the full key appears only in this response):**
```json
{
  "id": "uuid",
  "key": "mf_proj_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
  "prefix": "mf_proj_a1b2",
  "scopes": ["email:send", "analytics:read"],
  "expires_at": "2027-05-08T00:00:00Z",
  "warning": "Store this key securely. It will not be shown again."
}
```

---

### 6.4 SMTP / Provider Configuration
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/projects/:pid/smtp-configs` | Add a provider config |
| `GET` | `/v1/projects/:pid/smtp-configs` | List configs (credentials masked) |
| `GET` | `/v1/projects/:pid/smtp-configs/:id` | Get config (credentials masked) |
| `PUT` | `/v1/projects/:pid/smtp-configs/:id` | Update config |
| `DELETE` | `/v1/projects/:pid/smtp-configs/:id` | Remove config |
| `POST` | `/v1/projects/:pid/smtp-configs/:id/test` | Send test email |
| `POST` | `/v1/projects/:pid/smtp-configs/:id/activate` | Set as primary |
| `GET` | `/v1/projects/:pid/smtp-configs/:id/health` | Circuit breaker status |

**Create SES Config:**
```json
{
  "name": "AWS SES - us-east-1",
  "provider_type": "ses",
  "priority": 1,
  "credentials": {
    "region": "us-east-1",
    "access_key_id": "AKIA...",
    "secret_access_key": "xxx"
  },
  "routing_rules": {
    "category": ["marketing", "bulk"]
  }
}
```

**Create Custom SMTP:**
```json
{
  "name": "Internal SMTP Relay",
  "provider_type": "smtp",
  "priority": 10,
  "credentials": {
    "host": "smtp.company.com",
    "port": 587,
    "username": "user@company.com",
    "password": "xxx",
    "tls_mode": "starttls"
  }
}
```

---

### 6.5 Sending Emails
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/send` | Send a single raw email |
| `POST` | `/v1/send/template` | Send using a template |
| `POST` | `/v1/send/batch` | Send raw email to multiple recipients |
| `POST` | `/v1/send/template/batch` | Batch send with per-recipient variables |
| `GET` | `/v1/emails` | List sent emails (filter: `?status=&tag=&from=&to=`) |
| `GET` | `/v1/emails/:id` | Get email details + event timeline |
| `GET` | `/v1/emails/:id/events` | List events for an email |

**Single Raw Send:**
```json
{
  "to": ["alice@example.com"],
  "cc": [],
  "subject": "Your OTP is 483921",
  "html_body": "<p>Your OTP: <strong>483921</strong></p>",
  "text_body": "Your OTP: 483921",
  "from_email": "otp@myapp.com",
  "from_name": "MyApp Security",
  "reply_to": "no-reply@myapp.com",
  "priority": "critical",
  "tags": ["otp", "auth"],
  "attachments": [],
  "headers": { "X-Campaign-ID": "otp-flow-v2" }
}
```

**Single Send — override with saved SMTP config:**
```json
{
  "to": ["bob@client.com"],
  "subject": "Invoice from Acme",
  "html_body": "<p>Please find your invoice attached.</p>",
  "smtp_config_id": "uuid-of-saved-config",
  "from_email": "billing@acme.com",
  "from_name": "Acme Billing",
  "priority": "high"
}
```

**Single Send — inline SMTP credentials (one-off, not saved):**
```json
{
  "to": ["carol@partner.com"],
  "subject": "Welcome from Partner Portal",
  "html_body": "<p>Welcome!</p>",
  "from_email": "hello@partner.com",
  "from_name": "Partner Portal",
  "smtp_override": {
    "provider_type": "smtp",
    "host": "smtp.partner.com",
    "port": 587,
    "username": "hello@partner.com",
    "password": "secret",
    "tls_mode": "starttls"
  },
  "priority": "normal"
}
```
> Inline `smtp_override` credentials are used for this send only and are **never stored or logged**.

**Template Batch Send:**
```json
{
  "template_slug": "welcome-email",
  "template_version": "published",
  "priority": "high",
  "tags": ["onboarding"],
  "recipients": [
    {
      "email": "alice@example.com",
      "variables": { "FirstName": "Alice", "CompanyName": "Acme Corp" }
    },
    {
      "email": "bob@example.com",
      "variables": { "FirstName": "Bob", "CompanyName": "Beta Inc" }
    }
  ]
}
```

**Batch Send Response:**
```json
{
  "batch_id": "uuid",
  "accepted": 2,
  "rejected": 0,
  "suppressed": 0,
  "email_ids": ["uuid1", "uuid2"]
}
```

---

### 6.6 Scheduling
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/schedule` | Create a scheduled or recurring send |
| `GET` | `/v1/schedule` | List scheduled sends |
| `GET` | `/v1/schedule/:id` | Get a schedule |
| `PUT` | `/v1/schedule/:id` | Update schedule (only if not fired) |
| `POST` | `/v1/schedule/:id/pause` | Pause a recurring schedule |
| `POST` | `/v1/schedule/:id/resume` | Resume a paused schedule |
| `DELETE` | `/v1/schedule/:id` | Cancel a scheduled send |
| `GET` | `/v1/schedule/:id/next-runs?n=5` | Preview next N fire times |

**Schedule Request:**
```json
{
  "name": "Weekly Digest",
  "schedule_type": "cron",
  "cron_expression": "0 9 * * 1",
  "timezone": "America/New_York",
  "template_slug": "weekly-digest",
  "payload": {
    "to": ["subscribers-segment"],
    "variables": { "Edition": "{{.CurrentWeek}}" }
  }
}
```

---

### 6.7 Analytics
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/analytics/overview` | Project-level summary (sends, delivery rate, open rate) |
| `GET` | `/v1/analytics/timeseries` | Time-series data (`?metric=sent&bucket=hour&from=&to=`) |
| `GET` | `/v1/analytics/heatmap` | Hour × Day engagement heatmap |
| `GET` | `/v1/analytics/templates` | Per-template performance table |
| `GET` | `/v1/analytics/templates/:tid` | Deep stats for one template |
| `GET` | `/v1/analytics/providers` | Provider comparison (delivery, bounce, latency) |
| `POST` | `/v1/analytics/export` | Request async CSV/JSON export |
| `GET` | `/v1/analytics/exports/:id` | Poll export status, get download URL |
| `GET` | `/v1/analytics/ab-tests` | List A/B tests |
| `GET` | `/v1/analytics/ab-tests/:id` | Per-variant metrics |

---

### 6.8 Suppression Management
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/suppressions` | Add one or more emails to suppression |
| `GET` | `/v1/suppressions` | List suppressed emails |
| `GET` | `/v1/suppressions/check?email=` | Check if email is suppressed |
| `DELETE` | `/v1/suppressions/:email_hash` | Remove from suppression |
| `DELETE` | `/v1/suppressions/:email_hash/erase` | GDPR erasure (anonymise all records) |
| `POST` | `/v1/suppressions/import` | Bulk import (CSV or JSON) |
| `GET` | `/v1/suppressions/export` | Export as CSV |

---

### 6.9 Webhooks (Outbound)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/webhooks` | Register a webhook endpoint |
| `GET` | `/v1/webhooks` | List webhooks |
| `GET` | `/v1/webhooks/:id` | Get webhook |
| `PUT` | `/v1/webhooks/:id` | Update URL, secret, events |
| `DELETE` | `/v1/webhooks/:id` | Delete webhook |
| `POST` | `/v1/webhooks/:id/test` | Fire a synthetic test event |
| `GET` | `/v1/webhooks/:id/deliveries` | Delivery log |
| `POST` | `/v1/webhooks/:id/deliveries/:did/retry` | Manually retry a failed delivery |

---

### 6.10 Inbound Webhooks (Provider Events → MailForge)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/inbound/ses` | AWS SES SNS notification |
| `POST` | `/inbound/sendgrid` | SendGrid Event Webhook |
| `POST` | `/inbound/mailgun` | Mailgun Webhook |
| `POST` | `/inbound/postmark` | Postmark bounce/delivery webhook |
| `POST` | `/inbound/resend` | Resend event webhook |
| `POST` | `/inbound/smtp2go` | SMTP2GO webhook |
| `POST` | `/inbound/smtp` | Parsed inbound SMTP (DSN bounce) |

---

### 6.11 Tracking Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/t/o/:tid` | Open pixel (1×1 GIF) |
| `GET` | `/t/c/:tid` | Click redirect (302 → original URL) |
| `POST` | `/t/u/:tid` | One-click unsubscribe (RFC 8058) |
| `GET` | `/pref/:token` | Preference center page |
| `POST` | `/pref/:token` | Save preference center changes |

---

### 6.12 DKIM / Deliverability
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/dkim` | Generate DKIM key pair for a domain |
| `GET` | `/v1/dkim` | List DKIM configs |
| `GET` | `/v1/dkim/:id` | Get DNS record to publish |
| `POST` | `/v1/dkim/:id/verify` | Check DNS propagation |
| `DELETE` | `/v1/dkim/:id` | Revoke DKIM config |
| `GET` | `/v1/dkim/check?domain=` | Check SPF + DKIM + DMARC for a domain |

---

## 7. Go Backend Architecture

### 7.1 Repository Structure
```
mailforge/
├── cmd/
│   ├── api/               # HTTP API + Tracking server
│   │   └── main.go
│   ├── worker/            # NATS JetStream consumers
│   │   └── main.go
│   ├── scheduler/         # Cron/schedule engine
│   │   └── main.go
│   └── migrate/           # DB migrations runner
│       └── main.go
│
├── internal/
│   ├── api/
│   │   ├── handler/
│   │   │   ├── project.go
│   │   │   ├── template.go
│   │   │   ├── apikey.go
│   │   │   ├── smtp.go
│   │   │   ├── send.go
│   │   │   ├── schedule.go
│   │   │   ├── analytics.go
│   │   │   ├── suppression.go
│   │   │   ├── webhook.go
│   │   │   ├── dkim.go
│   │   │   ├── tracking.go
│   │   │   ├── inbound.go
│   │   │   └── admin.go
│   │   ├── middleware/
│   │   │   ├── auth.go          # API key validation + Zitadel JWT
│   │   │   ├── ratelimit.go     # token bucket per project/key
│   │   │   ├── requestid.go
│   │   │   ├── idempotency.go   # Redis-backed idempotency
│   │   │   ├── logger.go
│   │   │   └── recovery.go
│   │   └── router/
│   │       └── router.go        # Chi router wiring
│   │
│   ├── domain/
│   │   ├── project/
│   │   │   ├── entity.go        # Project struct, validation
│   │   │   ├── repository.go    # Repository interface
│   │   │   └── service.go       # Business logic
│   │   ├── template/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go
│   │   ├── apikey/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go
│   │   ├── email/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go       # Fan-out, suppression check, queue publish
│   │   ├── analytics/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go
│   │   ├── suppression/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go
│   │   ├── webhook/
│   │   │   ├── entity.go
│   │   │   ├── repository.go
│   │   │   └── service.go
│   │   └── scheduler/
│   │       ├── entity.go
│   │       ├── repository.go
│   │       └── service.go
│   │
│   ├── infra/
│   │   ├── postgres/            # pgx pool, query helpers
│   │   ├── redis/               # connection + helpers
│   │   ├── nats/                # JetStream publish/consume helpers
│   │   ├── minio/               # object storage client
│   │   └── crypto/
│   │       ├── aes.go           # AES-256-GCM encrypt/decrypt
│   │       ├── hash.go          # sha256 for key storage
│   │       ├── dkim.go          # RSA key generation + signing
│   │       └── token.go         # secure random token generation
│   │
│   ├── provider/
│   │   ├── interface.go         # Provider interface
│   │   ├── message.go           # Unified SendMessage struct
│   │   ├── result.go            # SendResult struct
│   │   ├── ses/
│   │   │   └── provider.go
│   │   ├── sendgrid/
│   │   │   └── provider.go
│   │   ├── mailgun/
│   │   │   └── provider.go
│   │   ├── postmark/
│   │   │   └── provider.go
│   │   ├── resend/
│   │   │   └── provider.go
│   │   ├── smtp2go/
│   │   │   └── provider.go
│   │   └── smtp/
│   │       └── provider.go
│   │
│   ├── renderer/
│   │   ├── gotemplate.go        # Go html/template renderer
│   │   ├── textstripper.go      # HTML → plain text fallback
│   │   └── preview.go           # Preview rendering with mock data
│   │
│   ├── tracker/
│   │   ├── injector.go          # Inject tracking pixel + rewrite links
│   │   ├── store.go             # HMAC-signed tracking token creation/validation
│   │   └── handler.go           # Open pixel + click redirect HTTP handlers
│   │
│   ├── worker/
│   │   ├── send_worker.go       # Consume email.send.* → call provider
│   │   ├── event_worker.go      # Consume email.events → write DB + trigger auto-suppress
│   │   └── webhook_worker.go    # Consume webhook.outbound → HTTP POST with retry
│   │
│   ├── scheduler/
│   │   ├── engine.go            # Poll DB for due schedules
│   │   ├── drip.go              # Drip sequence state machine
│   │   └── cron.go              # Cron expression parser (robfig/cron)
│   │
│   └── config/
│       └── config.go            # Viper-based config from env
│
├── pkg/                         # Shared, no domain dependencies
│   ├── pagination/
│   ├── validator/               # JSON Schema validation (santhosh-tekuri/jsonschema)
│   ├── httputil/
│   └── telemetry/               # OTEL setup
│
├── migrations/                  # golang-migrate SQL files
│   ├── 000001_create_projects.up.sql
│   └── ...
│
├── docker-compose.yml
├── docker-compose.prod.yml
├── Makefile
└── go.mod
```

### 7.2 Provider Interface
```go
// internal/provider/interface.go

type Provider interface {
    Name() string
    Send(ctx context.Context, msg *Message) (*Result, error)
    VerifyWebhookSignature(r *http.Request) ([]Event, error)
}

type Message struct {
    From        Address
    To          []Address
    CC          []Address
    BCC         []Address
    ReplyTo     *Address
    Subject     string
    HTMLBody    string
    TextBody    string
    Headers     map[string]string
    Attachments []Attachment
    DKIMSigner  *dkim.Signer   // nil = provider-managed DKIM
}

type Result struct {
    MessageID    string
    ProviderName string
    StatusCode   int
    RawResponse  []byte
}

type Event struct {
    Type       EventType
    MessageID  string
    Recipient  string
    OccurredAt time.Time
    Metadata   map[string]any
}
```

### 7.3 NATS Message Schemas
```go
// SendJob published to email.send.<priority>
type SendJob struct {
    EmailID    uuid.UUID        `json:"email_id"`
    ProjectID  uuid.UUID        `json:"project_id"`
    ProviderID uuid.UUID        `json:"provider_id"`
    Message    provider.Message `json:"message"`
    Attempt    int              `json:"attempt"`
}

// EmailEventMsg published to email.events
type EmailEventMsg struct {
    EmailID    uuid.UUID      `json:"email_id"`
    ProjectID  uuid.UUID      `json:"project_id"`
    EventType  string         `json:"event_type"`
    Recipient  string         `json:"recipient"`
    OccurredAt time.Time      `json:"occurred_at"`
    Metadata   map[string]any `json:"metadata,omitempty"`
}

// WebhookJob published to webhook.outbound
type WebhookJob struct {
    WebhookID  uuid.UUID       `json:"webhook_id"`
    DeliveryID uuid.UUID       `json:"delivery_id"`
    URL        string          `json:"url"`
    Secret     string          `json:"secret"`
    Payload    json.RawMessage `json:"payload"`
    Attempt    int             `json:"attempt"`
    // Plain milliseconds: a time.Duration field would marshal as nanoseconds
    // and contradict the "_ms" JSON name.
    NextRetryDelayMS int64 `json:"next_retry_delay_ms"`
}
```

### 7.4 Circuit Breaker
Each provider per project has a circuit breaker stored in Redis:
- **Closed** → Normal operation.
- **Open** → Provider failed ≥ N times in a 60s window. Traffic is redirected to the fallback provider. After 30s the breaker moves to Half-Open.
- **Half-Open** → One test request is sent. On success → Closed. On failure → Open again.
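
The three states can be sketched as a small state machine. This is an in-memory illustration only: the design stores breaker state in Redis, and the type and method names (`Breaker`, `Allow`, `Record`) as well as the threshold value are assumptions made for the example. Times are passed in as Unix milliseconds so the caller controls the clock.

```go
package main

import "fmt"

// State is one of the three breaker states from section 7.4.
type State string

const (
    Closed   State = "closed"
    Open     State = "open"
    HalfOpen State = "half_open"
)

// Breaker is an in-memory stand-in for the Redis-backed breaker (hypothetical type).
type Breaker struct {
    Threshold  int   // N failures that trip the breaker (value is an assumption)
    WindowMs   int64 // failure-counting window: 60s in the text
    CooldownMs int64 // time spent open before a probe: 30s in the text

    state    State
    failures []int64 // timestamps of failures inside the window
    openedAt int64
}

func NewBreaker(threshold int) *Breaker {
    return &Breaker{Threshold: threshold, WindowMs: 60_000, CooldownMs: 30_000, state: Closed}
}

func (b *Breaker) State() State { return b.state }

// Allow reports whether a request may be sent at time now. An open breaker
// moves to half-open once the cooldown has elapsed and lets one probe through.
func (b *Breaker) Allow(now int64) bool {
    switch b.state {
    case Closed:
        return true
    case Open:
        if now-b.openedAt >= b.CooldownMs {
            b.state = HalfOpen
            return true // the single test request
        }
        return false
    default: // HalfOpen: the probe is already in flight
        return false
    }
}

// Record feeds the outcome of a request back into the breaker.
func (b *Breaker) Record(now int64, ok bool) {
    if b.state == HalfOpen {
        if ok {
            b.state, b.failures = Closed, nil
        } else {
            b.state, b.openedAt = Open, now
        }
        return
    }
    if ok || b.state == Open {
        return
    }
    // Drop failures that fell out of the window, then count this one.
    kept := b.failures[:0]
    for _, t := range b.failures {
        if now-t < b.WindowMs {
            kept = append(kept, t)
        }
    }
    b.failures = append(kept, now)
    if len(b.failures) >= b.Threshold {
        b.state, b.openedAt = Open, now
    }
}

func main() {
    b := NewBreaker(3)
    for i := int64(0); i < 3; i++ {
        b.Record(i*1000, false) // three failures within 60s
    }
    fmt.Println(b.State(), b.Allow(10_000), b.Allow(32_000), b.State())
}
```

A Redis version would need the same transitions to be atomic across workers, for example inside a Lua script; that part is not shown here.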

---

## 8. Provider Abstraction Layer

### Routing Decision (per send)
```
1. Load project's smtp_configs ordered by priority
2. Filter by routing_rules (category, domain)
3. Skip any with circuit_state = 'open'
4. Use first match as primary
5. If primary send fails (transient): retry on fallback
6. If no fallback: publish to dead-letter stream, update email status to 'failed'
```
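
The routing steps above could look roughly like the following. It is a sketch under several assumptions: `Config`, `Candidates`, `Route`, and the error values are hypothetical names, a lower `priority` number is tried first, and a config with no category rule matches every category. The real provider call is replaced by a `send` callback.

```go
package main

import (
    "errors"
    "fmt"
    "sort"
)

// Config is a trimmed-down view of one smtp_configs row (hypothetical shape).
type Config struct {
    Name       string
    Priority   int      // lower number is tried first (assumption)
    Categories []string // routing_rules.category; empty matches everything (assumption)
    Circuit    string   // "closed", "open" or "half_open"
}

var (
    ErrTransient  = errors.New("transient provider failure")
    ErrNoProvider = errors.New("no provider left: dead-letter and mark email failed")
)

// Candidates applies steps 1-3: order by priority, filter by routing rule,
// skip open circuits. Index 0 is the primary, the rest are fallbacks.
func Candidates(configs []Config, category string) []Config {
    sorted := append([]Config(nil), configs...)
    sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
    var out []Config
    for _, c := range sorted {
        if c.Circuit == "open" || !matches(c, category) {
            continue
        }
        out = append(out, c)
    }
    return out
}

func matches(c Config, category string) bool {
    if len(c.Categories) == 0 {
        return true
    }
    for _, x := range c.Categories {
        if x == category {
            return true
        }
    }
    return false
}

// Route applies steps 4-6: try the primary, fall back on transient errors,
// and report ErrNoProvider when every candidate has been exhausted.
func Route(configs []Config, category string, send func(Config) error) (string, error) {
    for _, c := range Candidates(configs, category) {
        err := send(c)
        if err == nil {
            return c.Name, nil
        }
        if !errors.Is(err, ErrTransient) {
            return "", err // permanent errors are not retried elsewhere
        }
    }
    return "", ErrNoProvider
}

func main() {
    configs := []Config{
        {Name: "ses-us-east-1", Priority: 1, Categories: []string{"marketing", "bulk"}, Circuit: "closed"},
        {Name: "internal-smtp", Priority: 10, Circuit: "closed"},
    }
    name, err := Route(configs, "marketing", func(c Config) error {
        if c.Name == "ses-us-east-1" {
            return ErrTransient // simulate an outage on the primary
        }
        return nil
    })
    fmt.Println(name, err)
}
```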

### Inbound Event Normalisation
Each provider has a different webhook payload format. The inbound handler normalises them:

| Provider | Bounce event | Delivery event | Open event |
|---|---|---|---|
| SES | `Bounce.bounceType` | `Delivery` | `Open` (requires configuration-set event publishing) |
| SendGrid | `event: bounce` | `event: delivered` | `event: open` |
| Mailgun | `event: failed` | `event: delivered` | `event: opened` |
| Postmark | `RecordType: Bounce` with `Type: HardBounce` | `RecordType: Delivery` | `RecordType: Open` (requires open tracking) |
| Resend | `type: email.bounced` | `type: email.delivered` | `type: email.opened` |

All normalised to internal `EventType`:
`queued | dispatched | delivered | soft_bounce | hard_bounce | complaint | opened | clicked | unsubscribed | failed`

---

## 9. NATS JetStream Design

### Streams

| Stream | Subjects | Retention | Replicas | Purpose |
|---|---|---|---|---|
| `EMAIL_SEND_CRITICAL` | `email.send.critical` | Interest, max-age 1h | 3 | OTPs, auth emails |
| `EMAIL_SEND_HIGH` | `email.send.high` | Interest, max-age 4h | 3 | Transactional |
| `EMAIL_SEND_NORMAL` | `email.send.normal` | Interest, max-age 24h | 3 | Notifications |
| `EMAIL_SEND_LOW` | `email.send.low` | Interest, max-age 72h | 3 | Marketing/bulk |
| `EMAIL_EVENTS` | `email.events.>` | Limits, max-age 7d | 3 | Delivery events |
| `WEBHOOK_OUT` | `webhook.outbound.>` | Limits, max-age 7d | 3 | Outbound delivery |
| `SCHEDULER` | `scheduler.tick` | WorkQueue | 3 | Scheduler ticks |

### Consumer Groups
Each stream has a `push` consumer with:
- `AckPolicy: Explicit`, so the worker must `Ack()` each message
- `MaxDeliver: 5`, after which a message that is still unacknowledged is routed to the dead-letter subject
- `AckWait: 30s` for normal streams, `5s` for critical

### Dead Letter Handling
Unprocessable messages flow to `email.dlq.*`. A DLQ consumer logs them, updates the email status to `failed`, and fires a `failed` event.
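
The per-message decision a send worker makes can be sketched as below. It assumes the worker itself republishes exhausted messages to the dead-letter subject, and that the DLQ subject reuses the priority suffix of the original subject (`email.send.high` → `email.dlq.high`); `Decide`, `DLQSubject`, and `ErrSend` are hypothetical names. No NATS client is involved, only the decision logic.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// Action is what the worker does with a message after a send attempt.
type Action int

const (
    Ack        Action = iota // success: acknowledge and forget
    Retry                    // failure with attempts left: NAK for redelivery
    DeadLetter               // failure on the last attempt: republish to the DLQ, then ack
)

// ErrSend stands in for any provider error in this example.
var ErrSend = errors.New("provider send failed")

// Decide maps the delivery count reported by the broker (1 on the first
// delivery) and the send result to an action.
func Decide(deliveryCount, maxDeliver int, sendErr error) Action {
    switch {
    case sendErr == nil:
        return Ack
    case deliveryCount >= maxDeliver:
        return DeadLetter
    default:
        return Retry
    }
}

// DLQSubject derives the dead-letter subject from the original subject by
// keeping its last token (assumed naming scheme).
func DLQSubject(subject string) string {
    last := subject[strings.LastIndex(subject, ".")+1:]
    return "email.dlq." + last
}

func main() {
    fmt.Println(Decide(1, 5, nil), Decide(3, 5, ErrSend), Decide(5, 5, ErrSend))
    fmt.Println(DLQSubject("email.send.critical"))
}
```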

---

## 10. Template Engine

### Syntax
Go's `html/template` with auto-HTML escaping. Variables use `{{.VariableName}}` syntax.

**Supported Features:**
- Variable substitution: `{{.FirstName}}`
- Conditionals: `{{if .HasDiscount}}...{{end}}`
- Loops: `{{range .Items}}...{{end}}`
- Date formatting: `{{formatDate .CreatedAt "2006-01-02"}}`
- URL safe: `{{urlEncode .RedirectURL}}`
- Currency: `{{currency .Amount "USD"}}`

**Custom Functions registered:**
```go
var funcMap = template.FuncMap{
    "formatDate":  formatDate,
    "urlEncode":   url.QueryEscape,
    "currency":    formatCurrency,
    "truncate":    truncateString,
    "toUpper":     strings.ToUpper,
    "toLower":     strings.ToLower,
    // safeHTML bypasses auto-escaping: pass it trusted, pre-sanitised markup only.
    "safeHTML":    func(s string) template.HTML { return template.HTML(s) },
}
```
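
To illustrate how auto-escaping and the function map interact, here is a small render helper. `Render` is a hypothetical name, only two of the functions above are registered, and the `missingkey=error` option is an assumption about how missing variables should be handled (it makes a missing map key fail the render instead of printing an empty value).

```go
package main

import (
    "bytes"
    "fmt"
    "html/template"
    "net/url"
    "strings"
)

// A reduced function map; the helpers not defined in this document are left out.
var funcMap = template.FuncMap{
    "urlEncode": url.QueryEscape,
    "toUpper":   strings.ToUpper,
}

// Render parses and executes one template body against the given variables.
func Render(body string, vars map[string]any) (string, error) {
    t, err := template.New("email").
        Funcs(funcMap).
        Option("missingkey=error"). // fail instead of rendering a missing variable
        Parse(body)
    if err != nil {
        return "", err
    }
    var buf bytes.Buffer
    if err := t.Execute(&buf, vars); err != nil {
        return "", err
    }
    return buf.String(), nil
}

func main() {
    out, err := Render(
        "<p>Welcome to {{toUpper .CompanyName}}, {{.FirstName}}!</p>",
        map[string]any{"CompanyName": "Acme", "FirstName": "<script>alert(1)</script>"},
    )
    fmt.Println(out, err) // the script tag is escaped, not executed
}
```

In production the parsed template would normally be cached per published version rather than re-parsed on every send.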

### Tracking Injection (Post-Render)
After rendering, the `tracker.Injector` performs:
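1. Parse rendered HTML with `golang.org/x/net/html`
2. Insert `<img src="https://mail.yourdomain.com/t/o/:tid" width="1" height="1">` immediately before `</body>`
3. Rewrite each `<a href="...">` to `<a href="https://mail.yourdomain.com/t/c/:click_tid">`. Tracking URLs must be absolute, because a mail client has no base URL against which to resolve a relative path
4. Add `List-Unsubscribe` and `List-Unsubscribe-Post` headers to the message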
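
The `:tid` values in the tracking URLs are described elsewhere in this document as HMAC-signed tokens. One possible shape, offered as a sketch only: the token is `<payload>.<mac>`, both parts base64url-encoded, with HMAC-SHA256 as the MAC. The token layout, the payload format, and the `Sign`/`Verify` names are all assumptions.

```go
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/base64"
    "errors"
    "fmt"
    "strings"
)

var ErrBadToken = errors.New("invalid tracking token")

var enc = base64.RawURLEncoding // URL-safe, no padding

// Sign returns "<payload>.<mac>" for use as a tracking ID.
func Sign(secret []byte, payload string) string {
    mac := hmac.New(sha256.New, secret)
    mac.Write([]byte(payload))
    return enc.EncodeToString([]byte(payload)) + "." + enc.EncodeToString(mac.Sum(nil))
}

// Verify checks the MAC in constant time and returns the original payload.
func Verify(secret []byte, token string) (string, error) {
    p, m, ok := strings.Cut(token, ".")
    if !ok {
        return "", ErrBadToken
    }
    payload, err1 := enc.DecodeString(p)
    sig, err2 := enc.DecodeString(m)
    if err1 != nil || err2 != nil {
        return "", ErrBadToken
    }
    mac := hmac.New(sha256.New, secret)
    mac.Write(payload)
    if !hmac.Equal(sig, mac.Sum(nil)) {
        return "", ErrBadToken
    }
    return string(payload), nil
}

func main() {
    secret := []byte("per-project tracking secret") // placeholder value
    tok := Sign(secret, "email=7f3c|kind=open")     // hypothetical payload format
    payload, err := Verify(secret, tok)
    fmt.Println(tok, payload, err)
}
```

Signing the token means the open and click handlers can reject forged IDs without a database lookup.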

---

## 11. Analytics Pipeline

### Event Flow
```
Provider Webhook → Inbound Handler → Normalise → Publish to email.events
                                                         │
                                              Event Worker consumes
                                                         │
                                         ┌───────────────┴───────────────┐
                                         ▼                               ▼
                                  Write to                     Check suppression rules
                                  email_events table           (hard bounce → auto-suppress)
                                         │
                                         ▼
                              Increment aggregate counters
                              in Redis (TTL 25h, flushed hourly to PG)
                                         │
                                         ▼
                              Publish to webhook.outbound
                              (if project has matching webhook)
```

### Redis Aggregation Counters
```
key: stats:{project_id}:{date}:{hour}:{metric}
type: Integer
TTL: 25 hours

Metrics: sent, delivered, soft_bounce, hard_bounce, complaint, opened, clicked, unsubscribed, failed
```
A background goroutine flushes Redis counters to PostgreSQL aggregate tables every hour.
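
As an illustration of the key layout, the helpers below build and parse a counter key. They assume hours are bucketed in UTC, which the layout above does not specify; `CounterKey`, `ParseCounterKey`, and the `Counter` struct are hypothetical names. The parse side is what an hourly flusher could use to turn scanned keys into aggregate rows.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
    "time"
)

// CounterKey builds stats:{project_id}:{date}:{hour}:{metric} for the hour
// containing unixSec (UTC bucketing is an assumption).
func CounterKey(projectID, metric string, unixSec int64) string {
    t := time.Unix(unixSec, 0).UTC()
    return fmt.Sprintf("stats:%s:%s:%02d:%s", projectID, t.Format("2006-01-02"), t.Hour(), metric)
}

// Counter is the parsed form of a key, ready to become a row in an
// aggregate table.
type Counter struct {
    ProjectID string
    Date      string
    Hour      int
    Metric    string
}

// ParseCounterKey is the inverse of CounterKey. It relies on project IDs
// and metric names containing no ':' characters.
func ParseCounterKey(key string) (Counter, error) {
    parts := strings.Split(key, ":")
    if len(parts) != 5 || parts[0] != "stats" {
        return Counter{}, fmt.Errorf("unexpected counter key %q", key)
    }
    hour, err := strconv.Atoi(parts[3])
    if err != nil || hour < 0 || hour > 23 {
        return Counter{}, fmt.Errorf("bad hour in counter key %q", key)
    }
    return Counter{ProjectID: parts[1], Date: parts[2], Hour: hour, Metric: parts[4]}, nil
}

func main() {
    key := CounterKey("proj-42", "delivered", time.Now().Unix())
    c, err := ParseCounterKey(key)
    fmt.Println(key, c, err)
}
```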

### A/B Test Evaluation
A background job runs every 5 minutes:
1. Find `ab_tests` where `status = 'running'` and `NOW() > created_at + evaluate_after`
2. Calculate metric (open_rate or click_rate) per variant
3. Determine winner (Fisher's exact test for statistical significance at p < 0.05)
4. Mark winner, update test status to `completed`
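
Step 3 can be sketched with a two-sided Fisher's exact test on a 2×2 table (variant A: opened / not opened, variant B: opened / not opened). The function name and the choice of the "sum all tables no more likely than the observed one" definition of two-sided are assumptions; log-gamma is used so that large counts do not overflow.

```go
package main

import (
    "fmt"
    "math"
)

// logChoose returns ln(C(n, k)) via the log-gamma function.
func logChoose(n, k int) float64 {
    a, _ := math.Lgamma(float64(n + 1))
    b, _ := math.Lgamma(float64(k + 1))
    c, _ := math.Lgamma(float64(n - k + 1))
    return a - b - c
}

// FisherExact returns the two-sided p-value for the table
//
//	variant A: a successes, b failures
//	variant B: c successes, d failures
//
// by summing the hypergeometric probability of every table with the same
// margins that is at most as likely as the observed one.
func FisherExact(a, b, c, d int) float64 {
    rowA, rowB, succ, n := a+b, c+d, a+c, a+b+c+d
    logP := func(x int) float64 {
        return logChoose(rowA, x) + logChoose(rowB, succ-x) - logChoose(n, succ)
    }
    lo, hi := succ-rowB, succ // feasible range for variant A's successes
    if lo < 0 {
        lo = 0
    }
    if hi > rowA {
        hi = rowA
    }
    observed := logP(a)
    p := 0.0
    for x := lo; x <= hi; x++ {
        if lp := logP(x); lp <= observed+1e-7 { // tolerance for float ties
            p += math.Exp(lp)
        }
    }
    return math.Min(p, 1)
}

func main() {
    // 120 of 1000 opened variant A, 90 of 1000 opened variant B (made-up numbers).
    p := FisherExact(120, 880, 90, 910)
    fmt.Printf("p = %.4f, significant at 0.05: %v\n", p, p < 0.05)
}
```

The loop is linear in the smaller margin, so for very large tests a chi-squared approximation may be a cheaper substitute.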

---

## 12. Compliance & Deliverability

### GDPR
| Requirement | Implementation |
|---|---|
| Right to erasure | `DELETE /v1/suppressions/:email_hash/erase` anonymises all `emails` and `email_events` records for that address; the suppression row retains only the hashed email |
| Data minimisation | IP addresses stored as `/24` subnet only for geo-analytics |
| Consent | Platform records timestamp + source of unsubscribe; preference center provides per-category opt-out |
| Data export | `POST /v1/analytics/export` starts an async export; `GET /v1/analytics/exports/:id` returns the download URL for the archive |
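
The data-minimisation row can be illustrated with the standard `net/netip` package. Truncating IPv4 to `/24` follows the table; the `/48` mask for IPv6 is an assumption, since the table only mentions `/24`, and `Anonymise` is a hypothetical helper name.

```go
package main

import (
    "fmt"
    "net/netip"
)

// Anonymise reduces a client IP to its network prefix before storage:
// /24 for IPv4 (per the table), /48 for IPv6 (assumed).
func Anonymise(ip string) (string, error) {
    addr, err := netip.ParseAddr(ip)
    if err != nil {
        return "", err
    }
    addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
    bits := 24
    if addr.Is6() {
        bits = 48
    }
    prefix, err := addr.Prefix(bits) // zeroes every bit past the prefix length
    if err != nil {
        return "", err
    }
    return prefix.String(), nil
}

func main() {
    for _, ip := range []string{"203.0.113.77", "2001:db8:abcd:12::1"} {
        out, err := Anonymise(ip)
        fmt.Println(ip, "->", out, err)
    }
}
```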

### CAN-SPAM / CASL
- Physical mailing address in footer (configurable per project)
- `List-Unsubscribe` header on all outgoing email
- One-click unsubscribe endpoint (`/t/u/:tid`) processes `POST`
- Marketing category emails require explicit suppression check
- Suppression honoured within 10 business days (immediate in practice)

### DKIM / SPF / DMARC
- Per-domain RSA-2048 DKIM key pair generated and stored encrypted
- `DKIM-Signature` header added to every outgoing email
- DNS verification check surfaces SPF, DKIM, and DMARC record status
- Guidance text generated for remediation

### Warm-up Mode
When a new sending domain is configured:
1. Day 1–2: Max 200 emails/day
2. Day 3–5: Max 1,000/day
3. Day 6–10: Max 5,000/day
4. Day 11–15: Max 20,000/day
5. Day 16+: Full quota

Configurable per project. Warm-up limits enforced in the send service before queuing.
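
The schedule above translates into a simple lookup. In this sketch, day 1 is the day the domain was configured and `fullQuota` stands for the project's normal daily limit; both the function names and the rule that the cap never exceeds the full quota are assumptions.

```go
package main

import "fmt"

// WarmupCap returns the daily send cap for a domain on the given day
// (1 = the day the domain was configured).
func WarmupCap(day, fullQuota int) int {
    limit := fullQuota // day 16 and later: full quota
    switch {
    case day <= 2:
        limit = 200
    case day <= 5:
        limit = 1000
    case day <= 10:
        limit = 5000
    case day <= 15:
        limit = 20000
    }
    if limit > fullQuota {
        limit = fullQuota // a small project never gets more than its own quota
    }
    return limit
}

// AllowSend reports whether one more email fits under today's cap. The send
// service would call this before publishing to the queue.
func AllowSend(day, sentToday, fullQuota int) bool {
    return sentToday < WarmupCap(day, fullQuota)
}

func main() {
    for _, day := range []int{1, 4, 8, 12, 20} {
        fmt.Printf("day %2d: cap %d\n", day, WarmupCap(day, 500000))
    }
}
```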

---

## 13. Security Architecture

### API Key Security
- Key stored as `sha256(key)` only; prefix stored for display
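- Key format: `mf_proj_<32-char-random>`. Note that 32 random characters give 128 bits of entropy if hex-encoded, or roughly 190 bits with a base62 alphabet; reaching 256 bits would need 64 hex characters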
- Rate limiting: token bucket per key (Redis), default 100 req/10s
- Scope enforcement: middleware checks required scope before handler
- Auto-revoke: cron job marks expired keys `revoked_at`
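
A minimal sketch of key issuance and lookup as described in the first two bullets. It assumes the random part is 16 bytes from `crypto/rand`, hex-encoded to 32 characters, and that the display prefix is `mf_proj_` plus the first four characters, as in the example response in section 6.3. The function names are hypothetical.

```go
package main

import (
    "crypto/rand"
    "crypto/sha256"
    "crypto/subtle"
    "encoding/hex"
    "fmt"
)

const keyPrefix = "mf_proj_"

// HashKey returns the hex SHA-256 digest that is stored instead of the key.
func HashKey(key string) string {
    sum := sha256.Sum256([]byte(key))
    return hex.EncodeToString(sum[:])
}

// NewKey returns the full key (shown to the caller once), the display
// prefix, and the hash to persist.
func NewKey() (key, prefix, hash string, err error) {
    buf := make([]byte, 16) // 16 random bytes -> 32 hex characters (assumed encoding)
    if _, err = rand.Read(buf); err != nil {
        return "", "", "", err
    }
    key = keyPrefix + hex.EncodeToString(buf)
    return key, key[:len(keyPrefix)+4], HashKey(key), nil
}

// Matches compares a presented key against a stored hash in constant time.
func Matches(presented, storedHash string) bool {
    return subtle.ConstantTimeCompare([]byte(HashKey(presented)), []byte(storedHash)) == 1
}

func main() {
    key, prefix, hash, err := NewKey()
    if err != nil {
        panic(err)
    }
    fmt.Println("show once:", key)
    fmt.Println("store:", prefix, hash)
    fmt.Println("valid:", Matches(key, hash))
}
```

A plain SHA-256 is reasonable here only because the key is high-entropy random data; it would not be suitable for user-chosen passwords.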

### Credential Encryption
- SMTP credentials and provider API keys encrypted with AES-256-GCM
- Per-project Data Encryption Key (DEK) wrapped by a master Key Encryption Key (KEK)
- **Platform-managed KEK:** Default mode. Master KEK stored in environment variable / secrets manager (not in DB). Used to wrap each project's DEK.
- **BYOK (Bring Your Own Key):** Project supplies its own KEK via `POST /v1/projects/:pid/byok` with the raw key material. Platform wraps the project DEK with the project-supplied KEK, then discards the raw key — it is never stored. On any operation that needs the DEK (e.g., adding a new SMTP config), the project must re-supply the KEK in the `X-Project-KEK` request header (base64-encoded). This header is scrubbed from all logs.
- Projects can migrate between BYOK and platform-managed KEK at any time via a re-wrap operation (old KEK in → new KEK wraps the DEK → old KEK discarded).
- **Inline `smtp_override`:** One-off SMTP credentials passed in send requests are used in-memory only (construct the provider client, send, discard). They pass through the same TLS validation as saved configs but are never written to any storage or log.
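
The DEK/KEK scheme above is standard envelope encryption, and its core fits in a few functions. This sketch assumes 32-byte keys and a `nonce || ciphertext` layout for sealed values; `Seal`, `Open`, `Rewrap`, and `NewKey` are hypothetical names, not the platform's API.

```go
package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "errors"
    "fmt"
)

// NewKey returns 32 random bytes, usable as a DEK or a KEK.
func NewKey() ([]byte, error) {
    k := make([]byte, 32)
    _, err := rand.Read(k)
    return k, err
}

func newGCM(key []byte) (cipher.AEAD, error) {
    block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
    if err != nil {
        return nil, err
    }
    return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext.
func Seal(key, plaintext []byte) ([]byte, error) {
    gcm, err := newGCM(key)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open reverses Seal and fails if the key is wrong or the data was modified.
func Open(key, sealed []byte) ([]byte, error) {
    gcm, err := newGCM(key)
    if err != nil {
        return nil, err
    }
    n := gcm.NonceSize()
    if len(sealed) < n {
        return nil, errors.New("ciphertext too short")
    }
    return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// Rewrap moves a wrapped DEK from one KEK to another without touching the
// credentials encrypted under the DEK.
func Rewrap(oldKEK, newKEK, wrappedDEK []byte) ([]byte, error) {
    dek, err := Open(oldKEK, wrappedDEK)
    if err != nil {
        return nil, err
    }
    return Seal(newKEK, dek)
}

func main() {
    kek, _ := NewKey() // platform-managed or project-supplied
    dek, _ := NewKey() // per-project
    wrapped, _ := Seal(kek, dek)
    secret, _ := Seal(dek, []byte("smtp-password"))

    unwrapped, _ := Open(kek, wrapped)
    plain, err := Open(unwrapped, secret)
    fmt.Println(string(plain), err)
}
```

Because only the wrapped DEK depends on the KEK, a migration between BYOK and platform-managed mode is a single `Rewrap` call per project rather than a re-encryption of every stored credential.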

### Request Security
- All endpoints over TLS only
- `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options` headers on UI
- SQL injection prevention: parameterised queries only (pgx), zero string interpolation
- SSRF prevention: webhook URLs validated against allowlist; RFC 1918 and localhost blocked
- Inbound webhook signature verification before any processing
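
The SSRF rule can be sketched as a registration-time check on webhook URLs. It covers literal IP hosts and `localhost` only; hostnames also need every resolved address checked at dial time to guard against DNS rebinding, which is not shown. Requiring `https` and blocking link-local and multicast ranges go beyond the bullet above and are assumptions, as is the `CheckWebhookURL` name.

```go
package main

import (
    "errors"
    "fmt"
    "net/netip"
    "net/url"
    "strings"
)

var ErrBlockedTarget = errors.New("webhook URL targets a blocked address")

// blockedAddr reports whether an address is loopback, private (RFC 1918 or
// IPv6 unique-local), link-local, multicast, or unspecified.
func blockedAddr(a netip.Addr) bool {
    a = a.Unmap()
    return a.IsLoopback() || a.IsPrivate() || a.IsLinkLocalUnicast() ||
        a.IsMulticast() || a.IsUnspecified()
}

// CheckWebhookURL validates a URL at registration time.
func CheckWebhookURL(raw string) error {
    u, err := url.Parse(raw)
    if err != nil {
        return err
    }
    if u.Scheme != "https" {
        return errors.New("webhook URL must use https") // assumed policy
    }
    host := u.Hostname() // strips the port and IPv6 brackets
    if host == "" || strings.EqualFold(host, "localhost") {
        return ErrBlockedTarget
    }
    if addr, err := netip.ParseAddr(host); err == nil && blockedAddr(addr) {
        return ErrBlockedTarget
    }
    return nil
}

func main() {
    for _, raw := range []string{
        "https://hooks.example.com/mailforge",
        "https://192.168.1.10/internal",
        "https://169.254.169.254/latest/meta-data",
    } {
        fmt.Println(raw, "->", CheckWebhookURL(raw))
    }
}
```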

### Audit Logging
Immutable append-only audit log (separate `audit_events` table, no UPDATE/DELETE allowed) for:
- Project setting changes
- API key creation/rotation/revocation
- SMTP config changes
- SuperAdmin actions

---

## 14. Admin UI Architecture

Only a **SuperAdmin UI** is built. Project admins consume the API directly (or build their own UI).

### Tech Stack
- **Framework:** Next.js 15 (App Router, TypeScript)
- **Auth:** Zitadel OIDC — NextAuth.js adapter
- **UI Components:** shadcn/ui + Tailwind CSS
- **Charts:** Recharts
- **State:** React Query (TanStack Query) for server state

### SuperAdmin UI Pages
```
/dashboard              — Platform overview: total projects, emails/min, queue depth
/projects               — Project list + search
/projects/[id]          — Project detail: stats, settings, suspend/activate
/projects/[id]/templates — Read-only template browser
/projects/[id]/api-keys  — Key management
/projects/[id]/smtp      — SMTP config management
/projects/[id]/analytics — Analytics dashboard
/projects/[id]/suppression — Suppression list management
/projects/[id]/webhooks  — Webhook management
/templates/defaults      — Platform default template management
/audit-log              — Platform-wide audit log
/infrastructure         — Worker health, NATS stream depths, provider circuit states
/settings               — Global rate limits, feature flags
```

---

## 15. Deployment Architecture

### Docker Compose (Development)
```yaml
services:
  api:           # cmd/api — :8080
  worker:        # cmd/worker — N replicas
  scheduler:     # cmd/scheduler
  nats:          # nats:2.10-alpine with JetStream enabled
  postgres:      # postgres:16-alpine
  redis:         # redis:7-alpine
  minio:         # minio/minio
  grafana:       # grafana/grafana
  prometheus:    # prom/prometheus
  jaeger:        # jaegertracing/all-in-one
  loki:          # grafana/loki
```

### Production (Coolify)
```
Primary Region:
  - API Service (3 replicas, autoscale on CPU)
  - Worker Service — Send Critical (4 replicas)
  - Worker Service — Send Normal/Low (8 replicas)
  - Scheduler Service (1 replica, leader election via PG advisory lock)
  - NATS JetStream cluster (3 nodes)
  - PostgreSQL 16 (primary + read replica)
  - Redis 7 Sentinel (3 nodes)
  - MinIO (distributed, 4 nodes)

DR Region (active-passive):
  - PostgreSQL streaming replication from primary
  - NATS JetStream mirror
  - MinIO bucket replication
  - Standby API/Worker services (scaled to 0, ready to scale up)
```

### Observability
- **Metrics:** Prometheus scrapes all services; Grafana dashboards for emails/sec, queue depth, provider error rates, worker lag
- **Traces:** OTEL spans for every send → queue → worker → provider call chain; Jaeger UI
- **Logs:** Structured JSON via zerolog → Loki → Grafana log explorer
- **Alerts:** Grafana alerts for: queue depth > 100k, provider error rate > 5%, hard bounce rate > 3%

---

## 16. Performance: 10k+ Emails/sec

### Bottleneck Analysis
| Stage | Throughput | Solution |
|---|---|---|
| API → Queue | ~50k req/sec per instance | NATS publish is async, sub-millisecond |
| Queue depth | Bounded by disk (persistent) | JetStream durable, disk-backed |
| Worker → Provider | Provider-limited (SES applies an account-level send quota rather than a per-connection one, typically 14 emails/sec for a new production account and raised on request) | Worker pool per provider with configurable concurrency |
| DB writes | ~10k inserts/sec on `email_events` | Table partitioning + batch insert (pgx `CopyFrom`) |
| Redis counters | ~100k ops/sec per node | In-memory `INCR`, O(1) per event |

### Concurrency Model per Worker
```go
// Each worker goroutine pool
const (
    SendWorkerCriticalConcurrency = 100   // goroutines per instance
    SendWorkerNormalConcurrency   = 500   
    EventWorkerConcurrency        = 2000  // lightweight, just DB writes
    WebhookWorkerConcurrency      = 200
)
```

Eight send-worker instances at 500 concurrency each give **4,000 concurrent provider calls**. At an average of 250 ms per provider call, the throughput ceiling is 4,000 / 0.25 s = **16,000 emails/sec**, assuming provider rate limits and connection pools can sustain it.

### Batch Insert for Events
```go
// Instead of one INSERT per event:
_, err = pool.CopyFrom(ctx,
    pgx.Identifier{"email_events"},
    []string{"email_id", "project_id", "event_type", "recipient", "occurred_at", "metadata"},
    pgx.CopyFromRows(rows),
)
// Flushes every 100ms or when 1000 events accumulated
```
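
The "every 100ms or 1000 events" flush policy can be sketched as a small batching loop. `RunBatcher` and the `Event` struct are hypothetical, and the `flush` callback stands in for the `CopyFrom` call above; the loop also flushes whatever is left when the input channel is closed, so a graceful shutdown loses nothing.

```go
package main

import (
    "fmt"
    "time"
)

// Event is a placeholder for one email_events row.
type Event struct {
    EmailID string
    Type    string
}

// RunBatcher collects events from in and hands them to flush when maxSize
// is reached, when interval elapses with a non-empty batch, or when in is
// closed. It returns after the final flush.
func RunBatcher(in <-chan Event, maxSize int, interval time.Duration, flush func([]Event)) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    batch := make([]Event, 0, maxSize)
    emit := func() {
        if len(batch) == 0 {
            return
        }
        flush(batch)
        batch = make([]Event, 0, maxSize) // fresh slice: flush may keep the old one
    }

    for {
        select {
        case ev, ok := <-in:
            if !ok {
                emit() // drain on shutdown
                return
            }
            batch = append(batch, ev)
            if len(batch) >= maxSize {
                emit()
            }
        case <-ticker.C:
            emit()
        }
    }
}

func main() {
    in := make(chan Event)
    go func() {
        for i := 0; i < 2500; i++ {
            in <- Event{EmailID: fmt.Sprint("email-", i), Type: "delivered"}
        }
        close(in)
    }()
    RunBatcher(in, 1000, 100*time.Millisecond, func(batch []Event) {
        fmt.Println("flushing", len(batch), "events") // pool.CopyFrom would go here
    })
}
```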

### Rate Limiting
- Per-project token bucket in Redis
- Per-API-key override bucket
- Global platform emergency brake (env var `MAX_GLOBAL_SEND_RATE`)
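
For reference, the token-bucket logic itself is small. The version below is an in-memory sketch with an injected clock (Unix milliseconds); the design keeps the bucket in Redis, where the same refill-then-take step would have to run atomically, for instance in a Lua script. `Bucket`, `NewBucket`, and `Allow` are hypothetical names.

```go
package main

import "fmt"

// Bucket is a token bucket: it refills at Rate tokens per second up to
// Burst, and each allowed request takes one token.
type Bucket struct {
    Rate  float64 // tokens added per second
    Burst float64 // maximum tokens held

    tokens float64
    lastMs int64
}

// NewBucket returns a full bucket as of nowMs.
func NewBucket(rate, burst float64, nowMs int64) *Bucket {
    return &Bucket{Rate: rate, Burst: burst, tokens: burst, lastMs: nowMs}
}

// Allow refills the bucket for the time elapsed since the last call and
// then tries to take one token.
func (b *Bucket) Allow(nowMs int64) bool {
    if elapsed := float64(nowMs-b.lastMs) / 1000; elapsed > 0 {
        b.tokens += elapsed * b.Rate
        if b.tokens > b.Burst {
            b.tokens = b.Burst
        }
        b.lastMs = nowMs
    }
    if b.tokens >= 1 {
        b.tokens--
        return true
    }
    return false
}

func main() {
    // 100 requests per 10s, expressed as 10 tokens/sec with a burst of 100.
    b := NewBucket(10, 100, 0)
    allowed := 0
    for i := 0; i < 150; i++ {
        if b.Allow(0) { // 150 requests arriving at the same instant
            allowed++
        }
    }
    fmt.Println("allowed in burst:", allowed, "allowed 1s later:", b.Allow(1000))
}
```

The per-project and per-key limits would each get their own bucket, with a request admitted only when both allow it.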

---

## 17. Phased Rollout Plan

### Phase 1 — Core 
- [ ] Project CRUD + API key management
- [ ] Custom SMTP + one provider (Postmark)
- [ ] Template CRUD + versioning
- [ ] Single send + template send
- [ ] NATS JetStream pipeline
- [ ] Basic delivery event recording
- [ ] Suppression list
- [ ] Docker Compose dev environment

### Phase 2 — Scale & Reliability 
- [ ] All 7 provider adapters
- [ ] Batch send (fan-out up to 10k)
- [ ] Circuit breaker + provider fallback
- [ ] Inbound webhook event normalisation (all providers)
- [ ] Idempotency keys
- [ ] Rate limiting (per-project + per-key)
- [ ] DKIM signing
- [ ] Open/click tracking

### Phase 3 — Analytics & Compliance 
- [ ] Full analytics pipeline (Redis → PG flush)
- [ ] A/B testing
- [ ] Heatmaps
- [ ] GDPR erasure endpoint
- [ ] Preference center
- [ ] Outbound webhooks with retry
- [ ] CSV export via MinIO

### Phase 4 — Scheduling & SuperAdmin UI 
- [ ] Exact + timezone-aware scheduling
- [ ] Cron recurring sends
- [ ] Drip sequences
- [ ] SuperAdmin UI (Next.js)
- [ ] Platform default templates
- [ ] Warm-up mode
- [ ] Observability stack (Prometheus + Grafana + Loki + Jaeger)

### Phase 5 — Hardening 
- [ ] Multi-region DR setup
- [ ] Load testing (target: 15k emails/sec sustained)
- [ ] Penetration testing
- [ ] Chaos engineering (kill worker, kill provider, DB failover)
- [ ] Runbook documentation

---

## 18. Open Questions & Future Scope

### Open Questions
1. **Service name** — "MailForge" is a placeholder. Confirm before domain/Docker image naming.
2. **SimplifyHiring migration** — Which transactional emails (OTP, invites, reports) migrate first?
3. **SMTP inbound / bounce parsing** — Do we need our own MX record, or use provider-managed bounce handling?
4. **Drip sequences** — Should drip enrolment be API-driven (caller triggers enrol), or event-driven (MailForge listens to events)?
5. **IP warm-up** — Will we use dedicated IPs from providers (SES dedicated IPs, Postmark dedicated IPs), or shared pools? Note that Postmark Sender Signatures only verify a sending address and do not provide a dedicated IP.
6. **Multi-language templates** — i18n support needed? (render same template in multiple locales)

### Future Scope (Post-Phase 5)
- **Email Builder UI** — Drag-and-drop MJML/HTML editor for templates (could be React Email based)
- **Contact Lists** — Full contact database with segments and custom fields
- **Campaign Manager** — Schedule one-time campaigns to segments
- **Inbox Preview** — Render email preview across email clients (Litmus-style integration)
- **AMP for Email** — Interactive AMP email support
- **Billing Integration** — Usage-based billing if MailForge becomes external SaaS
- **SDK Libraries** — Official Go, Node.js, Python client libraries
- **Zapier / n8n Connector** — No-code integration

---

*Document owner: Platform Engineering · Last updated: 2026-05-08*
