===========================
Database Backends Reference
===========================

ActingWeb supports two production-ready database backends: **DynamoDB** and **PostgreSQL**.
This document provides a detailed comparison to help you choose the right backend for your use case.

Quick Comparison
----------------

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Feature
     - DynamoDB
     - PostgreSQL
   * - **Setup Complexity**
     - Low (auto-creates tables)
     - Medium (requires Alembic migrations)
   * - **Local Development**
     - DynamoDB Local (Docker)
     - PostgreSQL (Docker or native install)
   * - **Installation**
     - ``pip install 'actingweb[dynamodb]'``
     - ``pip install 'actingweb[postgresql]'``
   * - **Scaling**
     - Automatic, serverless
     - Manual (vertical/horizontal scaling)
   * - **Cost Model**
     - Pay-per-request or provisioned throughput
     - Instance-based (compute + storage)
   * - **Query Flexibility**
     - Key-based with Global Secondary Indexes
     - Full SQL with JOINs, aggregations, CTEs
   * - **Latency (local)**
     - Higher (network overhead to AWS)
     - Lower (direct database connection)
   * - **Latency (same region)**
     - Very low (<10ms)
     - Very low (<5ms)
   * - **Production Management**
     - Fully managed by AWS
     - Self-managed or RDS/Cloud SQL
   * - **Multi-region**
     - Built-in global tables
     - Manual replication (streaming/logical)
   * - **Transactions**
     - Limited (single table, conditions)
     - Full ACID transactions
   * - **Backup/Restore**
     - Point-in-time recovery built-in
     - pg_dump or continuous archiving
   * - **TTL/Cleanup**
     - Built-in automatic TTL deletion
     - pg_cron or scheduled jobs
   * - **Connection Pooling**
     - Not applicable (HTTP API)
     - Built-in (psycopg3 ConnectionPool)

When to Use DynamoDB
--------------------

**Best For:**

- **AWS-centric deployments**: Already using AWS Lambda, API Gateway, etc.
- **Global distribution**: Need multi-region active-active setup with minimal effort
- **Variable traffic**: Traffic patterns with large spikes or long idle periods
- **Serverless architecture**: Zero server management, automatic scaling
- **Key-value workloads**: Simple lookups by actor ID, property name, trust relationship
- **Pay-per-use economics**: Want to pay only for actual usage, not idle capacity

**Advantages:**

1. **Zero administration**: No servers to manage, no capacity planning
2. **Automatic scaling**: Handles traffic spikes without intervention
3. **Global tables**: Multi-region replication with conflict resolution
4. **Built-in TTL**: Automatic cleanup of expired data
5. **On-demand pricing**: Pay only for requests (or use provisioned capacity)
6. **High availability**: 99.99% SLA within region, 99.999% for global tables

**Considerations:**

1. **Cost at scale**: Can become expensive with sustained high traffic (>100k requests/day)
2. **AWS lock-in**: Difficult to migrate away from AWS ecosystem
3. **Index management**: Global Secondary Indexes have cost and design implications

**Example Use Cases:**

- Serverless web application on AWS Lambda
- Mobile backend with global user base
- IoT platform with burst traffic patterns
- Microservices with variable load

When to Use PostgreSQL
----------------------

**Best For:**

- **Predictable traffic**: Steady load or known capacity requirements
- **Cost optimization**: High-traffic apps where instance-based pricing is cheaper
- **Multi-cloud/on-prem**: Want to avoid cloud vendor lock-in
- **Existing PostgreSQL expertise**: Team already knows PostgreSQL
- **Transaction requirements**: Need complex multi-table transactions

**Advantages:**

1. **Lower latency**: Direct connection pool reduces overhead
2. **Cost efficiency**: Fixed monthly cost regardless of request volume
3. **Rich ecosystem**: Extensive tools, extensions (PostGIS, full-text search, etc.)
4. **ACID transactions**: Full transactional guarantees across operations
5. **No vendor lock-in**: Works on any cloud or on-premises
6. **Advanced features**: Triggers, stored procedures, custom functions

**Considerations:**

1. **Manual scaling**: Must provision capacity and handle scaling yourself
2. **Database management**: Need to manage backups, updates, monitoring
3. **Connection limits**: Connection pooling required for high concurrency
4. **Migration setup**: Must run Alembic migrations before first use
5. **TTL cleanup**: Must configure pg_cron or scheduled jobs (not automatic)

**Example Use Cases:**

- Traditional web application with steady traffic
- Reporting/analytics requirements
- Multi-tenant SaaS application
- Enterprise deployment with on-premises requirements

Performance Characteristics
---------------------------

Actor Operations
~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 20 20 30

   * - Operation
     - DynamoDB
     - PostgreSQL
     - Notes
   * - Actor create
     - 5-15ms
     - 2-5ms
     - PostgreSQL faster for local/same-region
   * - Actor read
     - 5-10ms
     - 1-3ms
     - Both very fast for key-based lookup
   * - Actor update
     - 5-10ms
     - 2-4ms
     - PostgreSQL slightly faster
   * - Property write
     - 5-10ms
     - 1-3ms
     - PostgreSQL benefits from connection pooling
   * - Property read
     - 5-10ms
     - 1-2ms
     - Both efficient for primary key lookup
   * - Trust list
     - 10-20ms
     - 2-5ms
     - PostgreSQL better for complex queries

Scaling Characteristics
~~~~~~~~~~~~~~~~~~~~~~~

**DynamoDB:**

- Horizontal scaling: Automatic partition management
- Read capacity: Auto-scales to millions of requests/second
- Write capacity: Auto-scales to millions of requests/second
- Latency: Consistent regardless of table size
- Limits: 400KB item size, 25 operations per transaction

**PostgreSQL:**

- Vertical scaling: Larger instances for more CPU/RAM
- Horizontal scaling: Read replicas, sharding (manual setup)
- Connection pooling: Essential for Lambda/serverless (use external pool like RDS Proxy)
- Latency: Can degrade with very large tables without proper indexing
- Limits: 1GB per row, no practical transaction limit

Cost Analysis
-------------

Example: 1 million requests/day
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**DynamoDB Costs (us-east-1):**

.. code-block:: text

    On-Demand Pricing:
    - 1M writes/day × $1.25/million = $1.25/day = $37.50/month
    - 3M reads/day × $0.25/million = $0.75/day = $22.50/month
    - Storage: 10GB × $0.25/GB = $2.50/month
    Total: ~$63/month

    Provisioned Capacity (if steady load):
    - 12 WCU × 730 hours × $0.00065 = $5.69/month
    - 35 RCU × 730 hours × $0.00013 = $3.32/month
    - Storage: 10GB × $0.25/GB = $2.50/month
    Total: ~$12/month (62% savings with provisioned)

**PostgreSQL Costs (AWS RDS):**

.. code-block:: text

    db.t4g.small (2 vCPU, 2GB RAM):
    - Instance: $0.017/hour × 730 hours = $12.41/month
    - Storage: 20GB SSD × $0.115/GB = $2.30/month
    - Backup storage: 20GB × $0.095/GB = $1.90/month
    Total: ~$17/month

    db.t4g.medium (2 vCPU, 4GB RAM):
    - Instance: $0.034/hour × 730 hours = $24.82/month
    - Storage: 50GB SSD × $0.115/GB = $5.75/month
    - Backup storage: 50GB × $0.095/GB = $4.75/month
    Total: ~$35/month

**Analysis**: For this traffic level, PostgreSQL is more cost-effective. DynamoDB becomes competitive only with:
- Highly variable traffic (benefit from on-demand)
- Global replication requirements
- Zero-ops preference (worth the premium)

Migration Between Backends
---------------------------

ActingWeb provides full migration tooling to switch backends without data loss.

DynamoDB → PostgreSQL
~~~~~~~~~~~~~~~~~~~~~

See :doc:`../guides/postgresql-migration` for complete migration guide.

**Summary:**

1. Install PostgreSQL and run Alembic migrations
2. Export DynamoDB data: ``python scripts/migrate_dynamodb_to_postgresql.py export``
3. Import to PostgreSQL: ``python scripts/migrate_dynamodb_to_postgresql.py import``
4. Validate: ``python scripts/migrate_dynamodb_to_postgresql.py validate``
5. Update configuration: ``DATABASE_BACKEND=postgresql``
6. Deploy and verify

**Downtime**: Can be zero with blue-green deployment strategy.

PostgreSQL → DynamoDB
~~~~~~~~~~~~~~~~~~~~~

Reverse migration is possible but requires custom export logic:

1. Export PostgreSQL to JSON (use ``pg_dump`` or custom script)
2. Transform data to DynamoDB format (handle composite keys)
3. Import using boto3 batch writes
4. Enable DynamoDB TTL on attributes table
5. Update configuration: ``DATABASE_BACKEND=dynamodb``

Implementation Details
----------------------

Table Schema Mapping
~~~~~~~~~~~~~~~~~~~~~

ActingWeb uses the same logical schema for both backends:

.. list-table::
   :header-rows: 1
   :widths: 20 30 30 20

   * - Table
     - Primary Key
     - DynamoDB Implementation
     - PostgreSQL Implementation
   * - actors
     - ``id``
     - Hash key: ``id``
     - Primary key: ``id``
   * - properties
     - ``(id, name)``
     - Hash key: ``id``, Range: ``name``
     - Composite key: ``(id, name)``
   * - attributes
     - ``(id, bucket_name)``
     - Hash key: ``id``, Range: ``bucket_name``
     - Composite key: ``(id, bucket_name)``
   * - trusts
     - ``(id, peerid)``
     - Hash key: ``id``, Range: ``peerid``
     - Composite key: ``(id, peerid)``
   * - peertrustees
     - ``(id, peerid)``
     - Hash key: ``id``, Range: ``peerid``
     - Composite key: ``(id, peerid)``
   * - subscriptions
     - ``(id, peer_sub_id)``
     - Hash key: ``id``, Range: ``peer_sub_id``
     - Composite key: ``(id, peer_sub_id)``
   * - subscription_diffs
     - ``(id, subid_seqnr)``
     - Hash key: ``id``, Range: ``subid_seqnr``
     - Composite key: ``(id, subid_seqnr)``

Index Strategy
~~~~~~~~~~~~~~

**DynamoDB:**

- Global Secondary Indexes (GSI) for non-key lookups:
  - ``actors``: GSI on ``creator`` field
  - ``properties``: GSI on ``value`` field (reverse lookups)
  - ``trusts``: GSI on ``secret`` field (token lookups)

**PostgreSQL:**

- B-tree indexes on frequently queried columns:
  - ``CREATE INDEX idx_actors_creator ON actors(creator)``
  - ``CREATE INDEX idx_properties_value ON properties(value)``
  - ``CREATE INDEX idx_trusts_secret ON trusts(secret)``
  - ``CREATE INDEX idx_attributes_ttl ON attributes(ttl_timestamp)`` (for cleanup)

Connection Management
~~~~~~~~~~~~~~~~~~~~~

**DynamoDB:**

- Stateless HTTP API (no connections to manage)
- PynamoDB handles retries and exponential backoff
- Works well with AWS Lambda (cold starts not affected)

**PostgreSQL:**

- Connection pooling via psycopg3 ``ConnectionPool``
- Default pool: min=2, max=10 connections
- Configure via environment variables:
  - ``PG_POOL_MIN_SIZE``
  - ``PG_POOL_MAX_SIZE``
  - ``PG_POOL_TIMEOUT``
- For AWS Lambda: Consider RDS Proxy for connection management

Data Types
~~~~~~~~~~

**JSON Storage:**

- DynamoDB: Native map/list types
- PostgreSQL: JSONB column type (efficient binary storage)

**Timestamps:**

- Both: Unix epoch as BIGINT for consistency
- PostgreSQL: Could use native TIMESTAMP but uses BIGINT for compatibility

**Composite Keys:**

- DynamoDB: Separate hash and range key fields
- PostgreSQL: Concatenated with ``:`` separator (e.g., ``peerid:subid``)

Recommendations by Scenario
----------------------------

Serverless/Lambda Deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommend: DynamoDB**

- Zero cold start impact (HTTP API)
- Automatic scaling with Lambda concurrency
- No connection management needed
- Pay-per-use aligns with Lambda pricing model

Traditional Server Deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommend: PostgreSQL**

- Better cost efficiency for steady load
- Simpler connection management with long-lived processes
- Lower latency with persistent connections
- Easier debugging with SQL query logs

Multi-Region Deployment
~~~~~~~~~~~~~~~~~~~~~~~

**Recommend: DynamoDB Global Tables**

- Built-in multi-region replication
- Automatic conflict resolution
- <100ms cross-region latency
- PostgreSQL multi-region requires manual setup (streaming replication, logical replication)

High-Traffic Application (>1M requests/day)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommend: PostgreSQL** (if predictable traffic)

- Fixed monthly cost
- Better cost efficiency at scale
- Lower per-request latency
- Connection pooling handles high concurrency

**Recommend: DynamoDB** (if variable/spiky traffic)

- Automatic scaling for burst traffic
- No capacity planning needed
- Pay only for actual usage

Development/Testing
~~~~~~~~~~~~~~~~~~~

**Either backend works well:**

- DynamoDB Local: ``docker run -p 8000:8000 amazon/dynamodb-local``
- PostgreSQL: ``docker run -p 5432:5432 postgres:16-alpine``
- Both support rapid iteration and testing

See Also
--------

- :doc:`../quickstart/configuration` - Configuration reference for both backends
- :doc:`../guides/postgresql-migration` - DynamoDB to PostgreSQL migration guide
- :doc:`../guides/database-maintenance` - TTL and cleanup configuration
- :doc:`../quickstart/local-dev-setup` - Local development setup
- :doc:`../contributing/backend-testing` - Testing strategy for dual backends