PostgreSQL Migration Guide

This guide explains how to migrate from DynamoDB to PostgreSQL for existing ActingWeb deployments.

Prerequisites

  • PostgreSQL 12 or higher

  • Python 3.11+

  • ActingWeb with PostgreSQL extras installed: poetry install --extras postgresql

  • Access to both DynamoDB and PostgreSQL databases during migration

Migration Overview

The migration process involves:

  1. Setup: Install PostgreSQL backend and run migrations

  2. Export: Extract data from DynamoDB

  3. Import: Load data into PostgreSQL

  4. Validate: Verify data integrity

  5. Switchover: Update configuration to use PostgreSQL

  6. Cleanup: (Optional) Archive or delete DynamoDB data

Estimated time: 2-6 hours depending on data volume

Step 1: PostgreSQL Setup

1.1 Install PostgreSQL

macOS (Homebrew):

brew install postgresql@16
brew services start postgresql@16

Ubuntu/Debian:

sudo apt update
sudo apt install postgresql-16 postgresql-contrib
sudo systemctl start postgresql

Docker (Development):

docker run -d \
  --name actingweb-postgres \
  -e POSTGRES_USER=actingweb \
  -e POSTGRES_PASSWORD=secretpassword \
  -e POSTGRES_DB=actingweb \
  -p 5432:5432 \
  postgres:16-alpine

1.2 Create Database and User

-- Connect as postgres superuser
psql -U postgres

-- Create user and database
CREATE USER actingweb WITH PASSWORD 'secretpassword';
CREATE DATABASE actingweb OWNER actingweb;
GRANT ALL PRIVILEGES ON DATABASE actingweb TO actingweb;

-- Exit psql
\q

1.3 Install Python Dependencies

# Install PostgreSQL extras
poetry install --extras postgresql

# Verify installation
poetry run python -c "import psycopg; import alembic; print('PostgreSQL dependencies installed')"

1.4 Configure Environment

Development (.env):

# Backend selection
DATABASE_BACKEND=postgresql

# PostgreSQL connection
PG_DB_HOST=localhost
PG_DB_PORT=5432
PG_DB_NAME=actingweb
PG_DB_USER=actingweb
PG_DB_PASSWORD=secretpassword

# Optional: Schema prefix for testing
PG_DB_PREFIX=

Production (systemd/Docker):

export DATABASE_BACKEND=postgresql
export PG_DB_HOST=postgres.example.com
export PG_DB_PORT=5432
export PG_DB_NAME=actingweb_prod
export PG_DB_USER=actingweb
export PG_DB_PASSWORD=<secure-password>

1.5 Run Alembic Migrations

cd actingweb/db/postgresql/
alembic upgrade head

Verify tables created:

psql -U actingweb -d actingweb -c "\dt"

Expected output:

              List of relations
 Schema |        Name         | Type  |   Owner
--------+---------------------+-------+-----------
 public | actors              | table | actingweb
 public | alembic_version     | table | actingweb
 public | attributes          | table | actingweb
 public | peertrustees        | table | actingweb
 public | properties          | table | actingweb
 public | subscription_diffs  | table | actingweb
 public | subscriptions       | table | actingweb
 public | trusts              | table | actingweb

Step 2: Export DynamoDB Data

2.2 Manual Export (Advanced)

Export single table:

import json
from pynamodb.models import Model

# Export actors
actors = []
for actor in ActorModel.scan():
    actors.append(actor.to_dict())

with open('actors.json', 'w') as f:
    json.dump(actors, f, indent=2)

Step 3: Import to PostgreSQL

3.1 Use Migration Script (Recommended)

# Import data from JSON
poetry run python scripts/migrate_dynamodb_to_postgresql.py import \
  --input-dir /tmp/actingweb_export \
  --pg-host localhost \
  --pg-port 5432 \
  --pg-database actingweb \
  --pg-user actingweb \
  --pg-password secretpassword

# Verify import
poetry run python scripts/migrate_dynamodb_to_postgresql.py validate \
  --input-dir /tmp/actingweb_export \
  --pg-host localhost \
  --pg-port 5432 \
  --pg-database actingweb \
  --pg-user actingweb \
  --pg-password secretpassword

3.2 Verify Row Counts

-- Check row counts
SELECT 'actors' AS table_name, COUNT(*) FROM actors
UNION ALL
SELECT 'properties', COUNT(*) FROM properties
UNION ALL
SELECT 'attributes', COUNT(*) FROM attributes
UNION ALL
SELECT 'trusts', COUNT(*) FROM trusts
UNION ALL
SELECT 'peertrustees', COUNT(*) FROM peertrustees
UNION ALL
SELECT 'subscriptions', COUNT(*) FROM subscriptions
UNION ALL
SELECT 'subscription_diffs', COUNT(*) FROM subscription_diffs;

Compare with DynamoDB counts.

Step 4: Validation

4.1 Automated Validation

The migration script includes validation:

poetry run python scripts/migrate_dynamodb_to_postgresql.py validate \
  --input-dir /tmp/actingweb_export \
  --pg-host localhost \
  --pg-database actingweb

Checks performed:

  • Row count matches

  • Primary key uniqueness

  • Foreign key integrity (actor references)

  • Data type conversions (JSON, timestamps)

  • Sample record comparison

4.2 Manual Verification

Check specific actor:

-- Get actor by ID
SELECT * FROM actors WHERE id = 'test-actor-123';

-- Get actor's properties
SELECT * FROM properties WHERE id = 'test-actor-123';

-- Get actor's trusts
SELECT * FROM trusts WHERE id = 'test-actor-123';

Compare with DynamoDB:

from actingweb.db.dynamodb.actor import DbActor

db_actor = DbActor()
db_actor.get('test-actor-123')
print(db_actor.handle)

4.3 Integration Test

Run integration tests against PostgreSQL:

# Set PostgreSQL as backend
export DATABASE_BACKEND=postgresql
export PG_DB_HOST=localhost
export PG_DB_PORT=5432
export PG_DB_NAME=actingweb_test
export PG_DB_USER=actingweb
export PG_DB_PASSWORD=testpassword

# Run tests
poetry run pytest tests/integration/ -v

All tests should pass.

Step 5: Switchover

5.2 Blue-Green Deployment

Best for: High-availability production

  1. Deploy new PostgreSQL-backed instances (green environment)

  2. Import data to PostgreSQL

  3. Route subset of traffic to green environment (e.g., 10%)

  4. Monitor for issues

  5. Gradually increase traffic to green environment

  6. Switch DNS/load balancer to green environment

  7. Decommission old DynamoDB instances (blue environment)

5.3 Configuration Updates

Update systemd service:

[Service]
Environment="DATABASE_BACKEND=postgresql"
Environment="PG_DB_HOST=postgres.example.com"
Environment="PG_DB_PORT=5432"
Environment="PG_DB_NAME=actingweb_prod"
Environment="PG_DB_USER=actingweb"
Environment="PG_DB_PASSWORD=<secure-password>"

Update Docker Compose:

services:
  app:
    environment:
      - DATABASE_BACKEND=postgresql
      - PG_DB_HOST=postgres
      - PG_DB_PORT=5432
      - PG_DB_NAME=actingweb
      - PG_DB_USER=actingweb
      - PG_DB_PASSWORD=secretpassword
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: actingweb
      POSTGRES_PASSWORD: secretpassword
      POSTGRES_DB: actingweb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U actingweb"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:

Update Kubernetes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: actingweb-config
data:
  DATABASE_BACKEND: "postgresql"
  PG_DB_HOST: "postgres-service"
  PG_DB_PORT: "5432"
  PG_DB_NAME: "actingweb"
---
apiVersion: v1
kind: Secret
metadata:
  name: actingweb-secrets
type: Opaque
stringData:
  PG_DB_USER: "actingweb"
  PG_DB_PASSWORD: "<secure-password>"

Step 6: Post-Migration

6.1 Monitoring

PostgreSQL connection pooling:

-- Check active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'actingweb';

Query performance:

-- Enable query logging (postgresql.conf)
log_statement = 'all'
log_duration = on

-- Or use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

6.2 Backup Setup

pg_dump (simple):

# Daily backup
pg_dump -U actingweb actingweb | gzip > actingweb_$(date +%Y%m%d).sql.gz

# Cron job
0 2 * * * pg_dump -U actingweb actingweb | gzip > /backups/actingweb_$(date +\%Y\%m\%d).sql.gz

Continuous archiving (production):

# Enable WAL archiving in postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# Use pgBackRest or Barman for point-in-time recovery

6.3 Performance Tuning

Connection pooling (already configured in connection.py):

export PG_POOL_MIN_SIZE=2
export PG_POOL_MAX_SIZE=20
export PG_POOL_TIMEOUT=30

PostgreSQL tuning (postgresql.conf):

# Memory
shared_buffers = 256MB           # 25% of RAM
effective_cache_size = 1GB       # 50-75% of RAM
work_mem = 16MB                  # Per connection sort memory

# Connections
max_connections = 100

# Query planner
random_page_cost = 1.1           # For SSD storage
effective_io_concurrency = 200   # For SSD storage

6.4 Cleanup (Optional)

After confirming PostgreSQL is working correctly:

Archive DynamoDB data:

# Export to S3 for archival
aws dynamodb scan --table-name prod_actors \
  | gzip > s3://backups/dynamodb/actors_$(date +%Y%m%d).json.gz

Delete DynamoDB tables:

# WARNING: Irreversible!
aws dynamodb delete-table --table-name prod_actors
aws dynamodb delete-table --table-name prod_properties
# ... etc

Rollback Procedures

Quick Rollback (if issues detected early)

# 1. Stop application
systemctl stop actingweb

# 2. Revert configuration
export DATABASE_BACKEND=dynamodb
export AWS_DB_HOST=http://dynamodb.us-west-2.amazonaws.com
export AWS_DB_PREFIX=prod_

# 3. Restart application
systemctl start actingweb

# 4. Verify DynamoDB is still accessible

Data Recovery (if PostgreSQL data corrupted)

# 1. Drop PostgreSQL database
psql -U postgres -c "DROP DATABASE actingweb;"
psql -U postgres -c "CREATE DATABASE actingweb OWNER actingweb;"

# 2. Re-run migrations
cd actingweb/db/postgresql/
alembic upgrade head

# 3. Re-import from exported JSON
poetry run python scripts/migrate_dynamodb_to_postgresql.py import \
  --input-dir /tmp/actingweb_export

# 4. Validate
poetry run python scripts/migrate_dynamodb_to_postgresql.py validate \
  --input-dir /tmp/actingweb_export

Common Issues

Issue: Connection refused

Symptom:

psycopg.OperationalError: connection to server at "localhost" (::1), port 5432 failed: Connection refused

Solution:

# Check PostgreSQL is running
pg_isready -h localhost -p 5432

# Start PostgreSQL
sudo systemctl start postgresql   # Linux
brew services start postgresql@16 # macOS

# Check pg_hba.conf allows connections
sudo vi /etc/postgresql/16/main/pg_hba.conf
# Add: host  all  actingweb  127.0.0.1/32  md5
sudo systemctl reload postgresql

Issue: Authentication failed

Symptom:

psycopg.OperationalError: FATAL: password authentication failed for user "actingweb"

Solution:

# Reset password
psql -U postgres -c "ALTER USER actingweb WITH PASSWORD 'newsecretpassword';"

# Update environment variable
export PG_DB_PASSWORD=newsecretpassword

Issue: Schema already exists

Symptom:

alembic.util.exc.CommandError: Target database is not up to date.

Solution:

# Check current version
cd actingweb/db/postgresql/
alembic current

# Stamp database with current version
alembic stamp head

# Or drop and recreate
psql -U actingweb -d actingweb -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
alembic upgrade head

Issue: Migration script fails

Symptom:

ERROR: Failed to import properties: duplicate key value violates unique constraint

Solution:

# Check for duplicate data in export
jq '.[] | .id + ":" + .name' /tmp/actingweb_export/properties.json | sort | uniq -d

# Clean duplicates before import
poetry run python scripts/migrate_dynamodb_to_postgresql.py import \
  --input-dir /tmp/actingweb_export \
  --skip-duplicates \
  --log-level DEBUG

Issue: Performance degradation

Symptom: Queries slower than DynamoDB

Solution:

-- Check missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname = 'public';

-- Add missing indexes (already in Alembic migrations)
-- Analyze query plans
EXPLAIN ANALYZE SELECT * FROM trusts WHERE id = 'actor-123';

-- Update table statistics
ANALYZE actors;
ANALYZE properties;
-- ... etc

-- Vacuum tables
VACUUM ANALYZE;

Performance Comparison

Typical performance characteristics:

Operation

DynamoDB

PostgreSQL

Notes

Single actor read

5-10ms

1-3ms

PostgreSQL faster (local)

Property read

5-10ms

1-2ms

PostgreSQL faster

Trust query

10-20ms

2-5ms

PostgreSQL faster (JOIN support)

Batch operations

Good

Excellent

PostgreSQL supports transactions

Full table scan

Poor

Good

PostgreSQL has better filtering

Concurrent writes

Excellent

Good

DynamoDB has better write scaling

PostgreSQL advantages:

  • Lower latency for single reads (no network overhead)

  • Complex queries with JOINs

  • Transaction support (ACID guarantees)

  • Lower cost for read-heavy workloads

  • Mature backup/restore tools

DynamoDB advantages:

  • Automatic scaling

  • Multi-region replication

  • Better for extremely high write throughput (>10k writes/sec)

  • Managed service (no server maintenance)

Support

For migration assistance:

Appendix: Schema Reference

PostgreSQL Schema

actors table:

CREATE TABLE actors (
    id VARCHAR(255) PRIMARY KEY,
    creator VARCHAR(255),
    passphrase TEXT
);
CREATE INDEX idx_actors_creator ON actors(creator);

properties table:

CREATE TABLE properties (
    id VARCHAR(255),
    name VARCHAR(255),
    value TEXT,
    PRIMARY KEY (id, name),
    FOREIGN KEY (id) REFERENCES actors(id) ON DELETE CASCADE
);
CREATE INDEX idx_properties_value ON properties(value);

trusts table:

CREATE TABLE trusts (
    id VARCHAR(255),
    peerid VARCHAR(255),
    baseuri VARCHAR(255),
    type VARCHAR(50),
    relationship VARCHAR(50),
    secret VARCHAR(255),
    -- ... additional fields
    PRIMARY KEY (id, peerid),
    FOREIGN KEY (id) REFERENCES actors(id) ON DELETE CASCADE
);
CREATE INDEX idx_trusts_secret ON trusts(secret);

See actingweb/db/postgresql/schema.py for complete schema definitions.