top of page
Search

Cloud-based RDS managent on AWS

  • Writer: Mark Kendall
    Mark Kendall
  • 3 days ago
  • 4 min read

Think of it as self-service with guardrails. You want teams to move fast on day-to-day DB work (tables, indexes, connections) without a ticket queue, while the platform enforces safety, cost, and compliance.


Operating Model (in one glance)

• Golden paths per engine (Postgres, Oracle, SQL Server, DynamoDB) with ready-made templates.

• Everything via Git (GitOps): schemas, migrations, access, and infra are PR-driven.

• Short-lived credentials + network policy + audit by default.

• Observability baked in (Performance Insights/pg_stat_statements, CloudWatch, alarms).



1) Provisioning & Access (self-serve)


Platform provides:

• Terraform/CDK modules for each engine (RDS/Aurora, Oracle, SQL Server, DynamoDB) with:

• subnet groups, security groups, parameter groups

• backups, PITR, deletion protection

• RDS Proxy / PgBouncer for connection pooling (Lambdas, K8s)

• Secrets Manager (automatic rotation) or IAM DB Auth (Postgres/MySQL)

• Backstage/AWS Service Catalog forms to instantiate DBs by team/env.


Teams do:

• Create a repo db-<domain> with:

/schema (DDL in migrations)

/access (who can connect; least-privilege policies)

/observability (dashboards/alerts)


Connection patterns:

• Lambdas: role → RDS Proxy → DB, secret fetched at runtime.

• K8s: IRSA service account → secret mount/env → PgBouncer/RDS Proxy.

• Human access: SSO to a bastion/SSM Session Manager + short-lived creds; no static users.



2) Day-to-Day Changes (tables, indexes, procs)


Rule: changes land via migrations, not consoles.


Migration tool (pick one): Flyway or Liquibase.


Workflow

1. Dev writes migration: V2025_09_04__add_order_idx.sql

2. PR runs CI dry-run against an ephemeral DB (or dev replica):

• lints DDL (no full-table locks on big tables, naming conventions)

• runs EXPLAIN checks for new indexes

• verifies rollbacks (where feasible)

3. On merge, CD applies migrations to devstageprod with approvals.


Example Flyway layout


/migrations

  └─ V2025_09_04__create_orders.sql

  └─ V2025_09_04__add_orders_customer_idx.sql


Example GitHub Actions (Postgres)


name: db-migrations

on: [pull_request, push]

jobs:

  validate:

    runs-on: ubuntu-latest

    steps:

      - uses: actions/checkout@v4

      - uses: docker://flyway/flyway:10

        with:

          args: >

            -url=jdbc:postgresql://ephemeral-db:5432/app

            -user=${{ secrets.DB_USER }}

            -password=${{ secrets.DB_PASS }}

            -locations=filesystem:migrations

            -connectRetries=10

            info validate migrate -target=latest -dryRunOutput=flyway.sql




3) Indexing & Performance Governance

• Naming convention: ix_<table>_<cols>, cap N indexes/table unless justified.

• Automated checks:

• reject duplicate/unused indexes via pg_stat_user_indexes

• require EXPLAIN (ANALYZE, BUFFERS) artifacts for queries tied to new indexes

• Insights: enable Performance Insights + pg_stat_statements; weekly report of top SQL.

• Safety: large tables require ONLINE/CONCURRENT index creation (Postgres CREATE INDEX CONCURRENTLY).



4) Multi-Engine Notes


Postgres/Aurora

• Prefer Aurora Serverless v2 for elastic capacity.

• Use IAM auth where possible; else Secrets Manager rotation.

• Read replicas for analytics; logical replication if needed.


Oracle / SQL Server (RDS)

• Standardize on Liquibase (good vendor support).

• Parameter groups enforce audit/tracing; AWR/Extended Events to CloudWatch Logs.


DynamoDB

• Changes via IaC only:

• table, GSIs/LSIs defined in Terraform/CDK

• on-demand for spiky loads; PITR on

• Data access by fine-grained IAM; add a small “table access” module teams can include.

• Model changes are code changes (no migrations), but enforce GSIs review in PR.



5) Connectivity Without Central Tickets

• Network: VPC endpoints + SGs managed by module; teams declare allowed SGs in PR (access/allowlist.tf).

• AuthZ: teams request roles via code (access/app-roles.tf) mapped to:

db_readonly, db_writer, db_admin (admin reserved)

• Secrets: db/<env>/<service> path; rotation policy auto-applies.

• Catalog: register each schema in the data catalog (owner, PII flags, retention).



6) Observability & SRE Basics

• Dashboards per DB with:

• CPU/IOPS/latency, active connections, deadlocks, top queries

• Alarms:

• connection saturation, replication lag, disk space, slow query budget

• Runbooks in the repo (/runbooks): hot index creation, vacuum bloat, failover steps.



7) “Golden Templates” (copy/paste starters)


Terraform: Postgres DB + RDS Proxy (excerpt)


module "aurora_pg" {

  source  = "modules/aurora-postgres"

  name    = "catalog"

  engine_version = "15"

  vpc_id  = var.vpc_id

  subnets = var.db_subnets

  backup_retention_days = 7

  performance_insights  = true

}


module "rds_proxy" {

  source         = "modules/rds-proxy"

  name           = "catalog-proxy"

  target_cluster = module.aurora_pg.cluster_arn

  secrets_manager_arn = module.aurora_pg.secret_arn

  iam_auth       = true

}


Liquibase migration (Oracle/SQL Server flavor)


-- changeset team:2025-09-04-create-customer

CREATE TABLE CUSTOMER (

  ID VARCHAR2(36) PRIMARY KEY,

  NAME VARCHAR2(200) NOT NULL,

  CREATED_AT TIMESTAMP DEFAULT CURRENT_TIMESTAMP

);


DynamoDB table (CDK, TypeScript)


new dynamodb.Table(this, "Orders", {

  tableName: "orders",

  partitionKey: { name: "orderId", type: dynamodb.AttributeType.STRING },

  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,

  pointInTimeRecovery: true,

});




What this gives you

• Teams create/alter their own schemas with speed (PRs → pipelines), no DBA bottleneck.

• Platform ensures security, resilience, cost control, and auditability.

• Works uniformly across Postgres, Oracle, SQL Server, and DynamoDB.

 
 
 

Recent Posts

See All

Comments


Post: Blog2_Post

Subscribe Form

Thanks for submitting!

©2020 by LearnTeachMaster DevOps. Proudly created with Wix.com

bottom of page