Self-Hosted Edition

CompliTru
Documentation

Architecture, security, deployment, and operations reference for the customer-owned CompliTru deployment.

Product
CompliTru Self-Hosted
Version
1.0.0
Last Updated
April 15, 2026
Documentation
docs.complitru.ai

Table of Contents

CompliTru Self-Hosted Documentation Package

  01. Product Overview
      CompliTru is a cloud security and compliance platform that closes the loop between finding misconfigurations and fixing them.
  02. Architecture
      CompliTru is a three-tier application deployed entirely within your AWS account.
  03. Security
      Security architecture, threat model, and hardening reference for the self-hosted CompliTru deployment.
  04. AI Provider Configuration
      CompliTru's AI features are deployment-configurable: AWS Bedrock (default for self-hosted), OpenAI, or disabled.
  05. Configuration Reference
      Every tunable parameter, environment variable, and Terraform variable, with defaults, valid values, and where each is read.
  06. Deployment Runbook
      End-to-end deployment of CompliTru into your AWS account using the included Terraform package. First-time deployment takes ~45 minutes.
  07. Operations Guide
      Day-2 operations: monitoring, troubleshooting, scaling, common runbooks. Targets the platform engineer who keeps the deployment healthy.
  08. Upgrade Guide
      How to upgrade your CompliTru deployment to a new version safely with zero data loss and minimal downtime.
  09. SOC 2 Control Mapping
      How the CompliTru self-hosted deployment satisfies SOC 2 Trust Services Criteria. This document is structured for use as auditor evidence.
Section 01

Product Overview

CompliTru Self-Hosted — Product Overview

What CompliTru does

CompliTru is a cloud security and compliance platform that closes the loop between finding misconfigurations and fixing them. Most security tools surface findings and stop. CompliTru continues — assessing blast radius, generating remediation, and applying fixes (with approval) inside the same workflow.

The self-hosted edition runs entirely inside your AWS account. No customer data leaves your VPC. AI features are deployment-configurable: AWS Bedrock in your account, your own OpenAI key, or fully disabled.

Capability summary

Capability What it does
Cloud configuration scanning Scans AWS, Azure, and GCP for misconfigurations across 600+ checks mapped to CIS, NIST, SOC 2, ISO 27001, HIPAA, PCI DSS
Vulnerability assessment Aggregates findings from cloud-native scanners (Inspector, Defender, Security Command Center) plus CompliTru's own scan engine
Impact-aware remediation Before suggesting a fix, evaluates blast radius — what depends on this resource, what breaks if changed, what running workloads are affected
Automated remediation One-click apply for safe fixes; multi-step playbooks for complex changes; full audit trail of every change
Compliance reporting Live mapping of findings to SOC 2, ISO 27001, NIST CSF, CIS Benchmarks, HIPAA, PCI DSS controls
AI-powered analysis Optional LLM-driven remediation suggestions, finding summarization, risk narration, and natural-language querying
Cost optimization Right-sizing recommendations that check workload dependencies before suggesting downgrades
Identity and access analysis CIEM — analyzes IAM policies for over-privileged identities, unused permissions, privilege escalation paths
Multi-account / multi-cloud Cross-account roles for AWS organizations; equivalent setups for Azure tenants and GCP organizations

Feature catalog

Scanning engine

Remediation engine

AI features (deployment-configurable)

AI providers supported (configured per deployment):

  • AWS Bedrock (Claude Sonnet 4 / Haiku 3.5) — default for self-hosted
  • OpenAI (GPT-4o, GPT-4o-mini) — requires customer-provided API key
  • Disabled — all AI features turn off cleanly; product remains fully functional without AI

See AI_PROVIDER.md for configuration details.

Compliance and reporting

See SOC2_CONTROLS.md for the full SOC 2 control mapping.

Identity and access (CIEM)

Cost optimization

Notifications and integrations

Multi-tenancy and RBAC

Operational features

What's in the box (deployment artifacts)

schellman/
├── README.md                       Quick start, requirements, package contents
├── docs/
│   ├── PRODUCT_OVERVIEW.md         This document
│   ├── ARCHITECTURE.md             System architecture, components, data flow
│   ├── SECURITY.md                 Security controls, threat model, hardening
│   ├── DEPLOYMENT.md               Step-by-step deployment runbook
│   ├── CONFIGURATION.md            All environment variables and tunables
│   ├── AI_PROVIDER.md              OpenAI / Bedrock / disabled configuration
│   ├── OPERATIONS.md               Logs, metrics, troubleshooting, day-2 ops
│   ├── SOC2_CONTROLS.md            SOC 2 control-by-control mapping
│   └── UPGRADE.md                  Version upgrade process
├── docker/
│   ├── Dockerfile.backend          Flask API + Gunicorn (Python 3.12)
│   ├── Dockerfile.frontend         Next.js 14 SSR (Node 20)
│   ├── Dockerfile.worker           Celery worker for async jobs
│   ├── docker-compose.yml          Local verification stack
│   └── .dockerignore
├── terraform/
│   ├── main.tf                     Root module composing all infrastructure
│   ├── variables.tf                All tunable parameters
│   ├── outputs.tf                  URLs, ARNs, connection info
│   ├── versions.tf                 Provider pinning
│   ├── example.tfvars              Example values
│   └── modules/
│       ├── networking/             VPC, subnets, NAT gateways, security groups
│       ├── rds/                    MySQL RDS with encryption, backups, parameter group
│       ├── ecs/                    Fargate cluster, task definitions, services, IAM roles
│       ├── secrets/                Secrets Manager entries for app secrets
│       └── loadbalancer/           ALB with ACM cert integration, optional WAF
└── scripts/
    ├── bootstrap.sh                Pre-deployment setup (ACM cert, S3 backend)
    ├── migrate.sh                  Run DB migrations after deployment
    └── verify-deployment.sh        Post-deploy smoke tests

Supported deployment targets

Target Status Notes
AWS ECS Fargate (recommended) ✅ Production-ready Terraform modules included
AWS EKS ✅ Helm chart available on request Deployment via Helm + values file
AWS EC2 + Docker Compose ✅ Reference compose included For dev/test/POC environments
Standalone Kubernetes (on-prem) ✅ Helm chart available on request Requires customer-provided MySQL and Redis
Azure (AKS / Azure Container Apps) 🚧 Roadmap (Q3 2026) Equivalent Terraform modules planned
GCP (GKE / Cloud Run) 🚧 Roadmap (Q4 2026) Equivalent Terraform modules planned

What CompliTru is not

To set expectations clearly:

Differentiation

The CompliTru differentiator versus other CSPM/CNAPP tools (Wiz, Lacework, Orca, Prisma Cloud):

  1. Closed-loop remediation. Most tools surface findings and stop. CompliTru includes the fix, the impact assessment, the rollback, and the audit trail in the same workflow.
  2. Impact-aware fixes. Before applying a remediation, CompliTru evaluates blast radius — what depends on this, what breaks if changed, what's running. Other tools recommend fixes without context.
  3. Customer-owned deployment. CompliTru runs in your AWS account. Your data never leaves. Most competitors are SaaS-only with customer data flowing to the vendor.
  4. AI-optional architecture. AI is a feature, not a dependency. Disable it entirely and the product still works. Most "AI-powered" platforms collapse without their LLM.

For deeper dives:

  • System architecture and data flow → ARCHITECTURE.md
  • Security controls and threat model → SECURITY.md
  • Deployment runbook → DEPLOYMENT.md
  • Configuration reference → CONFIGURATION.md
  • AI provider configuration → AI_PROVIDER.md
  • SOC 2 control mapping → SOC2_CONTROLS.md

Section 02

Architecture

CompliTru Self-Hosted — Architecture

System overview

CompliTru is a three-tier application deployed entirely within your AWS account. All customer data, scan findings, and audit logs remain in your infrastructure at all times.

CompliTru Self-Hosted Architecture

Components

1. Load balancer

2. Frontend

3. Backend API

4. Celery worker

5. Database (RDS MySQL)

6. Cache / queue (ElastiCache Redis)

7. Secrets Manager

Single secret at path complitru/${environment}/app containing:

Rotation supported via native Secrets Manager rotation for DB credentials.
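Because all application secrets live in one JSON payload at that path, a component needs only a single Secrets Manager call at startup. A minimal sketch, assuming boto3 and a flat JSON payload — `load_app_secret` and the keys in the sample string are illustrative, not product code:

```python
# Sketch: reading the single app secret at complitru/${environment}/app.
# The secret path follows the documented convention; key names shown in the
# sample payload are illustrative, not the full set.
import json


def load_app_secret(environment: str, region: str) -> dict:
    """Fetch and parse the JSON secret at complitru/${environment}/app."""
    import boto3  # needs AWS credentials; imported here, not at module load
    client = boto3.client("secretsmanager", region_name=region)
    resp = client.get_secret_value(SecretId=f"complitru/{environment}/app")
    return json.loads(resp["SecretString"])


# The payload is a flat JSON object; parsing shown with a stand-in string
# rather than a live API call:
sample = '{"DATABASE_URL": "mysql+pymysql://u:p@db:3306/complitru", "SECRET_KEY": "..."}'
secrets = json.loads(sample)
print(sorted(secrets))  # ['DATABASE_URL', 'SECRET_KEY']
```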

8. S3

Two buckets created:

Data flow

Authenticated request path

User → Route 53 → ALB (TLS) → Frontend (SSR) → API Backend → RDS / Redis
                                     ↓
                                 Secrets Manager (token signing)

Scan execution path

Scan Execution Data Flow

License validation path

Backend startup → License check against license.complitru.ai (HTTPS)
                ↓
        Every 60 minutes → Refresh license status
                ↓
        If offline > 7 days → Application fails closed (refuses scans)

Only outbound traffic. License server never initiates connections to your infrastructure.
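The grace-window decision above is simple to reason about; as a sketch, assuming the documented 7-day window (function and variable names are illustrative, not the product's API):

```python
# Sketch of the documented fail-closed policy: scans are allowed only while
# the last successful license check is within the offline grace window.
from datetime import datetime, timedelta, timezone

GRACE_DAYS = 7  # matches LICENSE_OFFLINE_GRACE_DAYS


def scans_allowed(last_ok_check: datetime, now: datetime) -> bool:
    """Fail closed once the license server has been unreachable too long."""
    return (now - last_ok_check) <= timedelta(days=GRACE_DAYS)


now = datetime(2026, 4, 15, tzinfo=timezone.utc)
print(scans_allowed(now - timedelta(days=3), now))  # True: within grace
print(scans_allowed(now - timedelta(days=8), now))  # False: fails closed
```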

Network security

Network Topology and Security Boundaries

Ingress

Egress

Flow logs

VPC flow logs enabled and shipped to CloudWatch Logs with 90-day retention.

Scaling

Horizontal

Vertical

Production defaults

Component Size Cost estimate
Backend ECS (2 tasks) 1 vCPU / 2 GB each ~$70/mo
Frontend ECS (2 tasks) 0.5 vCPU / 1 GB each ~$35/mo
Worker ECS (1 task) 1 vCPU / 2 GB ~$35/mo
RDS db.t3.medium Multi-AZ 2 vCPU / 4 GB ~$140/mo
ElastiCache t3.micro 2 nodes ~$25/mo
ALB + NAT + data ~$55/mo
Total ~$360/mo

Actual costs vary by region and traffic volume.

Disaster recovery

Observability

Alarms publish to an SNS topic you control — route to PagerDuty, Slack, Opsgenie, email, etc.

Future architecture (v2)

The current package is a fully self-hosted monolithic deployment. Roadmap includes:

  1. Split control plane / data plane — Detection rules and remediation logic move to a CompliTru-managed control plane, with a lightweight connector in your VPC. Reduces your operational burden, accelerates rule updates.
  2. EKS deployment option — Helm chart alternative for organizations standardized on Kubernetes.
  3. Azure + GCP support — Terraform modules for equivalent resources on Azure and GCP.

Current package is forward-compatible — migration path will be documented when v2 ships.

Section 03

Security

CompliTru Self-Hosted — Security

Security architecture, threat model, and hardening reference for the self-hosted CompliTru deployment.

Principles

  1. Customer data never leaves the customer's AWS account. No telemetry, no phone-home of scan data, no background uploads.
  2. Least privilege everywhere. IAM roles scoped per service. No wildcards in production policies.
  3. Defense in depth. Network isolation + IAM + application-layer authorization + audit logging, each independent.
  4. Fail closed. If license check fails or secrets cannot be retrieved, application refuses to start or scan. Never defaults to "open."

Threat model

Threat Control Residual risk
External attacker exploits web vulnerability WAF (optional), HTTPS only, security headers, input validation, CSP Low
Attacker with stolen user credentials MFA support, session timeout, audit log of every admin action, IP allow-listing option Low
Malicious insider with AWS console access CloudTrail captures all console activity, Object Lock on audit bucket prevents tampering Medium — mitigated by customer's own governance
Container escape Non-root execution, minimal base image, no privileged containers, read-only root filesystem where possible Low
RDS credential theft Secrets Manager rotation, TLS required for DB connections, no credentials in code or env Low
Supply chain attack on container images Images built from verified base, CVE scanning at build time, SBOM provided Medium — customer should pin image digests
DNS hijack / MITM HSTS enforced, TLS 1.2+, certificate pinning at ALB Low
Data exfiltration via compromised task VPC endpoints restrict S3/Secrets Manager access to your VPC, flow logs capture anomalies Low

Encryption

In transit

At rest

Container security

Base image

Runtime

Image verification

Image digests published at each release:

complitru/backend:1.0.0@sha256:...
complitru/frontend:1.0.0@sha256:...
complitru/worker:1.0.0@sha256:...

Pin to digests in your task definitions for reproducible deployments.
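As an illustration, a container definition fragment pinned by digest might look like the following (the digest is elided here just as it is in the release listing above; fields other than `image` are placeholders):

```json
{
  "name": "backend",
  "image": "complitru/backend:1.0.0@sha256:...",
  "essential": true
}
```

Pinning by digest rather than tag means a re-pushed `1.0.0` tag cannot silently change what your tasks run.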

Application security

Authentication

Authorization

Input validation

Rate limiting

Audit logging

Every admin action, auth event, and data mutation logged to:

  1. Application DB audit_log table (queryable via UI)
  2. CloudWatch Logs (for SIEM integration)
  3. S3 audit bucket (Object Lock, 7-year retention)

Each entry includes:

  • Timestamp (UTC, ISO-8601)
  • Actor (user ID, API key ID, or system)
  • Source IP and user agent
  • Action (e.g., user.create, scan.start, finding.resolve)
  • Target resource ID
  • Before/after state for mutations
  • Request correlation ID
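A sketch of what a single entry might look like — all field names and values here are illustrative, not the exact product schema:

```json
{
  "timestamp": "2026-04-15T09:12:44Z",
  "actor": "user:42",
  "source_ip": "203.0.113.10",
  "user_agent": "Mozilla/5.0 ...",
  "action": "finding.resolve",
  "target": "finding:f-8c21",
  "before": {"status": "open"},
  "after": {"status": "resolved"},
  "request_id": "req-01..."
}
```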

IAM policy structure

Four IAM roles created per deployment, each with tightly scoped permissions:

complitru-ecs-task-backend

secretsmanager:GetSecretValue  → complitru/* only
ssm:GetParameter               → /complitru/* only
s3:*                           → complitru-reports-* and complitru-audit-* only
sts:AssumeRole                 → customer-provided cross-account roles (for scans)
textract:DetectDocumentText    → * (needed for OCR feature)
bedrock:InvokeModel            → specific model ARNs only (if AI enabled)

complitru-ecs-task-worker

Same as backend, plus:
ses:SendEmail                  → identity-based restrictions

complitru-ecs-task-frontend

(no AWS permissions — frontend is read-only, proxies through backend)

complitru-rds-monitoring

AWS managed: AmazonRDSEnhancedMonitoringRole

Full IAM policy documents are in terraform/modules/ecs/iam.tf.

Secrets management

All secrets live in AWS Secrets Manager. No secrets in:

Rotation support:

Vulnerability management

CVE scanning at build

# Run as part of CI
trivy image complitru/backend:${VERSION} --exit-code 1 --severity CRITICAL,HIGH
trivy image complitru/frontend:${VERSION} --exit-code 1 --severity CRITICAL,HIGH
trivy image complitru/worker:${VERSION} --exit-code 1 --severity CRITICAL,HIGH

Release images must pass with zero CRITICAL/HIGH unresolved.

CVE scanning in your registry

Recommended: Enable Amazon ECR image scanning on the repositories where you mirror CompliTru images. ECR will notify on new CVEs discovered post-deploy.

Patching cadence

Notifications sent to the email tied to your license key.

Compliance hooks

Backups

Incident response

  1. Detect: CloudWatch alarms → SNS → your on-call channel
  2. Contain: ECS service scale-to-zero or ALB WAF rule block
  3. Investigate: CloudWatch Logs + application audit_log table + VPC flow logs
  4. Recover: Re-scale ECS services, rotate affected secrets, redeploy task definition if image compromised
  5. Review: Post-mortem process documented in docs/DEPLOYMENT.md#incident-response

CompliTru security team reachable at security@complitru.ai for coordinated disclosure on product vulnerabilities.

Responsible disclosure

Security researchers may disclose vulnerabilities in the CompliTru product to security@complitru.ai. PGP key available on request. Response SLA: 48 hours to initial acknowledgment, 30 days to fix or mitigation for critical issues.

Supply chain

CompliTru's own release process:

SBOMs for each release are included in the package under sbom/.

Section 04

AI Provider Configuration

CompliTru Self-Hosted — AI Provider Configuration

CompliTru's AI features are deployment-configurable. Choose AWS Bedrock (default for self-hosted), OpenAI, or disable AI entirely. The product is fully functional in all three modes — AI is an enhancement, not a dependency.

Modes

Mode Data flow Use when
Bedrock (recommended) All inference happens via Bedrock in your AWS account. No data leaves your AWS boundary. Default for self-hosted deployments; aligns with "no data to vendor" requirements
OpenAI Requests sent to OpenAI's API using your enterprise OpenAI key + BAA. CompliTru is never in the data path. You have an existing OpenAI enterprise contract and prefer GPT-4o-class models
Disabled All AI features hidden in UI; AI-powered endpoints return raw findings without LLM processing. AI not yet approved by your security team; air-gapped environments without Bedrock; cost reduction

Configuration

Bedrock (recommended)

  1. Enable model access in your AWS account (one-time, console-only)
     - Open the AWS Bedrock console in the region of your deployment
     - Navigate to Model access → Manage model access
     - Request access to:

    • Anthropic Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0) — used for default and powerful
    • Anthropic Claude 3.5 Haiku (anthropic.claude-3-5-haiku-20241022-v1:0) — used for fast

     Approval is automatic for most regions and accounts (under 5 minutes).
  2. Verify IAM permissions — the ECS task IAM role (complitru-ecs-task-backend and complitru-ecs-task-worker) needs:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:Converse",
        "bedrock:ConverseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-*",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-haiku-*",
        "arn:aws:bedrock:*:*:inference-profile/us.anthropic.claude-sonnet-4-*",
        "arn:aws:bedrock:*:*:inference-profile/us.anthropic.claude-3-5-haiku-*"
      ]
    }
  ]
}

Provided automatically when you deploy via the included Terraform (terraform/modules/ecs/iam.tf).

  3. Set deployment configuration (Terraform variables):

ai_enabled  = true
ai_provider = "bedrock"
ai_region   = "us-east-1"  # Defaults to deployment region

Or set the corresponding environment variables on the ECS task:

AI_ENABLED=true
AI_PROVIDER=bedrock
AWS_REGION=us-east-1

  4. No API keys required. Bedrock uses the ECS task IAM role for authentication — no secrets to rotate, no keys to leak.

OpenAI

  1. Provision an OpenAI enterprise key with a BAA in place if you handle PHI.

  2. Store the key in Secrets Manager under the existing app secret:

aws secretsmanager update-secret \
  --secret-id complitru/production/app \
  --secret-string '{
    "OPENAI_API_KEY": "sk-...",
    ...other existing secrets...
  }'

  3. Set deployment configuration:

ai_enabled  = true
ai_provider = "openai"

Environment variable equivalent:

AI_ENABLED=true
AI_PROVIDER=openai

  4. Optional — restrict model usage with an OpenAI organization-level policy if you want to limit which models the deployment can call (e.g., disallow GPT-4o to control cost).

Disabled

ai_enabled = false

Environment variable equivalent:

AI_ENABLED=false

When AI_ENABLED=false:

This mode is appropriate for:

  • Initial deployment while AI is going through internal security review
  • Air-gapped environments without Bedrock model access
  • Customers who prefer to avoid AI entirely
  • Cost-conscious deployments (AI accounts for ~10–30% of compute spend depending on usage)

Model selection

Three logical model tiers map to provider-specific model IDs:

Logical name Bedrock model OpenAI model When used
default Claude Sonnet 4 GPT-4o-mini Most AI calls (remediation suggestions, summarization)
fast Claude 3.5 Haiku GPT-4o-mini High-volume, lower-stakes operations (auto-classification, light summaries)
powerful Claude Sonnet 4 GPT-4o Complex reasoning (multi-step remediation playbooks, risk narration)

You can override per call site, but the logical names are the defaults. This abstraction means switching providers requires zero application code changes — just a configuration update.

Verification

After configuring, verify the AI provider is working:

# Inside an ECS exec session into the backend task:
python3 -c "
from app.ai.provider import get_provider
provider = get_provider()
print(f'Provider: {provider.provider_name}')
result = provider.chat_completion(
    messages=[{'role': 'user', 'content': 'Reply with one word: ok'}],
    max_tokens=10,
)
print(f'Response: {result.content}')
print(f'Tokens used: {result.usage[\"total_tokens\"]}')
"

Expected output:

Provider: bedrock
Response: ok
Tokens used: 12

If you see Bedrock AccessDenied, model access is not enabled in the Bedrock console (step 1 of Bedrock setup).

De-identification before AI processing

When AI is enabled and processing data that may contain PHI/PII (the de-identify pipeline at /api/deidentify/*):

  1. Inbound text is run through AWS Comprehend Medical to detect PHI entities
  2. Detected PHI is replaced with type-tagged tokens ([NAME_001], [DATE_002], [SSN_003], etc.)
  3. The de-identified text — containing only tokens — is sent to the configured AI provider
  4. The AI response is post-processed to swap tokens back to original PHI
  5. Token map is held in memory for the duration of the request, then destroyed
  6. Original PHI never reaches the AI provider — Bedrock or OpenAI

This pipeline runs server-side. Token maps are never persisted to disk or database, never sent to the AI provider, never leave the application memory of the request handler.

For deployments that handle PHI, this de-identification step is enforced at the route layer regardless of AI provider — it is not a configuration option to disable.
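The swap-back in step 4 is plain token substitution against the in-memory map. A sketch of just that step — detection and tokenization (Comprehend Medical, steps 1–2) are not shown, and the function name is illustrative:

```python
# Sketch of step 4: tokens in the AI response are replaced with the original
# PHI from the in-memory token map, which is then discarded with the request.
def reidentify(text: str, token_map: dict[str, str]) -> str:
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text


token_map = {"[NAME_001]": "Jane Doe", "[DATE_002]": "2026-04-01"}
ai_response = "Patient [NAME_001] was seen on [DATE_002]."
print(reidentify(ai_response, token_map))
# Patient Jane Doe was seen on 2026-04-01.
```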

Cost considerations

Bedrock (your account)

Costs accrue directly in your AWS bill. As of April 2026:

Model Input price Output price
Claude Sonnet 4 $3.00 / 1M tokens $15.00 / 1M tokens
Claude 3.5 Haiku $0.80 / 1M tokens $4.00 / 1M tokens
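The per-token prices above make cost estimation simple arithmetic. A worked example — only the prices come from the table; the monthly token volumes are hypothetical:

```python
# Bedrock cost math from the pricing table above. Token volumes are
# hypothetical; prices per 1M tokens are from the table (April 2026).
def monthly_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m


# e.g. 5M input / 1M output tokens per month on Claude Sonnet 4:
cost = monthly_cost(5_000_000, 1_000_000, 3.00, 15.00)
print(f"${cost:.2f}/month")  # $30.00/month
```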

Typical CompliTru AI usage at moderate scale (one mid-sized customer environment, daily scans, weekly compliance reports):

OpenAI

Costs accrue on your OpenAI organization account.

Model Input price Output price
GPT-4o $2.50 / 1M tokens $10.00 / 1M tokens
GPT-4o-mini $0.15 / 1M tokens $0.60 / 1M tokens

Typical usage cost: $15–$120/month at the same scale, since most calls hit default (GPT-4o-mini equivalent).

Disabled

Zero AI cost.

Switching providers

Switching between providers is a runtime configuration change — no application redeploy required:

  1. Update the AI_PROVIDER value in Secrets Manager (or Terraform variable + apply)
  2. Restart the ECS service (rolling restart, ~60 seconds, zero downtime)
  3. Verify with the verification command above

Cached responses are not invalidated — historical AI outputs in the database remain whatever they were originally generated with. New requests use the new provider.

Disabling AI per-feature

For finer-grained control beyond the global AI_ENABLED toggle, individual AI features can be disabled via Terraform variables or environment flags:

AI_FEATURE_REMEDIATION_SUGGESTIONS=true|false
AI_FEATURE_FINDING_SUMMARIZATION=true|false
AI_FEATURE_RISK_NARRATION=true|false
AI_FEATURE_NATURAL_LANGUAGE_QUERY=true|false
AI_FEATURE_AUTO_CLASSIFICATION=true|false

Defaults: all true when AI_ENABLED=true. Setting AI_ENABLED=false overrides all per-feature flags.
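The override rule reduces to a logical AND between the global toggle and the per-feature flag; a one-line sketch (the helper name is illustrative):

```python
# Sketch of the documented flag resolution: AI_ENABLED=false always wins
# over any per-feature flag; a feature runs only if both are true.
def feature_enabled(ai_enabled: bool, feature_flag: bool) -> bool:
    return ai_enabled and feature_flag


print(feature_enabled(True, False))   # False: feature opted out
print(feature_enabled(False, True))   # False: global kill switch wins
```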

Audit logging of AI calls

Every LLM invocation is logged to the application audit log (audit_log table) with:

Prompt and response content are not logged by default to avoid retaining sensitive data. Enable AI_LOG_FULL_PAYLOADS=true in non-production environments only if you need to debug LLM behavior.

Provider abstraction reference

The provider abstraction lives in app/ai/provider.py. To call AI from custom code:

from app.ai.provider import get_provider

provider = get_provider()  # Returns configured provider (Bedrock or OpenAI)
result = provider.chat_completion(
    messages=[
        {"role": "system", "content": "You are a security analyst."},
        {"role": "user", "content": "Summarize this finding: ..."},
    ],
    model="default",          # logical name, mapped per provider
    temperature=0.3,
    max_tokens=1000,
)
print(result.content)         # str — the AI response text
print(result.usage)           # {"total_tokens": int}

Streaming, function/tool calling, and JSON mode are all supported and work identically across providers. See app/ai/provider.py source for the full interface.

Compliance and contractual notes

Section 05

Configuration Reference

CompliTru Self-Hosted — Configuration Reference

Every tunable parameter, environment variable, and Terraform variable, with defaults, valid values, and where each is read.

Configuration sources (in order of precedence)

  1. Environment variables on the ECS task (highest precedence)
  2. AWS Secrets Manager entries referenced by the ECS task definition
  3. Terraform variables that drive both of the above (managed in terraform/terraform.tfvars)
  4. Application defaults (lowest, defined in app/config/)

In production, you should manage configuration via Terraform — never edit env vars or secrets directly in the AWS console, as those changes will be overwritten on next terraform apply.
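The precedence chain can be sketched as a simple lookup cascade — task environment wins, then the Secrets Manager payload, then application defaults (Terraform is the source that writes the first two rather than a runtime layer of its own). Function and sample values are illustrative:

```python
# Sketch of the documented configuration precedence. Not product code.
def resolve(key: str, env: dict, secrets: dict, defaults: dict):
    if key in env:            # 1. ECS task environment (highest precedence)
        return env[key]
    if key in secrets:        # 2. Secrets Manager payload
        return secrets[key]
    return defaults.get(key)  # 4. application default (lowest precedence)


env = {"WORKERS": "8"}
secrets = {"SMTP_PASS": "s3cret"}
defaults = {"WORKERS": "4", "LOG_LEVEL": "INFO"}
print(resolve("WORKERS", env, secrets, defaults))    # "8": env beats default
print(resolve("LOG_LEVEL", env, secrets, defaults))  # "INFO": default applies
```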

Critical secrets (Secrets Manager)

All entries below live in a single Secrets Manager secret at the path complitru/${environment}/app. Auto-created by the terraform/modules/secrets/ module.

Key Description Auto-generated? Rotation
DATABASE_URL RDS connection string (mysql+pymysql://user:pass@host:3306/dbname) ✅ Yes (from RDS module) Native Secrets Manager rotation
REDIS_URL ElastiCache endpoint (rediss://:auth@host:6379/0) ✅ Yes (from ElastiCache module) Manual
SECRET_KEY Flask session signing key (32+ bytes) ✅ Yes (Terraform random_password) Manual via Terraform; rotation invalidates active sessions
ENCRYPTION_KEY Fernet key for field-level encryption (32 url-safe base64 bytes) ✅ Yes Rotate with care — encrypts API keys in DB
JWT_SECRET API token signing key ✅ Yes Manual; rotation invalidates active API tokens
RESET_CODE_PEPPER Password reset code pepper (32+ bytes) ✅ Yes Manual; rotation invalidates outstanding reset tokens
COMPLITRU_LICENSE_KEY Your CompliTru license key (provided by CompliTru) ❌ Provide Yearly with subscription renewal
OPENAI_API_KEY Optional OpenAI API key (only if AI_PROVIDER=openai) ❌ Provide Per your OpenAI key rotation policy
SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASS, SMTP_FROM Outbound email config ❌ Provide if SMTP used Per your SMTP credentials policy
SES_FROM SES sender identity (if using SES instead of SMTP) ❌ Provide if SES used N/A

Application configuration (environment variables)

These are set on the ECS task definition (environment block) or via Terraform variables that map to them.

Core application

Variable Default Valid values Description
FLASK_ENV production production, development Flask environment. Set to production for self-hosted deployments.
FLASK_DEBUG 0 0, 1 Always 0 in production.
APP_PORT 1100 port number Backend listen port. ALB target group must match.
APP_BASE_URL (required) URL External base URL of the deployment, e.g., https://complitru.your-domain.com. Used in emails, OAuth callbacks, webhooks.
WORKERS 4 int Gunicorn worker count. Recommend 2x vCPUs.
WORKER_CLASS eventlet eventlet, sync, gevent Gunicorn worker class. eventlet required for Socket.IO.
WORKER_TIMEOUT 120 seconds Gunicorn worker timeout. Increase if you have long-running synchronous handlers.

AWS

Variable Default Description
AWS_REGION (required) Region of your deployment. Used by all boto3 clients unless overridden.
AWS_DEFAULT_REGION (mirrors AWS_REGION) boto3 fallback.

The deployment uses the ECS task IAM role for AWS authentication — no access keys required.

Database

Variable Default Description
DATABASE_URL (from Secrets Manager) Full RDS connection string.
SQLALCHEMY_POOL_SIZE 10 Connection pool size per worker.
SQLALCHEMY_MAX_OVERFLOW 20 Max overflow connections beyond pool.
SQLALCHEMY_POOL_RECYCLE 3600 Recycle connections after N seconds (avoid stale connections).
SQLALCHEMY_POOL_PRE_PING true Test connections before checkout.

Redis / Celery

Variable Default Description
REDIS_URL (from Secrets Manager) ElastiCache Redis URL with TLS (rediss://...).
CELERY_BROKER_URL (mirrors REDIS_URL with /0) Celery message broker.
CELERY_RESULT_BACKEND (mirrors REDIS_URL with /1) Celery result backend.
CELERY_TASK_TIME_LIMIT 1800 Hard timeout per task (seconds).
CELERY_TASK_SOFT_TIME_LIMIT 1500 Soft timeout (raises exception worker can catch).
CELERY_WORKER_CONCURRENCY 4 Concurrent tasks per worker process.
RATE_LIMIT_STORAGE_URI (mirrors REDIS_URL with /2) Rate limit counter storage.

AI provider

See AI_PROVIDER.md for full detail.

Variable Default Valid values Description
AI_ENABLED true true, false Master toggle for all AI features.
AI_PROVIDER bedrock bedrock, openai Which provider to use when AI_ENABLED=true.
BEDROCK_REGION (mirrors AWS_REGION) AWS region Region for Bedrock API calls.
BEDROCK_MODEL_DEFAULT us.anthropic.claude-sonnet-4-20250514-v1:0 model ID Override the default logical model.
BEDROCK_MODEL_FAST us.anthropic.claude-3-5-haiku-20241022-v1:0 model ID Override the fast logical model.
BEDROCK_MODEL_POWERFUL us.anthropic.claude-sonnet-4-20250514-v1:0 model ID Override the powerful logical model.
OPENAI_API_KEY (from Secrets Manager) string Required if AI_PROVIDER=openai.
AI_FEATURE_REMEDIATION_SUGGESTIONS true true, false Per-feature toggle.
AI_FEATURE_FINDING_SUMMARIZATION true true, false Per-feature toggle.
AI_FEATURE_RISK_NARRATION true true, false Per-feature toggle.
AI_FEATURE_NATURAL_LANGUAGE_QUERY true true, false Per-feature toggle.
AI_FEATURE_AUTO_CLASSIFICATION true true, false Per-feature toggle.
AI_LOG_FULL_PAYLOADS false true, false Logs full prompts/responses to CloudWatch. Non-production only.

Authentication

Variable Default Description
SECRET_KEY (from Secrets Manager) Flask session signing.
JWT_SECRET (from Secrets Manager) API token signing.
SESSION_TIMEOUT_HOURS 24 Session lifetime.
MFA_REQUIRED_FOR_ADMINS true Force TOTP MFA for admin role.
MAX_LOGIN_ATTEMPTS 5 Lockout threshold.
LOCKOUT_DURATION_MINUTES 15 Lockout duration after exceeded threshold.
PASSWORD_MIN_LENGTH 12 Minimum password length on signup/reset.
PASSWORD_REQUIRE_COMPLEXITY true Require uppercase, lowercase, digit, symbol.
OAUTHLIB_INSECURE_TRANSPORT 0 Always 0 in production. Allows OAuth over HTTP for local dev only.

Argon2 (password hashing)

Variable Default Description
ARGON2_TIME_COST 3 Iterations.
ARGON2_MEMORY_COST 65536 Memory in KiB.
ARGON2_PARALLELISM 2 Parallel threads.

Password reset

Variable Default Description
RESET_CODE_TTL_MINUTES 15 Code lifetime in minutes.
RESET_TOKEN_TTL_SECONDS 1200 Token lifetime in seconds.
RESET_MAX_ATTEMPTS 5 Max code attempts before invalidation.
RESET_RESEND_COOLDOWN_SECONDS 60 Cooldown between resend requests.
RESET_CODE_PEPPER (from Secrets Manager) Server-side pepper for reset codes.

OAuth (optional)

Variable Default Description
GOOGLE_CLIENT_ID unset Google OAuth client ID.
GOOGLE_CLIENT_SECRET unset Google OAuth client secret.
MICROSOFT_CLIENT_ID unset Azure AD app registration client ID.
MICROSOFT_CLIENT_SECRET unset Azure AD app registration client secret.
MICROSOFT_TENANT_ID common common, organizations, consumers, or specific tenant ID.

OAuth is optional. If unset, only username/password authentication is available.

SAML SSO (optional)

Variable Default Description
SAML_ENABLED false Enable SAML 2.0 SSO.
SAML_IDP_METADATA_URL unset URL to fetch IdP metadata XML.
SAML_SP_ENTITY_ID (mirrors APP_BASE_URL) Service provider entity ID.
SAML_SP_ACS_URL ${APP_BASE_URL}/auth/saml/acs Assertion consumer service URL.
SAML_SP_CERT_PATH /etc/complitru/saml/sp.crt Path to SP signing certificate.
SAML_SP_KEY_PATH /etc/complitru/saml/sp.key Path to SP signing private key.

Email

Variable Default Description
EMAIL_PROVIDER smtp smtp, ses, disabled
SMTP_HOST unset SMTP server hostname.
SMTP_PORT 587 SMTP port.
SMTP_USER unset SMTP username.
SMTP_PASS (from Secrets Manager) SMTP password.
SMTP_FROM unset Default From: address.
SMTP_STARTTLS true Use STARTTLS.
SES_FROM unset Required if EMAIL_PROVIDER=ses. SES sender identity.
SES_REGION (mirrors AWS_REGION) SES region.

Rate limiting

Variable Default Description
RATE_LIMITS_IP 10/hour Auth endpoints per IP.
RATE_LIMITS_EMAIL 5/hour Per email address.
RATE_LIMITS_API 1000/hour Per API key, applied to data endpoints.
RATE_LIMIT_STORAGE_URI redis://... Where to store counters (use Redis in production).
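The rate strings above mean "N requests per key per window." A minimal in-memory sketch of the fixed-window counting a Redis-backed limiter performs (illustrative only; production must use the shared store behind RATE_LIMIT_STORAGE_URI so the limit holds across all backend tasks):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Illustrative fixed-window counter, e.g. RATE_LIMITS_IP = 10/hour."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> hits

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # one counter per window
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

auth_per_ip = FixedWindowLimiter(limit=10, window_seconds=3600)  # 10/hour
results = [auth_per_ip.allow("203.0.113.7", now=1000.0) for _ in range(11)]
assert results == [True] * 10 + [False]      # 11th request in the window denied
assert auth_per_ip.allow("203.0.113.7", now=1000.0 + 3600)  # next window resets
```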

Logging

Variable Default Description
LOG_LEVEL INFO DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_FORMAT json json for CloudWatch parsing, text for human-readable
LOG_INCLUDE_REQUEST_ID true Adds correlation ID to every log line
LOG_INCLUDE_USER_ID true Adds authenticated user ID to log entries
AUDIT_LOG_TO_CLOUDWATCH true Mirror audit log entries to dedicated CloudWatch log group
AUDIT_LOG_TO_S3 true Mirror audit log entries to Object-Locked S3 bucket
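With LOG_FORMAT=json, each log line is a single JSON object carrying the correlation fields. A sketch of such a formatter (field names follow the documented log output; the shipped formatter may differ):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line (LOG_FORMAT=json)."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Correlation fields are present only when request middleware sets them
        # (LOG_INCLUDE_REQUEST_ID / LOG_INCLUDE_USER_ID)
        for field in ("request_id", "user_id"):
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)

logger = logging.getLogger("app.scan_engine")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Scan completed", extra={"request_id": "req_a1b2c3d4", "user_id": 42})
```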

License

Variable Default Description
COMPLITRU_LICENSE_KEY (from Secrets Manager) Your CompliTru license.
LICENSE_SERVER_URL https://license.complitru.ai License validation endpoint.
LICENSE_CHECK_INTERVAL_SECONDS 3600 Refresh frequency.
LICENSE_OFFLINE_GRACE_DAYS 7 Days the app continues working if license server unreachable.
LICENSE_FAIL_CLOSED true If true, refuse scans when license invalid; if false, log warning only.
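The interaction of the last three variables can be sketched as a small decision function (illustrative; the names mirror the variables above, not the shipped code):

```python
from datetime import datetime, timedelta

LICENSE_OFFLINE_GRACE_DAYS = 7
LICENSE_FAIL_CLOSED = True

def scans_allowed(license_valid, last_successful_check, now):
    """Decide whether scans may run given license state.

    - Valid license: always allowed.
    - Check failing (server unreachable or invalid): allowed while inside
      the offline grace window.
    - Past grace: refuse if LICENSE_FAIL_CLOSED, otherwise allow with a
      logged warning.
    """
    if license_valid:
        return True
    grace = timedelta(days=LICENSE_OFFLINE_GRACE_DAYS)
    if now - last_successful_check <= grace:
        return True
    return not LICENSE_FAIL_CLOSED

now = datetime(2026, 4, 15)
assert scans_allowed(True, now - timedelta(days=30), now)
assert scans_allowed(False, now - timedelta(days=3), now)       # within grace
assert not scans_allowed(False, now - timedelta(days=10), now)  # grace expired, fail closed
```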

Cloud scanning

Variable Default Description
SCAN_DEFAULT_REGIONS us-east-1,us-east-2,us-west-2,us-west-1,eu-west-1 Default regions to scan when not specified per-account.
SCAN_MAX_PARALLEL_ACCOUNTS 10 Cap on parallel account scans (controls IAM rate limit pressure).
SCAN_API_RETRY_MAX 5 Max retries on AWS API throttling.
SCAN_API_RETRY_BACKOFF exponential exponential, linear, constant
SCAN_TIMEOUT_MINUTES 60 Hard timeout per single account scan.
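Taken together, SCAN_API_RETRY_MAX and SCAN_API_RETRY_BACKOFF translate to a retry delay schedule like the following sketch (the base delay and cap are illustrative assumptions, and production code would normally add jitter):

```python
def retry_delays(max_retries=5, backoff="exponential", base=1.0, cap=30.0):
    """Delay (seconds) before each retry after AWS API throttling.

    max_retries mirrors SCAN_API_RETRY_MAX; backoff mirrors
    SCAN_API_RETRY_BACKOFF. base/cap are assumed values for illustration.
    """
    delays = []
    for attempt in range(max_retries):
        if backoff == "exponential":
            delay = base * (2 ** attempt)
        elif backoff == "linear":
            delay = base * (attempt + 1)
        else:  # constant
            delay = base
        delays.append(min(delay, cap))
    return delays

assert retry_delays() == [1.0, 2.0, 4.0, 8.0, 16.0]
assert retry_delays(backoff="linear") == [1.0, 2.0, 3.0, 4.0, 5.0]
assert retry_delays(backoff="constant") == [1.0] * 5
```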

Storage

Variable Default Description
S3_REPORT_BUCKET (from Terraform) Bucket for generated reports.
S3_AUDIT_BUCKET (from Terraform) Object-Locked bucket for audit logs.
S3_REPORT_RETENTION_DAYS 365 Lifecycle delete after N days.
S3_AUDIT_RETENTION_YEARS 7 Object Lock retention period.

Feature flags

Variable Default Description
FEATURE_AUTO_REMEDIATION false Allow remediation without per-finding approval. Default off for safety.
FEATURE_NL_QUERY true Natural language querying via AI.
FEATURE_CIEM true Identity and access analysis module.
FEATURE_COST_OPTIMIZATION true Right-sizing recommendations.
FEATURE_AI_GOVERNANCE false Beta — AI governance and policy module.
FEATURE_FREE_SCAN_PUBLIC false Public free-scan landing page (only relevant for hosted SaaS).

Terraform variables

Set in terraform/terraform.tfvars. Drives the values above.

Networking

Variable Default Description
vpc_cidr 10.0.0.0/16 VPC CIDR block.
availability_zones ["us-east-1a", "us-east-1b"] AZs for multi-AZ deployment.
enable_nat_ha false Single NAT (cost) vs. one NAT per AZ (HA).
enable_vpc_endpoints true Create endpoints for S3, Secrets Manager, KMS, ECR, CloudWatch Logs.
enable_flow_logs true VPC flow logs to CloudWatch.

Compute (ECS)

Variable Default Description
backend_cpu 1024 Fargate CPU units (1024 = 1 vCPU).
backend_memory 2048 Fargate memory in MB.
backend_desired_count 2 Min running tasks.
backend_max_count 10 Auto-scaling ceiling.
frontend_cpu 512 Fargate CPU units.
frontend_memory 1024 Fargate memory in MB.
frontend_desired_count 2 Min running tasks.
frontend_max_count 10 Auto-scaling ceiling.
worker_cpu 1024 Fargate CPU units.
worker_memory 2048 Fargate memory in MB.
worker_desired_count 1 Min running tasks.
worker_max_count 5 Auto-scaling ceiling.
scaling_target_cpu_percent 70 Target tracking auto-scale threshold.

Database

Variable Default Description
db_instance_class db.t3.medium RDS instance class.
db_allocated_storage_gb 100 Initial storage.
db_max_storage_gb 1000 Storage autoscaling ceiling.
db_multi_az true Multi-AZ for production.
db_backup_retention_days 30 Automated backup retention.
db_deletion_protection true Prevent accidental deletion.
db_parameter_group_family mysql8.0 RDS parameter group family.

Redis (ElastiCache)

Variable Default Description
redis_node_type cache.t3.micro Instance class.
redis_num_cache_nodes 2 Node count (replication enabled).
redis_at_rest_encryption true KMS encryption.
redis_in_transit_encryption true TLS.

Load balancer

Variable Default Description
domain_name (required) Your custom domain (e.g., complitru.your-corp.com).
acm_certificate_arn (required) ARN of the ACM cert for domain_name.
enable_waf false Attach AWS WAF with managed OWASP rule sets.
alb_idle_timeout_seconds 60 ALB idle timeout.

AI

Variable Default Description
ai_enabled true Master AI toggle. Maps to AI_ENABLED.
ai_provider bedrock bedrock or openai. Maps to AI_PROVIDER.
ai_region (mirrors aws_region) Region for Bedrock.

Tagging

Variable Default Description
tags {} Map of tags applied to all resources. Recommended: Owner, CostCenter, Environment.
environment production Used in resource names and Secrets Manager paths.

Configuration validation

On startup, the backend validates configuration and refuses to start if:

Validation failures are logged to stdout (and CloudWatch via the ECS log driver) before the container exits.

Configuration audit

To dump the effective configuration (with secrets redacted):

# Inside an ECS exec session:
python3 -c "from app.config import dump_effective_config; dump_effective_config()"

Outputs a JSON tree of every effective configuration value with secrets shown as ***REDACTED***. Useful for verifying what configuration was actually loaded after a deploy.

Section 06

Deployment Runbook

CompliTru Self-Hosted — Deployment Runbook

End-to-end deployment of CompliTru into your AWS account using the included Terraform package. First-time deployment takes ~45 minutes.

Prerequisites

Requirement Version Notes
AWS account Admin or equivalent permissions for initial deployment
AWS CLI >= 2.x Configured with credentials (aws sts get-caller-identity returns your account)
Terraform >= 1.5 terraform.io/downloads
Domain name A subdomain you can point at the ALB (e.g., complitru.your-corp.com)
ACM certificate Valid certificate for the chosen domain in the deployment region
CompliTru license key Provided by CompliTru in ctl_... format
Container registry access Either pull from ghcr.io/complitru/* (default) or mirror to your own ECR

IAM permissions needed for deployment

The IAM principal running terraform apply needs permissions to create resources in:

The simplest grant is the AWS-managed AdministratorAccess policy for the deployment role, used only for the initial Terraform apply. Subsequent updates can run with a more scoped policy.

Container images

Images are published to GitHub Container Registry by default:

ghcr.io/complitru/backend:1.0.0
ghcr.io/complitru/frontend:1.0.0
ghcr.io/complitru/worker:1.0.0

For air-gapped or compliance reasons, you can mirror these to your private ECR. See Mirroring images to ECR below.

Image digests for v1.0.0

Pin to digests in your task definitions for reproducible deployments:

ghcr.io/complitru/backend:1.0.0@sha256:<digest>
ghcr.io/complitru/frontend:1.0.0@sha256:<digest>
ghcr.io/complitru/worker:1.0.0@sha256:<digest>

(Replace <digest> with the digest from your release email.)

Deployment steps

Step 1 — Validate prerequisites

# Verify AWS access
aws sts get-caller-identity

# Verify Terraform version
terraform version  # >= 1.5

# Verify domain DNS is in your control
dig +short complitru.your-corp.com

Step 2 — Provision an ACM certificate (one-time)

If you don't already have an ACM certificate for your domain:

aws acm request-certificate \
    --domain-name "complitru.your-corp.com" \
    --validation-method DNS \
    --region us-east-1

Add the DNS validation record to your Route 53 / external DNS provider. Wait for ISSUED status:

aws acm describe-certificate --certificate-arn <arn> --query 'Certificate.Status'

Note the certificate ARN — you'll provide it as acm_certificate_arn in Terraform.

Step 3 — Enable Bedrock model access (if ai_provider = "bedrock")

In the AWS Console:

  1. Open Bedrock Console → your deployment region
  2. Model access → Manage model access
  3. Request access to:
     - Anthropic Claude Sonnet 4
     - Anthropic Claude 3.5 Haiku
  4. Approval is automatic in most regions (under 5 minutes)

Skip if ai_provider = "openai" or ai_enabled = false.

Step 4 — Configure Terraform variables

cd terraform
cp example.tfvars terraform.tfvars

Edit terraform.tfvars with your values:

# Required
aws_region              = "us-east-1"
environment             = "production"
domain_name             = "complitru.your-corp.com"
acm_certificate_arn     = "arn:aws:acm:us-east-1:123456789012:certificate/abc..."
complitru_license_key   = "ctl_..."

# AI configuration (see AI_PROVIDER.md)
ai_enabled              = true
ai_provider             = "bedrock"

# Networking
vpc_cidr                = "10.42.0.0/16"  # Adjust to avoid CIDR conflicts
availability_zones      = ["us-east-1a", "us-east-1b"]

# Compute sizing — see ARCHITECTURE.md "Production defaults" for guidance
backend_cpu             = 1024
backend_memory          = 2048
backend_desired_count   = 2

# Database
db_instance_class       = "db.t3.medium"
db_multi_az             = true

# Tagging
tags = {
  Owner       = "platform-team"
  CostCenter  = "engineering"
  Environment = "production"
  Application = "complitru"
}

Full variable reference: CONFIGURATION.md.

Step 5 — Initialize Terraform

terraform init

Recommended: configure a remote backend (S3 + DynamoDB lock table) for state management. See terraform/backend.tf.example.

Step 6 — Review the plan

terraform plan -out=tfplan

Carefully review:

Step 7 — Apply

terraform apply tfplan

Apply takes 30–45 minutes. Most of the time is spent on:

Outputs at the end:

alb_dns_name        = "complitru-alb-12345.us-east-1.elb.amazonaws.com"
alb_zone_id         = "Z35SXDOTRQ7X7K"
db_endpoint         = "complitru-prod.cluster-abc.us-east-1.rds.amazonaws.com:3306"
ecs_cluster_name    = "complitru-prod"
secret_arn          = "arn:aws:secretsmanager:us-east-1:123456789012:secret:complitru/production/app-AbCdEf"
s3_report_bucket    = "complitru-reports-123456789012-us-east-1"
s3_audit_bucket     = "complitru-audit-123456789012-us-east-1"

Step 8 — Point DNS at the ALB

Create a Route 53 alias record (or CNAME for non-Route 53 DNS):

# Example using Route 53 CLI:
aws route53 change-resource-record-sets \
    --hosted-zone-id <YOUR-ZONE-ID> \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "complitru.your-corp.com",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "Z35SXDOTRQ7X7K",
            "DNSName": "complitru-alb-12345.us-east-1.elb.amazonaws.com",
            "EvaluateTargetHealth": true
          }
        }
      }]
    }'

Wait for DNS propagation (typically under 60 seconds).

Step 9 — Run database migrations

cd ../scripts
./migrate.sh

This invokes aws ecs run-task with the migration task definition, which:

  1. Connects to RDS using credentials from Secrets Manager
  2. Runs all pending Alembic migrations idempotently
  3. Seeds the initial admin user (using the email from terraform.tfvars)
  4. Loads default check definitions, framework mappings, and permissions

Output:

[migrate.sh] Running migrations against complitru-prod...
[migrate.sh] Applied 47 migrations
[migrate.sh] Seeded admin user: admin@your-corp.com
[migrate.sh] Loaded 612 checks across 8 frameworks
[migrate.sh] Migration complete

The admin user receives a one-time login email at the address you specified.

Step 10 — Verify deployment

./verify-deployment.sh

Runs end-to-end smoke tests:

Output:

✓ ALB health check                       (200 OK in 45ms)
✓ Backend → RDS connectivity             (query in 12ms)
✓ Backend → Redis connectivity           (PING in 3ms)
✓ Celery worker online                   (task processed in 280ms)
✓ License validation                     (status: active, expires 2027-04-15)
✓ AI provider (bedrock)                  (test prompt completed in 1.2s, 14 tokens)
✓ S3 report bucket writeable             (test object PUT succeeded)
✓ CloudWatch log delivery                (recent log entries present)

All checks passed. Deployment is healthy.

Step 11 — First login

  1. Navigate to https://complitru.your-corp.com in your browser
  2. Click Forgot password? (your admin email was seeded but no password set)
  3. Check your inbox for the password reset email
  4. Set a strong password
  5. Enable MFA (TOTP — Google Authenticator, 1Password, etc.) — required for the admin role by default
  6. Connect your first AWS account: Settings → AWS Accounts → Connect Account

Connecting AWS accounts to scan

The simplest pattern is a cross-account IAM role:

  1. In CompliTru: Settings → AWS Accounts → Connect Account
  2. Copy the trust policy and role template shown
  3. In the target AWS account, create the role using the provided CloudFormation template
  4. Copy the role ARN back into CompliTru
  5. CompliTru runs an immediate validation scan to confirm permissions

The cross-account role uses an external ID for security and grants read-only permissions plus targeted write permissions for remediation actions. Full IAM policy provided in the CloudFormation template at terraform/templates/customer-account-role.yaml.

Mirroring images to ECR

For air-gapped or compliance-driven deployments, mirror CompliTru images to your own ECR:

# Authenticate to source registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Authenticate to destination ECR
aws ecr get-login-password --region us-east-1 \
    | docker login --username AWS --password-stdin \
        123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create ECR repositories
for component in backend frontend worker; do
    aws ecr create-repository \
        --repository-name complitru/$component \
        --image-scanning-configuration scanOnPush=true \
        --image-tag-mutability IMMUTABLE
done

# Mirror images
for component in backend frontend worker; do
    docker pull ghcr.io/complitru/$component:1.0.0
    docker tag ghcr.io/complitru/$component:1.0.0 \
        123456789012.dkr.ecr.us-east-1.amazonaws.com/complitru/$component:1.0.0
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/complitru/$component:1.0.0
done

Then in terraform.tfvars:

container_registry = "123456789012.dkr.ecr.us-east-1.amazonaws.com"

Apply Terraform to update the task definitions to pull from your ECR.

Multi-environment deployment

The Terraform module is environment-agnostic. To deploy staging alongside production:

# Use a separate Terraform workspace
cd terraform
terraform workspace new staging
cp terraform.tfvars terraform.staging.tfvars
# Edit terraform.staging.tfvars: change `environment`, `domain_name`, `vpc_cidr`, sizing
terraform apply -var-file=terraform.staging.tfvars

Resources are namespaced by environment so staging and production coexist in the same AWS account without conflict.

Rotating secrets

Rotate SECRET_KEY (Flask session signing)

Impact: All active user sessions invalidated; users re-login.

NEW_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")

aws secretsmanager update-secret \
    --secret-id complitru/production/app \
    --secret-string "$(aws secretsmanager get-secret-value \
        --secret-id complitru/production/app \
        --query SecretString --output text \
        | jq --arg k "$NEW_KEY" '.SECRET_KEY = $k')"

# Force backend tasks to pick up new value
aws ecs update-service \
    --cluster complitru-production \
    --service complitru-backend \
    --force-new-deployment

Rotate ENCRYPTION_KEY (Fernet field encryption)

Impact: Existing encrypted DB fields (API keys, integration credentials) become unreadable. Requires a re-encryption migration.

Run the included rotation script:

./scripts/rotate-encryption-key.sh

The script generates a new key, re-encrypts all affected DB fields with it, atomically swaps the Secrets Manager value, and forces a service restart.
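The re-encryption step can be sketched with MultiFernet from the `cryptography` package (an assumption for illustration; the script's internals are not shown here). MultiFernet decrypts with any listed key and re-encrypts with the first, which is exactly the old-to-new migration each stored field needs:

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# A field encrypted under the outgoing ENCRYPTION_KEY
token = Fernet(old_key).encrypt(b"integration-api-credential")

# rotate() decrypts with whichever key matches and re-encrypts with the
# first (primary) key in the list
rotated = MultiFernet([Fernet(new_key), Fernet(old_key)]).rotate(token)

assert Fernet(new_key).decrypt(rotated) == b"integration-api-credential"
```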

Rotate database password

Use native Secrets Manager rotation (configured by Terraform):

aws secretsmanager rotate-secret --secret-id complitru/production/db

The rotation Lambda swaps credentials with zero downtime — RDS supports two simultaneous credential pairs during rotation.

Rotate COMPLITRU_LICENSE_KEY

When CompliTru issues a new license at renewal:

aws secretsmanager update-secret \
    --secret-id complitru/production/app \
    --secret-string "$(aws secretsmanager get-secret-value \
        --secret-id complitru/production/app \
        --query SecretString --output text \
        | jq --arg k "$NEW_LICENSE_KEY" '.COMPLITRU_LICENSE_KEY = $k')"

aws ecs update-service \
    --cluster complitru-production \
    --service complitru-backend \
    --force-new-deployment

Backup verification

RDS backup verification (monthly recommended)

# List automated snapshots
aws rds describe-db-snapshots \
    --db-instance-identifier complitru-production \
    --snapshot-type automated \
    --query 'DBSnapshots[*].[DBSnapshotIdentifier,SnapshotCreateTime,Status]'

# Restore the most recent snapshot to a verification instance
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier complitru-verify-$(date +%Y%m%d) \
    --db-snapshot-identifier <most-recent-snapshot-id> \
    --db-instance-class db.t3.small \
    --no-multi-az

# After verification, delete the verification instance
aws rds delete-db-instance \
    --db-instance-identifier complitru-verify-$(date +%Y%m%d) \
    --skip-final-snapshot

S3 audit bucket verification

# List recent audit log entries (Object Lock prevents deletion)
aws s3 ls s3://complitru-audit-${ACCOUNT_ID}-${REGION}/audit/$(date +%Y/%m)/

# Verify Object Lock retention is in place
aws s3api get-object-retention \
    --bucket complitru-audit-${ACCOUNT_ID}-${REGION} \
    --key audit/$(date +%Y/%m/%d)/sample.json

Incident response

When CloudWatch alarms fire (high error rate, high latency, license expiration approaching):

  1. Alarm receipt → SNS → your on-call channel (PagerDuty / Slack / email)
  2. Initial triage — check CloudWatch dashboards in the complitru-production group:
     - ECS service CPU/memory
     - ALB 4xx / 5xx rates
     - RDS connections, CPU, IOPS
     - Celery queue depth
  3. Common containment actions:
     - Scale the ECS service up: aws ecs update-service --service complitru-backend --desired-count 6
     - Block a bad-actor IP at the WAF: add the IP to the deny list
     - Roll back to the previous task definition: aws ecs update-service --task-definition complitru-backend:N-1
     - Pause Celery tasks: aws ecs update-service --service complitru-worker --desired-count 0
  4. Investigation:
     - CloudWatch Logs Insights query: fields @timestamp, @message | filter @logStream like /backend/ | sort @timestamp desc | limit 100
     - Application audit log: query the audit_log table in RDS for the affected user / time window
     - VPC flow logs for network anomalies
  5. Resolution: roll forward with the fix; document in a post-mortem

For coordinated disclosure of CompliTru product vulnerabilities, email security@complitru.ai. Initial response within 48 hours.

Decommissioning

To fully remove a CompliTru deployment:

cd terraform

# 1. Disable deletion protection on RDS first
terraform apply -var="db_deletion_protection=false"

# 2. Empty S3 buckets (Terraform won't delete non-empty buckets)
aws s3 rm s3://complitru-reports-${ACCOUNT_ID}-${REGION} --recursive
# Note: audit bucket cannot be emptied due to Object Lock — handle per your retention policy

# 3. Destroy infrastructure
terraform destroy

The S3 audit bucket may need to remain in place until its Object Lock retention period expires (default 7 years). It can be left as a standalone resource after Terraform destroys everything else.

Support

Reference your license key ID in all support correspondence for fastest routing.

Section 07

Operations Guide

CompliTru Self-Hosted — Operations Guide

Day-2 operations: monitoring, troubleshooting, scaling, common runbooks. Targets the platform engineer who keeps the deployment healthy.

Health checks

Endpoint Returns Use for
GET /health (backend) 200 {"status": "ok"} if Flask + RDS + Redis reachable ALB target group health check
GET /api/health (frontend) 200 {"status": "ok"} Frontend health
GET /api/health/deep (backend) Detailed: license status, queue depth, AI provider status Manual diagnosis
Celery worker celery -A app.celery inspect ping (run via aws ecs execute-command) Worker liveness

ALB target group health check configuration:

Path:                /health
Healthy threshold:   2
Unhealthy threshold: 3
Timeout:             10 seconds
Interval:            30 seconds
Success codes:       200
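The /health semantics above amount to probing each dependency and mapping any failure to a non-200. A sketch of that aggregation (hypothetical helper names; the real endpoint lives in the Flask backend):

```python
def health_status(checks):
    """Return (http_status, body) for a /health-style endpoint.

    `checks` maps a dependency name to a zero-argument probe that raises
    on failure (e.g. a SELECT 1 against RDS, a PING against Redis).
    """
    failures = {}
    for name, probe in checks.items():
        try:
            probe()
        except Exception as exc:
            failures[name] = str(exc)
    if failures:
        return 503, {"status": "degraded", "failures": failures}
    return 200, {"status": "ok"}

def rds_ok():
    pass  # stand-in for a successful SELECT 1

def redis_down():
    raise ConnectionError("redis timeout")

assert health_status({"rds": rds_ok}) == (200, {"status": "ok"})
status, body = health_status({"rds": rds_ok, "redis": redis_down})
assert status == 503 and "redis" in body["failures"]
```

Because the ALB marks a target unhealthy after 3 consecutive non-200s at a 30-second interval, a dependency outage takes roughly 90 seconds to drain a target.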

Logs

Where logs live

Source CloudWatch Log Group Notes
Backend (Flask + Gunicorn) /aws/ecs/complitru-backend All HTTP requests, app logs, errors
Frontend (Next.js) /aws/ecs/complitru-frontend SSR errors, build warnings
Worker (Celery) /aws/ecs/complitru-worker Task execution logs, scan progress
ALB access logs S3 bucket complitru-alb-logs-* (if enabled) Per-request access logs
VPC flow logs /aws/vpc/flowlogs/complitru Network traffic, security investigation
RDS slow query log /aws/rds/instance/complitru-production/slowquery DB performance debugging
RDS error log /aws/rds/instance/complitru-production/error DB errors
Audit log /aws/ecs/complitru-audit + S3 audit bucket Immutable copy in S3 with Object Lock

Log format

JSON-structured logs by default (LOG_FORMAT=json):

{
  "timestamp": "2026-04-15T14:23:45.123Z",
  "level": "INFO",
  "logger": "app.scan_engine",
  "message": "Scan completed",
  "request_id": "req_a1b2c3d4",
  "user_id": 42,
  "account_id": "aws-account-123",
  "duration_ms": 18420,
  "findings_count": 47
}

Common queries (CloudWatch Logs Insights)

All errors in the last hour:

fields @timestamp, @message
| filter level = "ERROR"
| sort @timestamp desc
| limit 100

Slowest 20 requests:

fields @timestamp, request_id, path, duration_ms
| filter ispresent(duration_ms)
| sort duration_ms desc
| limit 20

All actions by a specific user:

fields @timestamp, action, target, source_ip
| filter user_id = 42
| sort @timestamp desc
| limit 200

Bedrock errors:

fields @timestamp, @message
| filter @message like /bedrock/i
| filter level in ["WARNING", "ERROR"]
| sort @timestamp desc
| limit 100

Celery task failures:

fields @timestamp, task_name, exception
| filter @logStream like /worker/
| filter @message like /Task .* raised/
| sort @timestamp desc
| limit 50

Metrics

Pre-configured CloudWatch alarms

Created by Terraform; published to SNS topic complitru-alerts (route to PagerDuty / Slack / email).

Alarm Threshold Severity
complitru-backend-cpu-high Avg CPU > 85% for 10 min Warning
complitru-backend-memory-high Avg memory > 85% for 10 min Warning
complitru-backend-task-count-low Healthy task count < 1 for 5 min Critical
complitru-worker-task-count-low Healthy task count < 1 for 5 min Critical
complitru-alb-5xx-rate 5xx rate > 1% over 5 min Warning
complitru-alb-target-response-time p95 response time > 5s for 10 min Warning
complitru-rds-cpu-high Avg CPU > 80% for 15 min Warning
complitru-rds-storage-low Free storage < 20% Warning
complitru-rds-connections-high Connections > 80% of max Warning
complitru-redis-memory-high Memory usage > 85% Warning
complitru-celery-queue-depth Queue depth > 1000 for 15 min Warning
complitru-license-expiry-warning License expires in < 7 days Critical

Key dashboards

Custom CloudWatch dashboards created by Terraform:

Access at AWS Console → CloudWatch → Dashboards → complitru-*.

Application-level metrics

Backend exports custom metrics to CloudWatch under namespace CompliTru/Application:

Metric Unit Description
ScansStarted Count Scans initiated per period
ScansCompleted Count Scans completed successfully
ScansFailed Count Scans failed
ScanDuration Milliseconds Scan completion time
FindingsCreated Count New findings per period
RemediationsApplied Count Remediations executed
AICompletions Count LLM calls per provider
AITokensUsed Count Total AI tokens consumed
LicenseValidationFailures Count License check failures

Scaling

Horizontal scaling

ECS auto-scaling is target-tracking on CPU. Adjust thresholds in Terraform:

backend_desired_count        = 2
backend_max_count            = 10
scaling_target_cpu_percent   = 70

worker_desired_count         = 1
worker_max_count             = 5
worker_scaling_queue_depth   = 100   # scale workers when queue > N tasks

Apply with terraform apply. New tasks come online in ~60 seconds.
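worker_scaling_queue_depth implies a scale-out decision like the following sketch (illustrative only; the actual policy is a CloudWatch scaling policy created by Terraform, and the per-worker target of 100 tasks is this example's assumption):

```python
import math

def desired_workers(queue_depth, min_count=1, max_count=5, per_worker=100):
    """Size the worker service so each worker handles at most `per_worker`
    queued tasks (worker_scaling_queue_depth), clamped to the Terraform
    min/max counts."""
    needed = math.ceil(queue_depth / per_worker) if queue_depth else min_count
    return max(min_count, min(max_count, needed))

assert desired_workers(0) == 1       # idle: stay at worker_desired_count
assert desired_workers(250) == 3     # ceil(250 / 100)
assert desired_workers(5000) == 5    # clamped to worker_max_count
```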

Vertical scaling

To increase task size:

backend_cpu    = 2048   # 2 vCPU
backend_memory = 4096   # 4 GB

terraform apply triggers a rolling restart. Connection draining keeps existing requests served.

RDS scaling

Storage scales automatically up to db_max_storage_gb. To resize the instance class:

db_instance_class = "db.r5.large"   # was db.t3.medium

terraform apply triggers an RDS modification. With Multi-AZ, this is a zero-downtime modification (failover to standby, modify, failover back). With single-AZ, expect ~5 minutes of downtime.

Common runbooks

Service is down — backend not responding

  1. Check ECS service health:

     aws ecs describe-services \
         --cluster complitru-production \
         --services complitru-backend \
         --query 'services[0].{desired:desiredCount,running:runningCount,pending:pendingCount}'

  2. Check recent task failures:

     aws ecs list-tasks \
         --cluster complitru-production \
         --service-name complitru-backend \
         --desired-status STOPPED \
         --max-items 5

  3. Inspect why tasks stopped:

     aws ecs describe-tasks \
         --cluster complitru-production \
         --tasks <task-arn> \
         --query 'tasks[0].{stoppedReason:stoppedReason,stopCode:stopCode,exitCode:containers[0].exitCode}'

  4. Common stopped reasons and remediation:

Reason Remediation
Essential container exited + exit code 1 Check CloudWatch logs for stack trace; rollback to previous task definition if recent deploy
Out of memory Increase backend_memory in Terraform
Health check failed Backend returning non-200; check /api/health/deep for details
Task placement failed Subnet running out of IPs; expand subnet CIDR or add subnet
Image pull failure ECR auth issue or image deleted; verify image exists and ECS task role has pull permissions

Scans are queuing, not running

  1. Check worker count:

     aws ecs describe-services \
         --cluster complitru-production \
         --services complitru-worker \
         --query 'services[0].runningCount'

  2. Check queue depth:

     # Open a shell in a backend task via ECS exec
     aws ecs execute-command \
         --cluster complitru-production \
         --task <task-arn> \
         --container backend \
         --interactive \
         --command "/bin/bash"

     # Inside the task:
     python3 -c "from app.celery import celery; i = celery.control.inspect(); print('Active:', i.active()); print('Reserved:', i.reserved()); print('Scheduled:', i.scheduled())"

  3. Scale workers up:

     aws ecs update-service \
         --cluster complitru-production \
         --service complitru-worker \
         --desired-count 4

  4. If workers are running but tasks are not being picked up, suspect a Redis connectivity issue: check the Redis security group and TLS configuration.

License validation failing

Symptom: backend logs License validation failed repeatedly.

  1. Check the license key in Secrets Manager:

     aws secretsmanager get-secret-value \
         --secret-id complitru/production/app \
         --query 'SecretString' --output text \
         | jq '.COMPLITRU_LICENSE_KEY' | head -c 20

     (Just verify it starts with ctl_ — the full key is sensitive.)

  2. Check license server reachability from a backend task:

     # Via ECS exec
     curl -v https://license.complitru.ai/health

  3. Check VPC egress — license check requires outbound HTTPS via NAT gateway. If NAT is broken, license validation fails.

  4. Check license expiry — every license has an expiration. Contact sales@complitru.ai to renew. The 7-day offline grace period buys time, but the application will fail closed after grace expires.

  5. If license is valid but check still fails — potential clock skew. Verify task time is correct (NTP via ECS task agent should handle this automatically).

High RDS CPU

  1. Identify slow queries:

     SELECT * FROM mysql.slow_log ORDER BY query_time DESC LIMIT 20;

  2. Check connection count:

     SHOW STATUS LIKE 'Threads_connected';

     If approaching max_connections, increase the parameter group value or scale the instance class.

  3. Common culprits:
     - Unindexed query on the findings table — verify indexes exist on common filter columns (account_id, severity, status)
     - Audit log query without a time bound — always filter audit log queries by created_at
     - Long-running scan transaction — check the processlist for queries running longer than 60s

  4. Emergency mitigation: kill long-running queries with KILL <thread_id> (admin only).

AI calls failing

Most common: Bedrock model access not enabled or region mismatch.

  1. Check the actual error in CloudWatch logs:

     fields @timestamp, @message
     | filter @message like /bedrock/i
     | filter level = "ERROR"
     | sort @timestamp desc
     | limit 20

  2. Common errors and fixes:

Error Fix
AccessDeniedException Enable model access in Bedrock console for the region
Model not found Verify BEDROCK_MODEL_DEFAULT matches an enabled model in your region
ThrottlingException Bedrock per-region quota hit; request quota increase or distribute load across regions
Timeout Increase WORKER_TIMEOUT for long-running AI calls; check VPC endpoint configuration if using one

  3. Fallback — set AI_ENABLED=false temporarily to keep the product operational while diagnosing.

Frontend showing 502

  1. Verify backend is healthy (run backend health check above)
  2. Check the ALB target group:

     aws elbv2 describe-target-health \
         --target-group-arn <backend-tg-arn> \
         --query 'TargetHealthDescriptions[*].{id:Target.Id,health:TargetHealth.State,reason:TargetHealth.Reason}'

  3. Common fixes:
     - Targets unhealthy with reason Target.Timeout — the task is overloaded; scale up
     - Targets unused — security group not allowing ALB → task traffic on the service port; check the Terraform SG
     - Empty target list — the service has 0 running tasks; investigate why

Deployments failing CI

CompliTru-published images are immutable and signed. To verify an image before deploy:

# Verify signature
cosign verify ghcr.io/complitru/backend:1.0.0 \
    --certificate-identity-regexp '.*@complitru\.ai$' \
    --certificate-oidc-issuer 'https://token.actions.githubusercontent.com'

# Verify SBOM
syft ghcr.io/complitru/backend:1.0.0 -o spdx-json > sbom.json
diff sbom.json official-sbom-1.0.0.json

If verification fails, do not deploy. Contact security@complitru.ai.

Routine maintenance

Weekly

Monthly

Quarterly

Annually

Performance tuning

Backend response times

Target: p95 < 500ms for read endpoints, < 2s for scan triggers.

If exceeding:

  1. Check RDS slow query log — likely a missing index or unoptimized query
  2. Check Gunicorn worker count — WORKERS=4 is conservative; bump to 2 * vCPU + 1
  3. Check task CPU — if maxing out CPU under load, scale up vertically or horizontally
  4. Check Redis latency — should be < 1ms; if higher, investigate ElastiCache CPU and connection count
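The 2 * vCPU + 1 rule of thumb from step 2 in code form (Fargate CPU units map 1024 to 1 vCPU):

```python
def gunicorn_workers(fargate_cpu_units):
    """Classic Gunicorn sizing heuristic: 2 * vCPUs + 1.

    fargate_cpu_units is the backend_cpu Terraform value (1024 = 1 vCPU).
    """
    vcpus = fargate_cpu_units / 1024
    return int(2 * vcpus + 1)

assert gunicorn_workers(1024) == 3   # backend_cpu default
assert gunicorn_workers(2048) == 5   # after vertical scaling
```

Set the result via the WORKERS environment variable after any change to backend_cpu, since the two are tuned together.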

Scan throughput

Target: 1000-resource AWS account scanned in under 5 minutes.

If slower:

  1. Increase SCAN_MAX_PARALLEL_ACCOUNTS — defaults to 10 concurrent accounts
  2. Increase CELERY_WORKER_CONCURRENCY — more concurrent task processing per worker
  3. Scale workers horizontally — increase worker_desired_count and worker_max_count
  4. Check AWS API throttling — if hitting rate limits, distribute scans across IAM roles

AI latency

If slower than typical:

  1. Check BEDROCK_REGION — should match deployment region (cross-region adds 50–200ms)
  2. Check VPC endpoint config — Bedrock VPC endpoint reduces latency vs. NAT egress
  3. Use streaming — for long responses, switch from chat_completion to stream_completion for perceived faster response

Cost optimization

Typical monthly cost for a default deployment:

Component Cost
ECS Fargate (backend, frontend, worker) ~$140
RDS db.t3.medium Multi-AZ ~$140
ElastiCache t3.micro ~$25
ALB ~$25
NAT Gateway (single) ~$30
CloudWatch (logs + metrics + alarms) ~$40
S3 ~$10
Secrets Manager ~$5
Bedrock (depends on usage) ~$30–200
Total ~$445–615/mo
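The total row can be sanity-checked by summing the component estimates at each end of the Bedrock range:

```shell
# Sum the per-component estimates from the table above
low=$((  140 + 140 + 25 + 25 + 30 + 40 + 10 + 5 + 30  ))   # Bedrock at ~$30
high=$(( 140 + 140 + 25 + 25 + 30 + 40 + 10 + 5 + 200 ))   # Bedrock at ~$200
echo "~\$${low}-${high}/mo"   # prints ~$445-615/mo
```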

Cost reduction options:

Support escalation

Issue type Contact Response SLA
Operational issue with the deployment support@complitru.ai 4 business hours
Security vulnerability in the product security@complitru.ai 48 hours initial response
License renewal / contract sales@complitru.ai 1 business day
Critical production outage support@complitru.ai + phone (in your support contract) 1 hour

Always include in support requests:
  - Your license key ID (first 12 chars only)
  - Deployment region and version
  - Relevant log snippets (CloudWatch query results acceptable)
  - Steps to reproduce
  - Impact statement
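A sketch of assembling that request body in shell — every value below is a placeholder, and the 12-character truncation matches the license-key guidance above:

```shell
# Placeholder license key; only the first 12 characters go in the ticket
license_key="lk_0123456789abcdef0123"

cat <<EOF
License key ID: $(printf '%.12s' "$license_key")
Deployment: us-east-1, CompliTru 1.0.0
Impact: <one-sentence impact statement>
EOF
```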

Section 08

Upgrade Guide

CompliTru Self-Hosted — Upgrade Guide

How to upgrade your CompliTru deployment to a new version safely with zero data loss and minimal downtime.

Versioning

CompliTru uses semantic versioning: MAJOR.MINOR.PATCH.

Bump type Compatibility Action required
Patch (e.g., 1.0.0 → 1.0.1) Fully backward-compatible — bug fixes only Image swap, no migrations
Minor (e.g., 1.0.x → 1.1.0) Backward-compatible — new features, additive schema changes Image swap + migrations (online)
Major (e.g., 1.x.x → 2.0.0) Breaking changes possible — schema, configuration, or API Image swap + migrations + configuration review
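The bump type can be derived mechanically from the two version strings. A minimal illustrative helper (not a shipped tool; it assumes well-formed MAJOR.MINOR.PATCH input):

```shell
# Classify an upgrade as patch / minor / major from two semver strings
bump_type() {
    old_major=${1%%.*}; rest=${1#*.}; old_minor=${rest%%.*}
    new_major=${2%%.*}; rest=${2#*.}; new_minor=${rest%%.*}
    if   [ "$old_major" != "$new_major" ]; then echo major
    elif [ "$old_minor" != "$new_minor" ]; then echo minor
    else echo patch
    fi
}

bump_type 1.0.0 1.0.1   # patch
bump_type 1.0.5 1.1.0   # minor
bump_type 1.9.9 2.0.0   # major
```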

Release cadence

Release type Frequency Notification
Critical security patch Within 72 hours of CVE disclosure Email to license-tied address
Patch Bi-weekly Email + release notes
Minor Quarterly Email + release notes + 30-day preview
Major Annually Email + release notes + 90-day preview + migration guide

Release notes for every version: https://complitru.ai/releases (license required).

Pre-upgrade checklist

Before any upgrade:

Patch upgrade (e.g., 1.0.0 → 1.0.1)

Zero downtime. Rolling restart.

# 1. Update Terraform variable
cd terraform
sed -i.bak 's/complitru_version = "1.0.0"/complitru_version = "1.0.1"/' terraform.tfvars

# 2. Plan and apply
terraform plan
terraform apply

# 3. Monitor deployment
aws ecs wait services-stable \
    --cluster complitru-production \
    --services complitru-backend complitru-frontend complitru-worker

# 4. Verify
cd ../scripts
./verify-deployment.sh

ECS performs a rolling deploy: new tasks come up healthy, then old tasks drain. Connection draining handles in-flight requests. Total time: 5–10 minutes.

Patch rollback

# Revert version
sed -i.bak 's/complitru_version = "1.0.1"/complitru_version = "1.0.0"/' terraform.tfvars
terraform apply

ECS rolls back to the previous task definition revision. Patches are guaranteed schema-compatible — no data migration required for rollback.

Minor upgrade (e.g., 1.0.x → 1.1.0)

Online schema migrations. Brief feature flag rollout possible.

Step 1 — Read release notes

Minor releases may include:
  - New configuration variables (with safe defaults)
  - New database tables / columns (additive only)
  - Deprecated features (with at least one minor cycle of warning)
  - New optional features behind feature flags

Step 2 — Update Terraform

cd terraform
# Update version
sed -i.bak 's/complitru_version = "1.0.5"/complitru_version = "1.1.0"/' terraform.tfvars

# Add any new variables called out in release notes
# Example: a new feature flag introduced in 1.1.0
echo 'feature_xyz_enabled = false   # New in 1.1.0, default off' >> terraform.tfvars

Step 3 — Run migrations

Migrations are run as a one-off ECS task with the new image, before promoting the service:

cd ../scripts
./migrate.sh --version=1.1.0 --dry-run    # Preview migrations
./migrate.sh --version=1.1.0              # Apply

The migration script:
  - Uses Alembic with idempotent operations
  - Applies each migration in a single transaction (auto-rollback on error)
  - Logs every step to CloudWatch
  - Refuses to run if any migration would cause more than 30 seconds of table lock (manual approval required for those)

For long-running migrations (large tables, index rebuilds), the script supports online migration via pt-online-schema-change patterns. Documented per-migration in release notes.

Step 4 — Deploy new application image

cd ../terraform
terraform apply

Same rolling deploy pattern as patch upgrades. Total time: 10–20 minutes including migration.

Step 5 — Verify

cd ../scripts
./verify-deployment.sh

Plus check the CompliTru UI:
  - Login works
  - Dashboards render
  - A scan can be initiated and completes
  - New features in release notes are present (if any)

Minor rollback

If issues are detected post-upgrade:

# Revert application version
sed -i.bak 's/complitru_version = "1.1.0"/complitru_version = "1.0.5"/' terraform.tfvars
terraform apply

Application code rolls back. Schema is forward-compatible (additive changes), so old code continues working with new schema. Do not attempt to roll back schema migrations — additive changes are safe to leave; reverting them risks data loss.

If rollback fails or behavior is broken:
  - Restore RDS from the pre-upgrade snapshot taken in the pre-upgrade checklist
  - Redeploy the old version against the restored DB
  - Time: ~30 minutes
  - Data loss window: between the pre-upgrade snapshot and the incident

Major upgrade (e.g., 1.x.x → 2.0.0)

Major upgrades may include breaking changes. Plan a maintenance window.

Step 1 — Review the migration guide

Every major release ships with a dedicated migration guide at https://complitru.ai/releases/2.0.0/migration. Read it end-to-end.

Key sections to expect:
  - Configuration changes (renamed / removed variables)
  - Schema changes (any non-additive changes called out explicitly)
  - API changes (deprecated endpoints, breaking changes)
  - IAM permission changes (any new permissions required)
  - Behavioral changes (defaults that changed)

Step 2 — Test in staging first

Spin up a staging deployment from a recent production backup:

cd terraform
terraform workspace new staging-2x-test
cp terraform.tfvars terraform.staging.tfvars
# Edit: set environment, domain, smaller sizing
terraform apply -var-file=terraform.staging.tfvars

# Restore production snapshot to staging RDS
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier complitru-staging \
    --db-snapshot-identifier <recent-prod-snapshot>

# Run migration against staging
./scripts/migrate.sh --version=2.0.0 --target=staging

Run your test suite, manual QA, and integration tests in staging before touching production.

Step 3 — Schedule production maintenance window

Notify users in advance via:
  - In-app banner (Settings → Maintenance Mode → Schedule)
  - Email to the admin distribution list
  - Status page update (if you publish one)

Recommended window: 2 hours during low-traffic period for first major upgrade. Subsequent ones can be shorter once you have experience.

Step 4 — Production upgrade

# 1. Take pre-upgrade manual snapshot. Capture the timestamp once so both
#    commands (and a later rollback) reference the same snapshot identifier.
TIMESTAMP=$(date +%Y%m%d-%H%M)
aws rds create-db-snapshot \
    --db-instance-identifier complitru-production \
    --db-snapshot-identifier pre-upgrade-2.0.0-${TIMESTAMP}

# Wait for snapshot to complete
aws rds wait db-snapshot-available \
    --db-snapshot-identifier pre-upgrade-2.0.0-${TIMESTAMP}

# 2. Enable maintenance mode (returns 503 to users with maintenance message)
aws ecs update-service \
    --cluster complitru-production \
    --service complitru-backend \
    --task-definition complitru-backend-maintenance
# (family name without a revision — ECS uses the latest ACTIVE revision)

# 3. Wait for tasks to drain
sleep 60

# 4. Run migrations
cd scripts
./migrate.sh --version=2.0.0

# 5. Update Terraform with new version + any new required variables
cd ../terraform
# (edit terraform.tfvars per migration guide)

# 6. Apply
terraform apply

# 7. Disable maintenance mode (Terraform apply restores normal task definitions)

# 8. Verify
cd ../scripts
./verify-deployment.sh

Major rollback

Major upgrades may include schema changes that are not safely reversible. The rollback procedure for a major upgrade is restore from snapshot:

# 1. Stop application traffic
aws ecs update-service \
    --cluster complitru-production \
    --service complitru-backend \
    --desired-count 0

# 2. Rename current DB instance (preserves the upgraded data for forensics)
aws rds modify-db-instance \
    --db-instance-identifier complitru-production \
    --new-db-instance-identifier complitru-production-2x-failed \
    --apply-immediately

# 3. Restore from pre-upgrade snapshot to the production identifier
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier complitru-production \
    --db-snapshot-identifier pre-upgrade-2.0.0-${TIMESTAMP}

# 4. Wait for restored instance to be available
aws rds wait db-instance-available \
    --db-instance-identifier complitru-production

# 5. Update Terraform back to old version, apply
cd terraform
sed -i.bak 's/complitru_version = "2.0.0"/complitru_version = "1.x.y"/' terraform.tfvars
terraform apply

# 6. Verify
cd ../scripts
./verify-deployment.sh

Data loss: anything written between snapshot time and rollback time. Document the gap and communicate to users.

Container image management

Image immutability

CompliTru-published images use immutable tags. Once a version is published, the digest never changes:

ghcr.io/complitru/backend:1.0.0    →    sha256:abc123...    (always)

If a critical issue is found, a new patch version is released — never a re-tag of the same version.

Pinning to digests

For maximum reproducibility, pin to digests in terraform.tfvars:

backend_image  = "ghcr.io/complitru/backend@sha256:abc123..."
frontend_image = "ghcr.io/complitru/frontend@sha256:def456..."
worker_image   = "ghcr.io/complitru/worker@sha256:ghi789..."

Digests for each release are published in release notes and in your release email.

Image signature verification

Every CompliTru image is signed with cosign. Verify before deploy:

cosign verify ghcr.io/complitru/backend:1.0.0 \
    --certificate-identity-regexp '.*@complitru\.ai$' \
    --certificate-oidc-issuer 'https://token.actions.githubusercontent.com'

If verification fails, do not deploy. Contact security@complitru.ai.

SBOM verification

Each release publishes an SBOM (Software Bill of Materials) in SPDX JSON format:

# Pull the official SBOM
curl -O https://releases.complitru.ai/1.0.0/sbom-backend.spdx.json

# Generate SBOM from the image you have
syft ghcr.io/complitru/backend:1.0.0 -o spdx-json > my-sbom.json

# Compare
diff <(jq -S . sbom-backend.spdx.json) <(jq -S . my-sbom.json)

Differences indicate the image was tampered with or you have the wrong version.

Migration of customer data

License key migration

License keys carry forward across upgrades. No action required.

Database migration safety

CompliTru migrations follow these rules:

Major releases may include destructive changes (drops, type changes). These are always called out in the migration guide and require explicit operator approval.

Configuration migration

When a new release renames or removes a configuration variable:

Testing upgrades

We strongly recommend maintaining a staging environment for upgrade testing. Patterns:

Pattern 1 — Permanent staging

Run a full staging deployment continuously, sized down. Cost: roughly 25–30% of production. Provides best confidence — staging is always real and current.

Pattern 2 — On-demand staging

Spin up staging only when an upgrade is planned. Restore from production snapshot, run upgrade, validate, tear down. Cost: ~$50/upgrade. Slower but cheaper.

Both patterns are supported by the included Terraform modules — use a separate Terraform workspace.
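A rough cost comparison of the two patterns, using the production cost range from the Operations Guide (the 25–30% figure for permanent staging is the estimate stated above):

```shell
# Low/high production estimates from the Operations Guide cost table
prod_low=445
prod_high=615

# Pattern 1: permanent staging at roughly 25-30% of production, per month
staging_low=$((  prod_low  * 25 / 100 ))
staging_high=$(( prod_high * 30 / 100 ))
echo "permanent staging: ~\$${staging_low}-${staging_high}/mo; on-demand: ~\$50/upgrade"
```

Under these figures, on-demand staging is cheaper unless you upgrade more than roughly twice a month; permanent staging buys continuous confidence instead.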

Upgrade history audit

CompliTru tracks upgrade history in the application:

SELECT
    version,
    upgraded_at,
    upgraded_by,
    migrations_applied,
    duration_seconds
FROM upgrade_history
ORDER BY upgraded_at DESC;

Use this for compliance evidence (SOC 2 CC8.1 — change management).

Support during upgrades

For paid support customers, CompliTru provides upgrade assistance:

Contact support@complitru.ai at least one week ahead of a major upgrade to schedule.

Section 09

SOC 2 Control Mapping

CompliTru Self-Hosted — SOC 2 Control Mapping

How the CompliTru self-hosted deployment satisfies SOC 2 Trust Services Criteria. This document is structured for use as auditor evidence: each control statement maps to specific technical implementations with file references and verification steps.

The mapping covers Trust Services Criteria for:
  - Security (Common Criteria CC1–CC9) — required for all SOC 2 reports
  - Availability (A1) — applicable
  - Confidentiality (C1) — applicable
  - Processing Integrity (PI1) — applicable

Privacy (P1–P8) is applicable when CompliTru is used to handle personal data; controls noted where relevant.


CC1 — Control Environment

CC1.1 — Demonstrates commitment to integrity and ethical values

Implementation:
  - CompliTru's source code is governed by a Code of Conduct (in product source repository)
  - All committers sign a Contributor License Agreement
  - Customer-facing security disclosures handled via security@complitru.ai with PGP key

Customer evidence:
  - Acceptable Use Policy enforced via Terms of Service displayed at first login
  - Audit log captures every action attributed to a named user — see audit_log table

CC1.2 — Board oversight of internal control

Customer responsibility — CompliTru provides audit log evidence to support board-level reporting.

CC1.3 — Establishes structures, reporting lines, authorities, responsibilities

Customer evidence within CompliTru:
  - RBAC with admin, analyst, viewer, auditor roles (Settings → Users & Roles)
  - Role assignment audit log entries (audit_log filtered by action = 'role.assign')
  - Optional approval workflows for high-impact remediation (Settings → Approval Workflows)

CC1.4 — Demonstrates commitment to competence

Implementation:
  - CompliTru provides quarterly product training webinars to customer admins
  - Built-in inline documentation for every check and remediation
  - Onboarding workflow walks new admins through configuration

CC1.5 — Holds individuals accountable

Customer evidence:
  - Audit log captures actor, source IP, and before/after state for every mutation
  - Reports queryable per user, per time window: Settings → Audit Log → Export
  - Daily / weekly admin activity report scheduled to a security distribution list


CC2 — Communication and Information

CC2.1 — Internally communicates information necessary to support functioning of internal control

Implementation in CompliTru:
  - Notifications module (Slack, Teams, email, PagerDuty) for security findings and remediation events
  - Configurable digest reports per role (admin daily, analyst weekly, executive monthly)
  - In-app notification center for finding assignments and escalations

CC2.2 — Communicates with external parties regarding matters affecting the functioning of internal control

Implementation:
  - Webhooks for outbound integration with customer-managed systems (SIEM, ticketing)
  - API for programmatic access to findings and remediation status
  - CompliTru security advisories published via license-key-tied email distribution

CC2.3 — Communicates with external parties

Customer responsibility for use of CompliTru-provided data in external communication.


CC3 — Risk Assessment

CC3.1 — Specifies suitable objectives

Implementation:
  - Compliance frameworks built-in: SOC 2, ISO 27001, NIST 800-53, NIST CSF, CIS Benchmarks, HIPAA, PCI DSS
  - Custom framework authoring for organization-specific objectives
  - Posture dashboard: live percentage compliance per framework

CC3.2 — Identifies risk

Implementation:
  - Continuous scanning across configured cloud accounts
  - 600+ checks identify misconfigurations, vulnerabilities, identity risks, drift
  - AI-assisted risk narration explains business impact (when AI enabled)

CC3.3 — Considers fraud risk

Implementation:
  - CIEM module identifies privilege escalation paths, dormant identities with elevated permissions, cross-account access risks
  - Audit log captures every admin action with actor and source IP
  - Anomaly detection on user behavior (failed login spikes, unusual scan patterns)

CC3.4 — Identifies and analyzes significant change

Implementation:
  - Drift detection: previously-resolved findings that reappear are flagged as drift events
  - CloudTrail integration surfaces configuration changes affecting compliance posture
  - Scheduled reports show week-over-week and month-over-month posture changes


CC4 — Monitoring Activities

CC4.1 — Selects, develops, performs ongoing and/or separate evaluations

Implementation:
  - Continuous scanning at customer-defined cadence (default daily)
  - Real-time event-driven scans triggered by CloudTrail events (optional)
  - Manual scans on demand via API or UI
  - Health dashboard: live status of scanners, queues, integrations

CC4.2 — Evaluates and communicates control deficiencies

Implementation:
  - Findings prioritized by severity (Critical / High / Medium / Low / Info)
  - Configurable SLAs per severity (e.g., Critical resolved within 24 hours)
  - Overdue findings escalated automatically to designated channels
  - SLA compliance dashboard for management reporting


CC5 — Control Activities

CC5.1 — Selects and develops control activities

Implementation:
  - 600+ pre-built checks mapped to SOC 2, ISO, NIST, CIS, HIPAA, PCI DSS controls
  - Custom check authoring SDK for organization-specific controls
  - Frameworks editor for mapping checks to internal controls

CC5.2 — Selects and develops general controls over technology

Implementation:
  - Built-in checks for: encryption at rest, encryption in transit, IAM policies, network segmentation, logging, monitoring, backup, change management, vulnerability management
  - Coverage matrix visible at Settings → Frameworks → SOC 2

CC5.3 — Deploys through policies and procedures

Implementation:
  - Remediation playbooks codify standard fix procedures
  - Approval workflows enforce required reviews before changes apply
  - Audit log creates an immutable record of every applied change


CC6 — Logical and Physical Access Controls

CC6.1 — Implements logical access security measures

CompliTru self-hosted implementation:
  - All traffic over TLS 1.2+ (HSTS enforced)
  - Authentication via username/password (Argon2 hashed), OAuth (Google / Microsoft), or SAML 2.0 SSO
  - MFA via TOTP (required for admin role by default)
  - API key authentication for programmatic access (SHA-256 hashed at rest)
  - Session management: configurable timeout, concurrent session limits, IP allow-listing
  - See SECURITY.md

CC6.2 — New internal and external users authorized

Implementation:
  - Self-service registration disabled by default — admin must invite new users
  - Invitation emails are time-limited, single-use
  - Default role on invitation is viewer — privilege elevation requires explicit admin action
  - All user creation events logged to audit log with actor and timestamp

CC6.3 — Removes access when no longer required

Implementation:
  - Admin can deactivate users with one click — sessions invalidated immediately
  - API keys revocable with immediate effect
  - SAML/OAuth users auto-deactivated when removed from the upstream identity provider (with optional sync interval)
  - Audit log of all deactivation events

CC6.4 — Restricts physical access

Customer responsibility for AWS data center physical security — covered under AWS's SOC 2 report (AWS is a sub-processor; their report should be referenced in your audit).

CC6.5 — Protects against unauthorized access

Implementation:
  - WAF (optional, AWS WAF managed rule sets) at ALB
  - Rate limiting per IP and per API key
  - Failed authentication lockout (configurable threshold, default 5 attempts → 15-minute lockout)
  - Suspicious activity alerts (impossible travel, unusual access patterns)
  - VPC isolation: application and DB tier in private subnets, no public IPs on application tasks

CC6.6 — Implements logical access controls over media

Customer responsibility for AWS S3 / EBS configuration. CompliTru-created buckets follow least-privilege defaults; see SECURITY.md.

CC6.7 — Restricts the transmission of information

Implementation:
  - All data in transit encrypted (TLS 1.2+)
  - Internal ALB → ECS task traffic in private subnet (encryption-in-transit configurable)
  - AWS service calls (Secrets Manager, RDS, S3, Bedrock) use TLS by default
  - See SECURITY.md

CC6.8 — Implements controls to prevent or detect unauthorized changes

Implementation:
  - All admin actions write to an immutable audit log (Object Lock S3 bucket, 7-year retention by default)
  - Configuration drift detection identifies changes to scanned resources
  - Approval workflows require multi-stage approval for sensitive changes
  - IAM policies follow least-privilege per service component


CC7 — System Operations

CC7.1 — Detects vulnerabilities

Self-hosted CompliTru implementation:
  - Container images built from minimal base, scanned with Trivy at CI time, SBOM published
  - Customer-side ECR scanning recommended for ongoing CVE detection on mirrored images
  - Quarterly security patches with 72-hour SLA for critical CVEs
  - See SECURITY.md

Customer environment scanning (the product itself):
  - Continuous vulnerability scanning of customer cloud resources
  - Vulnerability findings prioritized by EPSS score, exploitability, blast radius

CC7.2 — Monitors system components

Implementation:
  - CloudWatch container insights enabled on ECS cluster
  - Pre-configured alarms for: ECS task health, ALB error rates, RDS performance, license expiration
  - All application logs ship to CloudWatch Logs
  - Audit log mirrored to immutable S3 bucket

CC7.3 — Evaluates security events

Implementation:
  - Application audit log entries can be queried via UI (Settings → Audit Log)
  - CloudWatch Logs Insights provides a query language for incident investigation
  - Optional integration with SIEM (Splunk, Datadog, Sumo Logic, Security Hub) for correlation

CC7.4 — Responds to identified security events

Implementation:
  - Documented incident response runbook in DEPLOYMENT.md
  - Containment actions (scale to zero, WAF block, secret rotation) documented with example commands
  - CompliTru security team contact for product-side incidents: security@complitru.ai

CC7.5 — Recovers from identified security events

Implementation:
  - RDS automated backups with point-in-time recovery (default 30-day retention)
  - S3 versioning on report bucket with lifecycle to Glacier
  - ECS stateless — redeploy from known-good container image
  - Documented rotation procedures for SECRET_KEY, ENCRYPTION_KEY, JWT_SECRET, license key, DB credentials


CC8 — Change Management

CC8.1 — Authorizes, designs, develops, configures, documents, tests, approves, implements changes

CompliTru product change management (vendor responsibility):
  - All changes via pull request with required code review
  - CI runs SAST (Bandit), SCA (npm audit, Trivy), secret scanning (Gitleaks)
  - All commits signed
  - SBOM published per release
  - Release notes describe every change

Customer change management for the deployment:
  - Terraform state captures every infrastructure change
  - ECS task definitions versioned (cannot be edited, only replaced)
  - Container image digests pinnable for reproducible deployments
  - Application configuration changes via Secrets Manager versioning


CC9 — Risk Mitigation

CC9.1 — Identifies, selects, and develops risk mitigation activities

Implementation:
  - Risk dashboard ranks open findings by severity × exploitability × asset criticality
  - Remediation suggestions provided for every finding
  - Impact analysis evaluates blast radius before remediation is applied
  - Auto-remediation available for whitelisted check categories

CC9.2 — Assesses and manages risks associated with vendors and business partners

Implementation:
  - Subprocessor list maintained at complitru.ai/legal/subprocessors
  - AWS sub-processors used: KMS, Secrets Manager, S3, RDS, ECS, ALB, CloudWatch, CloudTrail, ACM, Bedrock (optional), Textract (optional), Comprehend Medical (optional)
  - No SaaS sub-processors when self-hosted with ai_enabled = true and ai_provider = "bedrock"
  - See SECURITY.md for CompliTru's own supply chain controls


A1 — Availability

A1.1 — Uses, monitors, and evaluates current processing capacity

Implementation:
  - ECS auto-scaling: target tracking on CPU > 70% (configurable)
  - CloudWatch alarms on memory utilization, queue depth, RDS connections
  - Capacity dashboards in CloudWatch container insights

A1.2 — Implements environmental protections

Customer responsibility / AWS:
  - Multi-AZ RDS deployment by default
  - Multi-AZ ECS service distribution
  - Optional multi-region disaster recovery (RDS cross-region snapshot copy)

A1.3 — Tests recovery plan procedures

Customer responsibility:
  - Documented backup verification procedure in DEPLOYMENT.md
  - Recommended quarterly DR drill: restore RDS from snapshot to a verification instance, confirm the application starts


C1 — Confidentiality

C1.1 — Identifies and maintains confidential information

Implementation:
  - Field-level encryption (Fernet) for sensitive data: API keys, integration credentials, webhook secrets
  - Encryption keys stored in Secrets Manager (KMS-encrypted)
  - Database-level encryption at rest via RDS KMS encryption
  - Application classifies findings by sensitivity (public / internal / confidential / restricted)

C1.2 — Disposes of confidential information

Implementation:
  - Configurable retention per finding type (default 365 days for resolved findings)
  - Audit log retention: configurable, default 7 years (Object Lock prevents earlier deletion)
  - User deletion: GDPR-style erasure with audit trail of what was removed
  - S3 lifecycle rules move old reports to Glacier then expire per policy


PI1 — Processing Integrity

PI1.1 — Obtains or generates relevant, quality information regarding processing objectives

Implementation:
  - Scan results include source resource ID, scan timestamp, check version
  - Findings include reproducible "evidence" — the exact API responses that triggered the finding
  - Provenance trail from finding → scan → check version → policy that was evaluated

PI1.2 — Implements policies and procedures over inputs

Implementation:
  - All inputs validated at the API boundary (JSON schemas, type checking)
  - File uploads restricted (MIME allowlist, size limits, optional virus scanning hook)
  - ORM-only database access (no raw SQL on user input)
  - ReDoS protection on regex patterns from custom checks

PI1.3 — Implements policies and procedures over processing

Implementation:
  - Idempotent scan operations (re-running a scan produces consistent findings)
  - Atomic remediation operations with rollback capture
  - Distributed task execution with idempotency keys (Celery + Redis)
  - Database transactions wrap multi-step operations

PI1.4 — Implements policies and procedures over outputs

Implementation:
  - Findings tagged with confidence score and check version
  - Reports include scan metadata: timestamp, scope, exclusions, scanner version
  - Output encoding on all rendered HTML to prevent injection
  - Signed JWT for API responses where integrity is critical

PI1.5 — Implements policies and procedures over storage

Implementation:
  - All data at rest encrypted (RDS KMS, S3 SSE-S3 / SSE-KMS, Secrets Manager KMS)
  - Field-level encryption for highest-sensitivity fields
  - Database backups encrypted, retention enforced
  - Audit log written to Object Lock bucket (immutable per retention period)


Sub-processor disclosure

When deployed self-hosted:

Sub-processor Purpose Customer relationship
AWS (your account) Compute, storage, network, secrets, KMS Direct — your AWS contract
AWS Bedrock (your account) AI inference (if ai_provider=bedrock) Direct — your AWS contract
OpenAI AI inference (if ai_provider=openai) Direct — your OpenAI contract
AWS Comprehend Medical (your account) PHI detection (if de-identification used) Direct — your AWS contract
AWS Textract (your account) OCR for image-based documents (if used) Direct — your AWS contract
license.complitru.ai License key validation (outbound only, no customer data) CompliTru

When ai_enabled = false and image OCR / de-identification not used, CompliTru is the only sub-processor with any visibility — and visibility is limited to license validation pings (license key ID + timestamp, no customer data).


Auditor evidence collection

For auditors performing a SOC 2 examination of the customer's CompliTru deployment:

Evidence type Where to find it
Audit log of admin actions RDS table audit_log; export via Settings → Audit Log → Export CSV
Audit log of authentication events RDS table audit_log filtered by category = 'auth'; CloudWatch Logs /aws/ecs/complitru-backend filtered by event = 'auth.*'
Configuration baseline Terraform state file (S3-backed); terraform.tfvars versioned in customer repo
Encryption verification aws rds describe-db-instances, aws s3api get-bucket-encryption, aws kms describe-key
Backup verification aws rds describe-db-snapshots, S3 versioning enabled state
Vulnerability scan results ECR image scan results, customer-side Trivy or Inspector results on container images
Compliance posture history CompliTru Reports → Compliance Posture History; export PDF or CSV
Change management Terraform state history (S3 versioning); ECS task definition revisions; container image digest history in ECR
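For the encryption-verification row, the relevant field can be pulled out of the aws s3api get-bucket-encryption response with jq. A sketch against a fabricated sample response — in practice, run the real command for each bucket and pipe through the same filter:

```shell
# Sketch: extract the default server-side encryption algorithm for a bucket.
# Real usage:
#   aws s3api get-bucket-encryption --bucket <bucket> | jq -r '<filter below>'
sample='{"ServerSideEncryptionConfiguration":{"Rules":[
  {"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}}'

printf '%s' "$sample" | jq -r \
  '.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm'
```

An auditor can capture the command, its output, and a timestamp as point-in-time evidence of encryption at rest.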

CompliTru can produce supplementary letters of attestation for:
  - Image build provenance (signed attestations from CompliTru's CI)
  - SBOM accuracy (Syft-generated, signed)
  - Security patch SLA performance (historical mean time to patch)

Contact support@complitru.ai to request these for a specific reporting period.