Operations & troubleshooting

A Paas deployment is an ordinary Docker Compose project, so most of your operational primitives (docker compose ps, logs, exec) are already familiar.

Where to look first

Symptom                               Service to tail
git push hangs / refuses              git-server
Push accepted but no Deployment row   api (look for /internal/git/post-receive)
Deployment stuck Queued               builder
Deployment built but app not live     orchestrator
App live but URL 404/502              proxy-controller then nginx
TLS not active                        cert-manager
Database stuck Provisioning           orchestrator (the DatabaseProvisioner runs there)

docker compose logs -f api orchestrator builder proxy-controller cert-manager

Health endpoint

GET /health on the API returns {"ok":true} if the API is up. The orchestrator/builder/proxy/cert services don't expose HTTP themselves — use docker compose ps to see if they're Up.
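
As a rough sketch, the two checks above can be wrapped in small shell functions. PAAS_API_URL and its default are assumptions (the address your API listens on is not stated here), and services_not_running assumes Docker Compose v2, whose ps subcommand accepts --services and --status.

```shell
# Hypothetical variable: point it at wherever your API is reachable.
PAAS_API_URL="${PAAS_API_URL:-http://localhost:8080}"

# Succeeds if the body on stdin contains "ok":true (whitespace ignored).
health_body_ok() {
  tr -d ' \t\r\n' | grep -q '"ok":true'
}

# Fetch /health with a 5 second timeout and check the body.
api_healthy() {
  curl -fsS --max-time 5 "$PAAS_API_URL/health" | health_body_ok
}

# Print every service defined in the compose file that is not running.
# One-off services that are meant to exit will also be listed.
services_not_running() {
  running=$(docker compose ps --services --status running)
  for svc in $(docker compose config --services); do
    printf '%s\n' "$running" | grep -qx "$svc" || echo "$svc"
  done
}
```

Run api_healthy && echo "api up" for the API, and services_not_running for everything else; an empty list from the latter means every defined service is running.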

Common failure modes

"I pushed and nothing happened"

  1. Did the push succeed without an error from the post-receive hook? If you saw [paas] queueing deploy: …, the hook fired. Check: docker compose logs api | grep post-receive.
  2. If the API responded non-200, the body is in the API logs.
  3. If you saw no [paas] output, the bare repo's hook isn't installed. That happens when sshd's AuthorizedKeysCommand is failing (check the git-server logs) and your key falls back to a default shell. Re-add your SSH key in the dashboard.
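
The three cases above can be told apart mechanically from the push output. This is a minimal sketch: the function name and the tokens it prints are illustrative, and the only string taken from Paas itself is the [paas] queueing deploy: prefix quoted above.

```shell
# Classify captured `git push` output read from stdin.
#   queued    the hook fired; grep the api logs for post-receive
#   hook-ran  [paas] output without a queueing line; read the API logs
#   no-hook   no [paas] output at all; check git-server logs, re-add the key
classify_push() {
  out=$(cat)
  case $out in
    *"[paas] queueing deploy:"*) echo "queued" ;;
    *"[paas]"*)                  echo "hook-ran" ;;
    *)                           echo "no-hook" ;;
  esac
}
```

Git relays hook output on stderr, so capture both streams, for example git push paas main 2>&1 | tee push.log | classify_push (the remote name paas and branch main are placeholders).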

"Deployment is stuck in Building"

Check the builder logs (docker compose logs builder). Common causes are a full disk on the host (check with docker system df) or a Dockerfile that needs network access to a registry that is blocked.
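
docker system df shows what Docker itself is holding; a quick complementary check is how full the underlying filesystem is. The sketch below assumes a POSIX df, and the 90 percent default threshold is an arbitrary example value, not a Paas setting.

```shell
# Print the percentage of space used on the filesystem holding PATH (default /).
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when usage is at or above THRESHOLD percent.
# usage: disk_is_tight PATH [THRESHOLD]
disk_is_tight() {
  pct=$(disk_used_pct "$1")
  [ -n "$pct" ] && [ "$pct" -ge "${2:-90}" ]
}
```

On a default Linux install Docker keeps its data under /var/lib/docker, so disk_is_tight /var/lib/docker && docker system df is a reasonable first probe; adjust the path if your data root differs.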

"App is Live but I get 502s"

  • Inspect: docker compose exec nginx cat /etc/nginx/conf.d/<host>.conf. The upstream block should list at least one IP.
  • If it lists IPs but they're unreachable, check paas-apps network: docker network inspect paas-apps. Both nginx and the user container must be members.
  • If the upstream block has only server 127.0.0.1:1 down;, the orchestrator has no Healthy containers for the current Release. Look at container_instances: docker compose exec postgres psql -U paas paas -c 'select status, failure_reason from container_instances order by created_at desc limit 10;'.
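
To make the first and last checks less error-prone, a small filter can classify the upstream servers in a rendered conf. This is a sketch that assumes one server directive per line, as in the placeholder shown above; the function name and output tokens are illustrative.

```shell
# Read an nginx conf on stdin and classify its upstream server lines.
#   has-live-servers  at least one server line not marked down
#   placeholder-only  every server line is marked down
#   no-servers        no upstream server lines found
upstream_state() {
  # Match "server <addr> ...;" but not "server {" blocks or server_name.
  servers=$(grep -E '^[[:space:]]*server[[:space:]]+[^{;]+;' || true)
  if [ -z "$servers" ]; then
    echo "no-servers"
  elif printf '%s\n' "$servers" | grep -Ev '[[:space:]]down[[:space:]]*;' | grep -q .; then
    echo "has-live-servers"
  else
    echo "placeholder-only"
  fi
}
```

Feed it the live config with docker compose exec -T nginx cat "/etc/nginx/conf.d/$host.conf" | upstream_state, where host is a shell variable holding your app's hostname.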

"The cert is stuck Pending forever"

  • docker compose logs cert-manager — the underlying ACME error is surfaced.
  • Verify port 80 is reachable from the public internet (curl -v http://<host>/.well-known/acme-challenge/test).
  • Confirm PAAS_ACME_STAGING=true while debugging — staging has higher rate limits.
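
For the port 80 check, the HTTP status matters less than whether anything answered at all. A minimal sketch using curl's %{http_code} write-out variable, which is 000 when no HTTP response was received; the function names are illustrative.

```shell
# Print the HTTP status for the challenge path on HOST, or 000 if nothing answered.
acme_probe() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    "http://$1/.well-known/acme-challenge/test" || true
}

# Any real status (even 404) proves port 80 is reachable; 000 means it is not.
port80_reachable() {
  case $1 in
    000|"") return 1 ;;
    *)      return 0 ;;
  esac
}
```

Run port80_reachable "$(acme_probe "$host")" from a machine outside your own network; a probe from the Paas host itself says little about what the ACME server will see.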

Backing up

The state-of-record is the control-plane Postgres in volume pg-data, plus the repos (bare git repos), paas-secrets (master key + ACME account), and certs (issued certificates) volumes. A serviceable backup plan:

docker compose exec postgres pg_dump -U paas paas | gzip > paas-$(date +%F).sql.gz
docker run --rm -v paas-secrets:/src -v "$PWD":/dst alpine tar czf /dst/secrets-$(date +%F).tgz -C /src .
docker run --rm -v repos:/src -v "$PWD":/dst alpine tar czf /dst/repos-$(date +%F).tgz -C /src .

The local Docker registry (registry-data) is not backed up — images are reproducible from repos + Dockerfile.
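
A backup is only useful if it can be read back. One caveat with piping pg_dump into gzip is that a failed dump still leaves a valid, empty gzip file, so a size or gzip -t check alone is not enough. The sketch below (the function name is illustrative) also confirms that the decompressed stream is non-empty and, for .tgz files, that tar can list the archive.

```shell
# Succeeds only if FILE is a readable, non-empty gzip archive.
# usage: verify_backup FILE
verify_backup() {
  [ -s "$1" ] || return 1
  gzip -t "$1" 2>/dev/null || return 1
  # A failed pg_dump upstream of gzip yields a valid gzip of zero bytes.
  gzip -dc "$1" 2>/dev/null | head -c 1 | grep -q . || return 1
  case $1 in
    *.tgz) tar tzf "$1" >/dev/null 2>&1 || return 1 ;;
  esac
}
```

After a backup run, something like for f in paas-*.sql.gz secrets-*.tgz repos-*.tgz; do verify_backup "$f" || echo "BAD: $f"; done flags archives that need to be retaken.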

Restoring on a new host

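# (Install Paas as usual, then immediately, before users sign in:)
docker compose -f docker-compose.yml down
# docker volume create accepts one name per invocation
for v in paas-secrets pg-data repos; do docker volume create "$v"; done
docker run --rm -v paas-secrets:/dst -v "$PWD":/src alpine tar xzf /src/secrets-….tgz -C /dst
docker run --rm -v repos:/dst -v "$PWD":/src alpine tar xzf /src/repos-….tgz -C /dst
# psql needs a running server, so start postgres alone and wait until it accepts connections
docker compose up -d postgres
until docker compose exec -T postgres pg_isready -U paas -d paas; do sleep 1; done
gunzip < paas-….sql.gz | docker compose exec -T postgres psql -U paas paas
docker compose up -d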

Maintenance mode

There's no first-class maintenance mode in v1. To take an app offline temporarily, set replicas=0 (the upstream block will go to the "down" placeholder and nginx will return 502). To take the whole platform offline:

docker compose stop nginx

This leaves the dashboard and API unreachable but preserves all state. Run docker compose start nginx to bring the platform back.

Resetting everything

docker compose down -v   # WARNING: deletes ALL state including user apps
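
Because this command is irreversible, it can be worth putting a typed confirmation in front of it. The wrapper below is a hedged sketch, not part of Paas; the function names and the confirmation phrase are made up for illustration.

```shell
# Read one line from stdin; succeed only if it is exactly "delete everything".
confirm_reset() {
  printf 'Type "delete everything" to wipe all Paas state: ' >&2
  IFS= read -r answer || return 1
  [ "$answer" = "delete everything" ]
}

# Run the destructive reset only after explicit confirmation.
paas_reset() {
  confirm_reset && docker compose down -v
}
```

Calling paas_reset interactively prompts first; any other input, or an empty line, leaves the deployment untouched.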