Update prod_deployment.md
@@ -90,15 +90,26 @@ kubectl --context prod -n ai exec minio-0 -- mc alias set local http://localhost
kubectl --context prod -n ai exec minio-0 -- mc rm --recursive --force local/warehouse/
```

#### 4. Run the full deploy
#### 4. Delete sandbox deployments and wipe sandbox PVCs

Sandbox PVCs have a finalizer that prevents deletion until the sandbox pod is gone. Delete the deployments first, then the PVCs:

```bash
kubectl --context prod -n sandbox delete deployments --all
kubectl --context prod -n sandbox delete pvc --all
```

PVC deletion completes once the pods finish terminating (Ceph cleanup can take ~30s). You can proceed to the deploy immediately; it does not depend on PVC termination completing.
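If you do want to confirm that the sandbox PVCs are gone before moving on, `kubectl wait` can block until they are deleted. The following is a minimal sketch; the function name `wait_for_sandbox_pvcs` and the 120s default timeout are assumptions for illustration and are not part of the repo's scripts.

```bash
# Hypothetical helper (not part of bin/): block until every PVC in the
# sandbox namespace has been deleted, or until the timeout expires.
wait_for_sandbox_pvcs() {
  local ctx="${1:-prod}"        # kubectl context to target
  local timeout="${2:-120s}"    # assumed default; Ceph cleanup is typically ~30s
  kubectl --context "$ctx" -n sandbox wait --for=delete pvc --all --timeout="$timeout"
}
```

Call it as `wait_for_sandbox_pvcs prod 60s`. It returns non-zero if the timeout elapses first. Depending on the kubectl version, it may also report an error when no PVCs remain to wait on, which in this context means the cleanup has already finished.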
#### 5. Run the full deploy

```bash
bin/deploy-all --sandboxes
```

This rebuilds and redeploys all services, including `iceberg-catalog`, `flink-jobmanager`, and `flink-taskmanager` (which were scaled to zero above; `deploy-all` will restore them to their manifest replica counts).
This rebuilds and redeploys all services, including `iceberg-catalog`, `flink-jobmanager`, and `flink-taskmanager` (which were scaled to zero above; `deploy-all` will restore them to their manifest replica counts). The `--sandboxes` flag also cleans up any remaining sandbox Services.

#### 5. Re-apply the gateway database schema
#### 6. Re-apply the gateway database schema

The gateway does **not** auto-migrate. After the `iceberg` database is recreated, the schema must be applied manually:
@@ -108,7 +119,7 @@ kubectl --context prod -n ai exec -i postgres-0 -- psql -U postgres -d iceberg <

This creates the `user`, `session`, `user_licenses`, and related tables.
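To check that the schema was actually applied, you can list the tables in the `iceberg` database. This is a sketch that reuses the `postgres-0` pod and `psql` flags from the command above; the helper name `list_gateway_tables` is hypothetical, and it assumes the tables live in a schema on the default search path.

```bash
# Hypothetical helper: list the tables in the iceberg database so you can
# confirm that user, session, and user_licenses are present.
list_gateway_tables() {
  local ctx="${1:-prod}"   # kubectl context to target
  kubectl --context "$ctx" -n ai exec postgres-0 -- \
    psql -U postgres -d iceberg -c '\dt'
}
```

Running `list_gateway_tables prod` should show the tables named above. If they are missing, the schema step did not take effect.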
#### 6. Recreate all users
#### 7. Recreate all users

```bash
bin/create-all-users prod
@@ -142,7 +153,7 @@ kubectl --context prod -n ai logs deployment/gateway --tail=100

**Cause:** Dropping the `iceberg` database removes the gateway's auth tables along with the Iceberg catalog metadata, because they share the same database.

**Fix:** Re-apply the schema and recreate users (steps 5 and 6 above).
**Fix:** Re-apply the schema and recreate users (steps 6 and 7 above).

### Gateway shows `42P01` errors but pod is running
@@ -164,3 +175,25 @@ The gateway does not auto-migrate on startup. The schema file must be applied ma
1Password's `op inject` requires interactive desktop authentication. Running it via `echo "yes" | bin/secret-update prod` or any background/piped invocation will fail silently (the script prints `✓` even though `kubectl apply` received empty input).

**Fix:** Run `bin/secret-update prod` in an interactive terminal with 1Password unlocked.
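Because the failure is silent, a wrapper can refuse to run when stdin is not a terminal. This is a minimal sketch: `require_interactive` is a hypothetical helper, not something `bin/secret-update` provides, and it only detects piped or redirected input. It cannot tell whether 1Password is unlocked.

```bash
# Hypothetical guard: refuse to continue unless stdin is attached to a terminal.
require_interactive() {
  if [ ! -t 0 ]; then
    echo "error: stdin is not a terminal; run this from an interactive shell" >&2
    return 1
  fi
}
```

Used as `require_interactive && bin/secret-update prod`, the piped form `echo "yes" | ...` stops with an error instead of printing a misleading `✓`.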
### Config validation warnings during `bin/deploy-all`

**Symptom:** Step 3 (config update) prints errors like:
```
error: error validating "deploy/k8s/prod/configs/relay-config.yaml": error validating data: [apiVersion not set, kind not set]
```
for `relay-config`, `ingestor-config`, and `flink-config`.

**Cause:** These config files are raw data files (not Kubernetes manifests), so `kubectl` can't validate their structure. The underlying `kubectl create configmap` command succeeds regardless.

**Impact:** None. The configs are applied correctly and the script reports `✓ All configs updated successfully`. These warnings are expected and can be ignored.
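To tell an expected validation error from a real one, you can check whether the file named in the message declares the two fields the error complains about. The sketch below is a rough heuristic with a hypothetical function name; it only looks for top-level `apiVersion:` and `kind:` lines and does not parse YAML.

```bash
# Hypothetical check: does the file declare both top-level fields that
# the validation error reports as missing?
looks_like_manifest() {
  grep -q '^apiVersion:' "$1" && grep -q '^kind:' "$1"
}
```

For example, `looks_like_manifest deploy/k8s/prod/configs/relay-config.yaml || echo "raw data file: validation error is expected"`. A file that passes the check but still fails validation deserves a closer look.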
### Flink image build produces many Maven shading warnings

**Symptom:** During Step 4, the Flink image build outputs dozens of `[WARNING] Discovered module-info.class` and overlapping class/resource warnings from Maven.

**Impact:** None. These are pre-existing warnings from bundling Iceberg, AWS SDK, and Flink dependencies into a shaded JAR. The build completes successfully.

### `bin/deploy-all` confirmation prompt

Unlike `bin/secret-update`, the `bin/deploy-all` confirmation prompt (`Are you sure you want to continue? (yes/no)`) works fine with `echo "yes" | bin/deploy-all --sandboxes` from a script or non-interactive context.