Runbook & reference
Everything you need to operate, extend, and tear down this deployment. Share-safe — contains no API key material. See splash for the public face.
Architecture
End-to-end request flow from an OpenAI-compatible client (curl, the OpenAI SDKs, the pi coding agent, etc.) through this Cloud Run proxy to a Gemini model on Vertex AI. Solid arrows are the request path; dashed arrows are the streamed SSE response.
%%{init: {
'theme': 'base',
'themeVariables': {
'background': 'transparent',
'primaryColor': '#152036',
'primaryTextColor': '#e8eef4',
'primaryBorderColor': '#7fb3ff',
'secondaryColor': '#0f1626',
'tertiaryColor': '#0a1120',
'lineColor': '#7fb3ff',
'clusterBkg': 'rgba(127,179,255,0.05)',
'clusterBorder': 'rgba(127,179,255,0.35)',
'fontFamily': 'Inter, system-ui, sans-serif',
'fontSize': '14px'
}
}}%%
flowchart LR
C["OpenAI-compatible client
(curl · OpenAI SDK · pi agent)"]
subgraph GCP ["Google Cloud Project · rajdphd-prep"]
direction TB
R["Cloud Run
ai-proxy
FastAPI"]
V["Vertex AI
OpenAI-compat endpoint"]
M(["Gemini 2.5 Flash
Gemini 3.1 Pro Preview"])
end
C ==>|"POST /v1/chat/completions
x-api-key: sk_live_…"| R
R ==>|"validate key · rewrite alias
attach SA bearer"| V
V ==> M
M -. "SSE chunks" .-> V
V -. "SSE chunks" .-> R
R -. "SSE chunks" .-> C
1. What this is
A single Cloud Run service (ai-proxy) that fronts
Vertex AI's OpenAI-compatible Chat Completions endpoint. Its only jobs are:
- Terminate TLS on your own hostname (dev.aiaggies.net).
- Validate a caller-supplied x-api-key against an issued-key list.
- Rewrite short model aliases (flash) to the full Google model IDs.
- Attach a Google service-account bearer token and forward to Vertex.
- Pass the upstream JSON back unchanged.
There is no Apigee, no database, no queue, no Load Balancer. The service scales to zero when idle; at single-developer usage the non-model bill is effectively $0.
2. URLs & base paths
- Custom domain
- https://dev.aiaggies.net
- Cloud Run URL
- https://ai-proxy-….run.app (the status.url from gcloud run services describe; see section 10)
- OpenAI SDK base_url
- https://dev.aiaggies.net/v1
- Splash
- /
- Docs (this page)
- /docs
- Health
- /healthz
- Models
- /v1/models
- Chat
- /v1/chat/completions
While the managed TLS cert for dev.aiaggies.net is still issuing, call the
Cloud Run URL directly. The two are byte-identical in behavior.
3. Authentication
Every request to /v1/* must include:
x-api-key: sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Missing, unknown, or disabled keys return 401. Keys are loaded from the
API_KEYS_JSON environment variable at startup. The OpenAI SDK's default
Authorization: Bearer header is ignored by the proxy; pass the key
via default_headers={"x-api-key": ...}.
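A minimal sketch of that validation step, assuming the API_KEYS_JSON shape described in section 8 (the helper name here is hypothetical; the real logic lives in proxy/main.py):

```python
import json
import os

# API_KEYS_JSON is a JSON array of {"id", "key", "enabled"} objects (section 8).
# Disabled entries are dropped at load time, so flipping "enabled" revokes a key.
_raw = os.environ.get("API_KEYS_JSON", "[]")
VALID_KEYS = {e["key"]: e["id"] for e in json.loads(_raw) if e.get("enabled")}

def check_api_key(headers):
    """Return the key's id (used for logging), or None, meaning respond 401."""
    return VALID_KEYS.get(headers.get("x-api-key"))
```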
4. Endpoints
Public splash page. No auth.
This page. No auth.
Liveness. Returns {"ok": true, "models": [...]}.
OpenAI-compatible model list. Requires x-api-key.
OpenAI-compatible chat completion. Requires x-api-key.
5. Model aliases
Clients use short aliases; the proxy rewrites them server-side before calling Vertex. This means you can change backing models without touching client code.
flash → google/gemini-2.5-flash
pro-preview → google/gemini-3.1-pro-preview
Controlled by MODEL_FLASH_ID and MODEL_PRO_PREVIEW_ID env vars.
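The rewrite amounts to a dictionary lookup. A sketch with hypothetical names, defaulting to the IDs listed above and honoring the two env vars:

```python
import os

# Alias -> full Vertex model ID; overridable via env without touching clients.
ALIASES = {
    "flash": os.environ.get("MODEL_FLASH_ID", "google/gemini-2.5-flash"),
    "pro-preview": os.environ.get("MODEL_PRO_PREVIEW_ID", "google/gemini-3.1-pro-preview"),
}

def rewrite_model(body):
    """Replace a short alias with the full model ID; unknown names pass through."""
    body = dict(body)  # don't mutate the caller's request body
    body["model"] = ALIASES.get(body.get("model"), body.get("model"))
    return body
```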
6. Quick start
.env
# replace with your issued key
AI_API_BASE=https://dev.aiaggies.net/v1
API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
curl
source .env

# list models
curl -sS "$AI_API_BASE/models" -H "x-api-key: $API_KEY" | jq

# chat completion
curl -sS -X POST "$AI_API_BASE/chat/completions" \
  -H "content-type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"model":"flash","messages":[{"role":"user","content":"Say hi."}]}' | jq
Python (OpenAI SDK)
import os
from openai import OpenAI
client = OpenAI(
base_url=os.environ["AI_API_BASE"],
api_key=os.environ["API_KEY"],
default_headers={"x-api-key": os.environ["API_KEY"]},
)
# list
for m in client.models.list().data:
print(m.id, "->", getattr(m, "vertex_id", ""))
# chat
resp = client.chat.completions.create(
model="flash",
messages=[{"role": "user", "content": "Say hi."}],
)
print(resp.choices[0].message.content)
JavaScript / Node (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: process.env.AI_API_BASE,
apiKey: process.env.API_KEY,
defaultHeaders: { "x-api-key": process.env.API_KEY },
});
const resp = await client.chat.completions.create({
model: "flash",
messages: [{ role: "user", content: "Say hi." }],
});
console.log(resp.choices[0].message.content);
7. Using pi as a sandboxed client
pi is a minimal terminal coding agent that speaks the OpenAI Chat Completions wire format, so it plugs directly into this proxy. The pi author recommends running it inside a container — there are no permission prompts by design. The setup below keeps pi isolated from everything on the host except the project directory you launch it from.
What was set up
- A throwaway Docker image (pi-sandbox:latest) built from ~/Development/pi-sandbox/Dockerfile: node:22-bookworm-slim + @mariozechner/pi-coding-agent + git, ripgrep, jq, python3, curl. Runs as a non-root user.
- A host-side wrapper at ~/Development/pi-sandbox/pi that runs docker run with exactly two bind mounts: the current directory as /workspace and ~/.pi-sandbox as the container's ~/.pi (so sessions, auth, and installed pi packages persist). Nothing else from $HOME is visible to pi.
- A custom provider declared in ~/.pi-sandbox/agent/models.json pointing at this Cloud Run service. The wrapper forwards AI_API_BASE and API_KEY from the shell, and the provider config tells pi to send x-api-key (which is what this service authenticates against).
- A streaming-SSE passthrough added to /v1/chat/completions on this proxy. pi defaults to stream: true; without the passthrough the proxy would buffer the SSE response and return an empty completion. See section 10 (Operations) for the revision that shipped this.
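The passthrough itself lives in proxy/main.py. As a client-side illustration of what it preserves, here is a minimal sketch (hypothetical helper) that reassembles assistant text from OpenAI-style SSE lines, the same framing pi consumes:

```python
import json

def sse_content(lines):
    """Yield assistant-content deltas from OpenAI-style SSE 'data:' lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```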
~/.pi-sandbox/agent/models.json
This file makes pi aware of the aiaggies provider and its two
models. "API_KEY" is an env var name — pi resolves it
against the container env at request time, so no key is stored on disk.
{
"providers": {
"aiaggies": {
"baseUrl": "https://dev.aiaggies.net/v1",
"api": "openai-completions",
"apiKey": "API_KEY",
"headers": { "x-api-key": "API_KEY" },
"compat": {
"supportsDeveloperRole": false,
"supportsReasoningEffort": false,
"maxTokensField": "max_tokens"
},
"models": [
{
"id": "flash",
"name": "Gemini 2.5 Flash (via aiaggies)",
"reasoning": false,
"input": ["text", "image"],
"contextWindow": 1048576,
"maxTokens": 65536,
"cost": { "input": 0.075, "output": 0.30, "cacheRead": 0.01875, "cacheWrite": 0 }
},
{
"id": "pro-preview",
"name": "Gemini 3.1 Pro Preview (via aiaggies)",
"reasoning": true,
"input": ["text", "image"],
"contextWindow": 1048576,
"maxTokens": 65536,
"cost": { "input": 1.25, "output": 10.0, "cacheRead": 0.3125, "cacheWrite": 0 }
}
]
}
}
}
Launching pi
# 1. load AI_API_BASE + API_KEY into the shell so the wrapper forwards them
cd ~/Development/vertex-ai-dev
set -a && . ./.env && set +a

# 2. interactive TUI, isolated to the current directory
~/Development/pi-sandbox/pi --provider aiaggies --model flash

# non-interactive one-shot
~/Development/pi-sandbox/pi -p "summarize the repo" --provider aiaggies --model flash

# list configured models (confirms aiaggies/flash + pro-preview are registered)
~/Development/pi-sandbox/pi --list-models

# drop into a shell inside the sandbox
~/Development/pi-sandbox/pi shell

# rebuild the image when upgrading pi itself
~/Development/pi-sandbox/pi rebuild
To invoke the wrapper as plain pi, put it on your PATH:
ln -s ~/Development/pi-sandbox/pi /usr/local/bin/pi
Sandbox reach
- Current dir ($(pwd))
- → /workspace · read/write
- ~/.pi-sandbox
- → ~/.pi in the container · persistent state
- ~/.ssh, ~/.aws, rest of $HOME
- not mounted · unreachable
- Docker socket
- not mounted · no container escape
- Network
- enabled · required for LLM API calls
If pi returns an empty assistant message, the proxy is probably on a revision without SSE passthrough — check section 10 and redeploy.
8. Managing API keys
Keys live in the API_KEYS_JSON Cloud Run env var as a JSON array. Each entry
has an id (for logs), the key, and an enabled flag.
No database required — revoke by flipping enabled to false
(or removing the entry) and updating the service.
Format
[
{"id": "raj-laptop", "key": "sk_live_xxxx...", "enabled": true},
{"id": "raj-ci", "key": "sk_live_yyyy...", "enabled": true}
]
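Before pushing an edited api-keys.json to the service, a small sanity check like the following (hypothetical helper; the proxy itself does not require it) catches shape mistakes that would otherwise only surface as 401s:

```python
import json

REQUIRED = {"id", "key", "enabled"}

def validate_keys(text):
    """Validate the API_KEYS_JSON shape; return ids of enabled keys."""
    entries = json.loads(text)
    assert isinstance(entries, list), "top level must be a JSON array"
    for e in entries:
        missing = REQUIRED - e.keys()
        assert not missing, f"entry {e.get('id', '?')} missing fields: {missing}"
        assert e["key"].startswith("sk_live_"), f"{e['id']}: unexpected key prefix"
    return [e["id"] for e in entries if e["enabled"]]
```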
Generate a new key
python3 -c 'import secrets; print("sk_live_" + secrets.token_urlsafe(32))'
Update keys without rebuilding the image
gcloud run services update ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --set-env-vars="^##^API_KEYS_JSON=$(cat api-keys.json | jq -c .)"
Env vars are visible to anyone with roles/run.viewer on the project. For real
separation of duties, move API_KEYS_JSON into Secret Manager and read it at
startup with roles/secretmanager.secretAccessor.
9. GCP resources
- Project
- rajdphd-prep
- Region
- us-central1
- Cloud Run service
- ai-proxy
- Service account
- ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com
- IAM role
- roles/aiplatform.user
- Domain mapping
- dev.aiaggies.net → ai-proxy
- DNS record
- dev CNAME ghs.googlehosted.com.
APIs enabled: run.googleapis.com, aiplatform.googleapis.com,
artifactregistry.googleapis.com, cloudbuild.googleapis.com.
10. Operations
Redeploy after a code change
cd ~/Development/vertex-ai-dev
gcloud run deploy ai-proxy \
  --source=./proxy \
  --region=us-central1 --project=rajdphd-prep \
  --service-account=ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --min-instances=0 --max-instances=3 \
  --quiet
Read structured logs
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="ai-proxy"' \
  --project=rajdphd-prep --limit=50 --format=json --freshness=30m
Current revision & traffic
gcloud run services describe ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --format='value(status.latestReadyRevisionName,status.url)'
Roll back one revision
gcloud run services update-traffic ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --to-revisions=<PREVIOUS_REVISION_NAME>=100
Check custom-domain cert status
gcloud beta run domain-mappings describe \
  --domain=dev.aiaggies.net \
  --region=us-central1 --project=rajdphd-prep \
  --format='value(status.conditions[].type,status.conditions[].status,status.conditions[].message)'
When CertificateProvisioned=True and Ready=True, HTTPS on the
custom domain is live.
11. Cost
- Cloud Run: scale-to-zero; free tier covers ~2M requests / 360k vCPU-s / 180k GiB-s per month.
- Artifact Registry: a few MB; pennies per month.
- Cloud Build: only runs on deploy; free tier covers casual use.
- Domain mapping: $0. No Load Balancer.
- Cloud Logging: 50 GiB/mo ingest free.
- Vertex AI tokens: pay per token at Google's published rates — the only real cost.
At single-user scale, expect $0 per month for infra. Cost scales linearly with actual use only.
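For a back-of-envelope token-cost estimate, the cost fields from the models.json in section 7 can be plugged in directly (assuming those fields are USD per million tokens, which is how the values read):

```python
# USD per 1M tokens, copied from the models.json cost fields in section 7
# (assumed to be per-million-token rates).
RATES = {
    "flash": {"input": 0.075, "output": 0.30},
    "pro-preview": {"input": 1.25, "output": 10.0},
}

def estimate_usd(model, input_tokens, output_tokens):
    """Rough per-request cost: tokens times the per-1M-token rate."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000
```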
12. Files in the repo
- proxy/main.py
- FastAPI app: routes, auth, Vertex forwarding.
- proxy/pages.py
- Splash + this runbook HTML.
- proxy/Dockerfile
- Container used by gcloud run deploy --source.
- proxy/requirements.txt
- Python deps (FastAPI, httpx, google-auth, requests).
- deploy.sh
- One-shot idempotent deploy script.
- SPEC.md
- Design contract this implementation satisfies.
- html/index.html
- Local documentation site (runs via python3 serve.py).
- .env
- Local-only; holds AI_API_BASE and API_KEY.
13. Teardown
Removes all GCP resources this project created. Vertex AI itself stays enabled.
gcloud beta run domain-mappings delete --domain=dev.aiaggies.net \
  --region=us-central1 --project=rajdphd-prep --quiet

gcloud run services delete ai-proxy \
  --region=us-central1 --project=rajdphd-prep --quiet

gcloud projects remove-iam-policy-binding rajdphd-prep \
  --member="serviceAccount:ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user" --condition=None --quiet

gcloud iam service-accounts delete ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com \
  --project=rajdphd-prep --quiet