dev.aiaggies.net · runbook

Runbook & reference

Everything you need to operate, extend, and tear down this deployment. Share-safe — contains no API key material. See splash for the public face.

Architecture

End-to-end request flow from an OpenAI-compatible client (curl, the OpenAI SDKs, the pi coding agent, etc.) through this Cloud Run proxy to a Gemini model on Vertex AI. Solid arrows are the request path; dashed arrows are the streamed SSE response.

%%{init: {
  'theme': 'base',
  'themeVariables': {
    'background': 'transparent',
    'primaryColor': '#152036',
    'primaryTextColor': '#e8eef4',
    'primaryBorderColor': '#7fb3ff',
    'secondaryColor': '#0f1626',
    'tertiaryColor': '#0a1120',
    'lineColor': '#7fb3ff',
    'clusterBkg': 'rgba(127,179,255,0.05)',
    'clusterBorder': 'rgba(127,179,255,0.35)',
    'fontFamily': 'Inter, system-ui, sans-serif',
    'fontSize': '14px'
  }
}}%%
flowchart LR
  C["OpenAI-compatible client
(curl · OpenAI SDK · pi agent)"]
  subgraph GCP ["Google Cloud Project · rajdphd-prep"]
    direction TB
    R["Cloud Run
ai-proxy
FastAPI"]
    V["Vertex AI
OpenAI-compat endpoint"]
    M(["Gemini 2.5 Flash
Gemini 3.1 Pro Preview"])
  end
  C ==>|"POST /v1/chat/completions
x-api-key: sk_live_…"| R
  R ==>|"validate key · rewrite alias
attach SA bearer"| V
  V ==> M
  M -. "SSE chunks" .-> V
  V -. "SSE chunks" .-> R
  R -. "SSE chunks" .-> C

1. What this is

A single Cloud Run service (ai-proxy) that fronts Vertex AI's OpenAI-compatible Chat Completions endpoint. Its only jobs are validating the caller's x-api-key, rewriting model aliases to full Vertex model IDs, attaching a service-account bearer token, and streaming the SSE response back unmodified.

There is no Apigee, no database, no queue, no load balancer. The service scales to zero when idle; at single-developer usage the non-model bill is effectively $0.

2. URLs & base paths

Custom domain
https://dev.aiaggies.net
Cloud Run URL
https://ai-proxy-<hash>-uc.a.run.app (exact URL via the describe command in section 10)
OpenAI SDK base_url
https://dev.aiaggies.net/v1
Splash
/
Docs (this page)
/docs
Health
/healthz
Models
/v1/models
Chat
/v1/chat/completions

While the managed TLS cert for dev.aiaggies.net is still issuing, call the Cloud Run URL directly. The two are byte-identical in behavior.

3. Authentication

Every request to /v1/* must include:

x-api-key: sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Missing, unknown, or disabled keys return 401. Keys are loaded from the API_KEYS_JSON environment variable at startup. The OpenAI SDK's default Authorization: Bearer header is ignored by the proxy, which substitutes its own service-account credential upstream; pass the key via default_headers={"x-api-key": ...}.
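The credential handoff can be pictured as a header rewrite: the client-facing x-api-key never leaves the proxy, and whatever Authorization header the client sent is replaced by the service account's token before the request goes to Vertex. A minimal sketch of that logic (hypothetical helper names; the real implementation lives in proxy/main.py):

```python
# Sketch of the proxy-side header rewrite (hypothetical; see proxy/main.py
# for the real code). The client's x-api-key is a proxy-local credential
# and is dropped; the SA bearer token is attached for Vertex.
def upstream_headers(incoming: dict, sa_token: str) -> dict:
    headers = {k.lower(): v for k, v in incoming.items()}
    headers.pop("x-api-key", None)       # never forwarded upstream
    headers.pop("authorization", None)   # client bearer is ignored
    headers["authorization"] = f"Bearer {sa_token}"
    return headers

# example
out = upstream_headers(
    {"x-api-key": "sk_live_abc", "Authorization": "Bearer client-token",
     "Content-Type": "application/json"},
    sa_token="ya29.example",
)
```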

4. Endpoints

GET /

Public splash page. No auth.

GET /docs

This page. No auth.

GET /healthz

Liveness. Returns {"ok": true, "models": [...]}.

GET /v1/models

OpenAI-compatible model list. Requires x-api-key.

POST /v1/chat/completions

OpenAI-compatible chat completion. Requires x-api-key.

5. Model aliases

Clients use short aliases; the proxy rewrites them server-side before calling Vertex. This means you can change backing models without touching client code.

flash → google/gemini-2.5-flash
pro-preview → google/gemini-3.1-pro-preview

Controlled by MODEL_FLASH_ID and MODEL_PRO_PREVIEW_ID env vars.
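The rewrite itself is a small lookup. A sketch of how the alias map could be built from those env vars (hypothetical; the defaults mirror the table above, and this version passes unknown model names through unchanged, which the real proxy may instead reject):

```python
import os

# Alias -> Vertex model ID, overridable via env vars (sketch; defaults
# mirror the alias table above).
ALIASES = {
    "flash": os.environ.get(
        "MODEL_FLASH_ID", "google/gemini-2.5-flash"),
    "pro-preview": os.environ.get(
        "MODEL_PRO_PREVIEW_ID", "google/gemini-3.1-pro-preview"),
}

def resolve_model(requested: str) -> str:
    """Rewrite a client alias; unknown names pass through unchanged."""
    return ALIASES.get(requested, requested)
```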

6. Quick start

.env

AI_API_BASE=https://dev.aiaggies.net/v1
# replace with your issued key
API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

curl

source .env

# list models
curl -sS "$AI_API_BASE/models" -H "x-api-key: $API_KEY" | jq

# chat completion
curl -sS -X POST "$AI_API_BASE/chat/completions" \
  -H "content-type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"model":"flash","messages":[{"role":"user","content":"Say hi."}]}' | jq

Python (OpenAI SDK)

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["AI_API_BASE"],
    api_key=os.environ["API_KEY"],
    default_headers={"x-api-key": os.environ["API_KEY"]},
)

# list
for m in client.models.list().data:
    print(m.id, "->", getattr(m, "vertex_id", ""))

# chat
resp = client.chat.completions.create(
    model="flash",
    messages=[{"role": "user", "content": "Say hi."}],
)
print(resp.choices[0].message.content)

JavaScript / Node (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: process.env.AI_API_BASE,
  apiKey: process.env.API_KEY,
  defaultHeaders: { "x-api-key": process.env.API_KEY },
});

const resp = await client.chat.completions.create({
  model: "flash",
  messages: [{ role: "user", content: "Say hi." }],
});
console.log(resp.choices[0].message.content);

7. Using pi as a sandboxed client

pi is a minimal terminal coding agent that speaks the OpenAI Chat Completions wire format, so it plugs directly into this proxy. The pi author recommends running it inside a container — there are no permission prompts by design. The setup below keeps pi isolated from everything on the host except the project directory you launch it from.

What was set up

~/.pi-sandbox/agent/models.json

This file makes pi aware of the aiaggies provider and its two models. "API_KEY" is an env var name — pi resolves it against the container env at request time, so no key is stored on disk.

{
  "providers": {
    "aiaggies": {
      "baseUrl": "http://dev.aiaggies.net/v1",
      "api": "openai-completions",
      "apiKey": "API_KEY",
      "headers": { "x-api-key": "API_KEY" },
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false,
        "maxTokensField": "max_tokens"
      },
      "models": [
        {
          "id": "flash",
          "name": "Gemini 2.5 Flash (via aiaggies)",
          "reasoning": false,
          "input": ["text", "image"],
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "cost": { "input": 0.075, "output": 0.30, "cacheRead": 0.01875, "cacheWrite": 0 }
        },
        {
          "id": "pro-preview",
          "name": "Gemini 3.1 Pro Preview (via aiaggies)",
          "reasoning": true,
          "input": ["text", "image"],
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "cost": { "input": 1.25, "output": 10.0, "cacheRead": 0.3125, "cacheWrite": 0 }
        }
      ]
    }
  }
}

Launching pi

# 1. load AI_API_BASE + API_KEY into the shell so the wrapper forwards them
cd ~/Development/vertex-ai-dev
set -a && . ./.env && set +a

# 2. interactive TUI, isolated to the current directory
~/Development/pi-sandbox/pi --provider aiaggies --model flash

# non-interactive one-shot
~/Development/pi-sandbox/pi -p "summarize the repo" --provider aiaggies --model flash

# list configured models (confirms aiaggies/flash + pro-preview are registered)
~/Development/pi-sandbox/pi --list-models

# drop into a shell inside the sandbox
~/Development/pi-sandbox/pi shell

# rebuild the image when upgrading pi itself
~/Development/pi-sandbox/pi rebuild

To invoke the wrapper as just pi, put it on your PATH: ln -s ~/Development/pi-sandbox/pi /usr/local/bin/pi.

Sandbox reach

Current dir ($(pwd))
/workspace · read/write
~/.pi-sandbox
~/.pi in the container · persistent state
~/.ssh, ~/.aws, rest of $HOME
not mounted · unreachable
Docker socket
not mounted · no container escape
Network
enabled · required for LLM API calls

If pi returns an empty assistant message, the proxy is probably on a revision without SSE passthrough — check section 9 and redeploy.
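When debugging that symptom, it helps to know what a healthy stream looks like: each SSE chunk is a `data:` line carrying a JSON delta, terminated by `data: [DONE]`, and the client concatenates the `delta.content` fragments. A sketch of that reassembly using the standard Chat Completions streaming shape (illustrative input, not captured traffic):

```python
import json

def assemble_sse(raw: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream."""
    text = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue                      # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# illustrative stream: role chunk, two content chunks, sentinel
stream = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
```

If a proxy revision swallows or rewraps these chunks instead of passing them through verbatim, the client assembles exactly the empty message described above.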

8. Managing API keys

Keys live in the API_KEYS_JSON Cloud Run env var as a JSON array. Each entry has an id (for logs), the key, and an enabled flag. No database required — revoke by flipping enabled to false (or removing the entry) and updating the service.

Format

[
  {"id": "raj-laptop", "key": "sk_live_xxxx...", "enabled": true},
  {"id": "raj-ci",     "key": "sk_live_yyyy...", "enabled": true}
]
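At startup the proxy can reduce that array to a simple lookup of enabled keys. A sketch of the load step (hypothetical helper name; the real code is in proxy/main.py):

```python
import json

def load_keys(api_keys_json: str) -> dict:
    """Map key material -> key id, skipping disabled entries."""
    return {
        entry["key"]: entry["id"]
        for entry in json.loads(api_keys_json)
        if entry.get("enabled", False)
    }

# example mirroring the format above; disabled keys simply vanish,
# which is why flipping "enabled" to false revokes a key
KEYS = load_keys(
    '[{"id": "raj-laptop", "key": "sk_live_xxxx", "enabled": true},'
    ' {"id": "raj-ci", "key": "sk_live_yyyy", "enabled": false}]'
)
```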

Generate a new key

python3 -c 'import secrets; print("sk_live_" + secrets.token_urlsafe(32))'

Update keys without rebuilding the image

gcloud run services update ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --set-env-vars="^##^API_KEYS_JSON=$(cat api-keys.json | jq -c .)"

Env vars are visible to anyone with roles/run.viewer on the project. For real separation of duties, move API_KEYS_JSON into Secret Manager and read it at startup with roles/secretmanager.secretAccessor.

9. GCP resources

Project
rajdphd-prep
Region
us-central1
Cloud Run service
ai-proxy
Service account
ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com
IAM role
roles/aiplatform.user
Domain mapping
dev.aiaggies.net → ai-proxy
DNS record
dev CNAME ghs.googlehosted.com.

APIs enabled: run.googleapis.com, aiplatform.googleapis.com, artifactregistry.googleapis.com, cloudbuild.googleapis.com.

10. Operations

Redeploy after a code change

cd ~/Development/vertex-ai-dev

gcloud run deploy ai-proxy \
  --source=./proxy \
  --region=us-central1 --project=rajdphd-prep \
  --service-account=ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --min-instances=0 --max-instances=3 \
  --quiet

Read structured logs

gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="ai-proxy"' \
  --project=rajdphd-prep --limit=50 --format=json --freshness=30m

Current revision & traffic

gcloud run services describe ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --format='value(status.latestReadyRevisionName,status.url)'

Roll back one revision

gcloud run services update-traffic ai-proxy \
  --region=us-central1 --project=rajdphd-prep \
  --to-revisions=<PREVIOUS_REVISION_NAME>=100

Check custom-domain cert status

gcloud beta run domain-mappings describe \
  --domain=dev.aiaggies.net \
  --region=us-central1 --project=rajdphd-prep \
  --format='value(status.conditions[].type,status.conditions[].status,status.conditions[].message)'

When CertificateProvisioned=True and Ready=True, HTTPS on the custom domain is live.

11. Cost

At single-user scale, expect effectively $0 per month for infrastructure: Cloud Run scales to zero between requests, and there is no load balancer or database to pay for. The only meaningful cost is Vertex AI model usage, which scales linearly with tokens consumed.
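As a back-of-envelope check, using the per-million-token prices from the models.json sketch in section 7 (assumed USD per 1M tokens; they are client-side metadata and may lag actual Vertex billing):

```python
# Rough monthly model-cost estimate for the flash alias. Prices are
# assumed USD per 1M tokens, copied from the models.json in section 7.
PRICE_IN, PRICE_OUT = 0.075, 0.30

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. a heavy month: 10M input + 2M output tokens
cost = monthly_cost(10_000_000, 2_000_000)   # 0.75 + 0.60 = 1.35
```

Even at that volume the flash bill is on the order of a dollar or two; pro-preview is roughly 15-30x pricier per token, so the same arithmetic is worth rerunning before pointing batch jobs at it.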

12. Files in the repo

proxy/main.py
FastAPI app: routes, auth, Vertex forwarding.
proxy/pages.py
Splash + this runbook HTML.
proxy/Dockerfile
Container used by gcloud run deploy --source.
proxy/requirements.txt
Python deps (FastAPI, httpx, google-auth, requests).
deploy.sh
One-shot idempotent deploy script.
SPEC.md
Design contract this implementation satisfies.
html/index.html
Local documentation site (runs via python3 serve.py).
.env
Local-only; holds AI_API_BASE and API_KEY.

13. Teardown

Removes all GCP resources this project created. Vertex AI itself stays enabled.

gcloud beta run domain-mappings delete --domain=dev.aiaggies.net \
  --region=us-central1 --project=rajdphd-prep --quiet

gcloud run services delete ai-proxy \
  --region=us-central1 --project=rajdphd-prep --quiet

gcloud projects remove-iam-policy-binding rajdphd-prep \
  --member="serviceAccount:ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user" --condition=None --quiet

gcloud iam service-accounts delete ai-proxy-sa@rajdphd-prep.iam.gserviceaccount.com \
  --project=rajdphd-prep --quiet