Kubernetes deployment
The Helm chart under helm/teslasync is the canonical way to run TeslaSync on Kubernetes. It deploys the same architecture you'd get from Docker Compose — API, three worker binaries, data plane, observability — but with Kubernetes-native primitives: Deployments, Services, ConfigMaps, Secrets, optional Ingress or IngressRoute, optional HorizontalPodAutoscalers.
This page is the deployment guide. For the runtime architecture itself, see Architecture. For every configuration knob, see Configuration.
When Kubernetes is the right choice
Choose Kubernetes when:
- You want multiple API replicas behind a load balancer
- You want managed certificates, ingress rate limiting, network policies, pod security
- You already operate a cluster and adding one more workload is easier than running a separate host
- You want to scale horizontally as your fleet grows past the point where one host is comfortable
If none of the above apply, Docker Compose is simpler and faster to operate.
The 30-second walkthrough
# from the repo root
helm lint helm/teslasync
helm template teslasync helm/teslasync -f values.yaml
helm upgrade --install teslasync helm/teslasync \
  -n teslasync --create-namespace \
  -f values.yaml
kubectl rollout status deployment/teslasync-api -n teslasync
That gets you a running deployment. The interesting work is in your values.yaml; what's published below is the contract for that file.
Core values
The minimum useful values.yaml for a same-origin deployment behind a forward-auth proxy:
image:
  repository: ghcr.io/ev-dev-labs/teslasync-api
  # tag defaults to chart appVersion
web:
  enabled: true
  service:
    type: ClusterIP
service:
  type: ClusterIP
config:
  apiEndpoint: "http://teslasync-api.teslasync.svc.cluster.local:8080"
  browserApiBase: ""
  webEndpoint: "https://teslasync.example.com"
  forwardAuthHeader: "X-Authentik-Username"
A few things are worth understanding before tuning the rest.
apiEndpoint vs browserApiBase
This trips people up.
- apiEndpoint is the URL Nginx in the web container uses when it proxies /api/* to the API service. It is always an internal cluster DNS name and always http:// (TLS terminates at the ingress, not inside the cluster).
- browserApiBase is the URL the browser sees in JavaScript. For same-origin deployments, leave it empty: the SPA hits /api/... on its own origin and Nginx forwards it internally. For separate-origin deployments (API on its own subdomain), set it to the API's URL and configure CORS via CORS_ORIGINS.
Most installs are same-origin. Empty browserApiBase is the default to choose first.
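For the less common separate-origin case, the fragment below is a minimal sketch that reuses only the config keys shown above. The hostname api.teslasync.example.com is a placeholder assumption, and because this page does not show how CORS_ORIGINS is surfaced in the chart's values, it appears only as a comment.

```yaml
# values.yaml fragment for a separate-origin deployment (sketch).
# api.teslasync.example.com is a placeholder hostname, not a chart default.
config:
  # Still the internal cluster DNS name; Nginx in the web container uses it.
  apiEndpoint: "http://teslasync-api.teslasync.svc.cluster.local:8080"
  # Public API URL that the browser calls directly.
  browserApiBase: "https://api.teslasync.example.com"
  webEndpoint: "https://teslasync.example.com"
# CORS_ORIGINS on the API must also allow https://teslasync.example.com;
# see Configuration for where that variable is set.
```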
Services should be ClusterIP
API and web services are ClusterIP unless you have a specific reason to expose them at L4. The ingress (or IngressRoute) handles external traffic. Data services (postgres, redis, mosquitto) are always ClusterIP.
forwardAuthHeader must match your proxy
| Provider | Header value |
|---|---|
| Authentik | X-Authentik-Username |
| Authelia | Remote-User |
| oauth2-proxy | X-Auth-Request-User |
| Keycloak / custom proxy | X-Forwarded-User |
A wrong header value produces a 401 on every request, even after the user has logged in at the proxy, because TeslaSync never sees the header it is looking for. The forward-auth proxy strips and re-injects the header, so spoofing from outside the cluster is blocked at the ingress.
Optional components
The chart can deploy or skip each of these. Toggle in values.yaml:
| Component | Helm key | Purpose |
|---|---|---|
| Vehicle Command Proxy | commandProxy.enabled / commandProxy.external.url | Signs commands for vehicles that require it |
| Fleet Telemetry server | fleetTelemetry.enabled | Tesla streaming endpoint, low-latency live data |
| Jaeger | jaeger.enabled | OpenTelemetry trace UI |
| Ollama | ollama.enabled | Local LLM inference for Helix AI |
| MongoDB | mongodb.enabled | Optional raw signal capture for debugging |
For data services (PostgreSQL, Redis, Mosquitto), the chart can either deploy embedded versions or point at external instances you operate separately. Production deployments usually point at managed Postgres (postgresql.enabled: false, then provide postgresql.external.*) and managed Redis. The chart's embedded data services are fine for small to medium installs.
Ingress
Two patterns, depending on which controller you use.
Traefik with IngressRoute
The recommended pattern. Two routes — the /.well-known path public for Tesla key verification, everything else behind auth.
ingressRoute:
  enabled: true
  entryPoints:
    - websecure
  routes:
    # Tesla public-key route MUST bypass auth
    - kind: Rule
      match: "Host(`teslasync.example.com`) && PathPrefix(`/.well-known`)"
      priority: 100
      middlewares:
        - name: default-headers
          namespace: traefik
      services:
        - name: teslasync-web
          port: 80
    - kind: Rule
      match: "Host(`teslasync.example.com`)"
      priority: 10
      middlewares:
        - name: authentik-auth
          namespace: authentik
        - name: default-headers
          namespace: traefik
      services:
        - name: teslasync-web
          port: 80
  tls:
    enabled: true
    secretName: teslasync-tls
The priority matters. Traefik evaluates routes with a higher priority value first, so the well-known route (100) is matched before the catch-all auth route (10) and never falls through to it.
You do not need a public PathPrefix('/api') route directly to teslasync-api — web/Nginx handles /api internally.
Standard Ingress (nginx-ingress, others)
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: teslasync.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: teslasync-tls
      hosts:
        - teslasync.example.com
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "https://authentik.example.com/outpost.goauthentik.io/auth/nginx"
    nginx.ingress.kubernetes.io/auth-snippet: |
      proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
Verify the well-known path is reachable without going through the auth challenge; most providers' nginx-ingress snippets let you exempt specific paths.
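One common way to exempt a path with ingress-nginx is a second Ingress object for the same host that carries no auth annotations, since those annotations apply per Ingress. The manifest below is a sketch under that assumption. The object name is hypothetical, it is applied separately from the chart, and whether the chart can render something equivalent is not covered on this page.

```yaml
# Hypothetical extra Ingress: same host, no auth annotations, so
# ingress-nginx serves /.well-known without the auth subrequest.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: teslasync-well-known   # assumed name
  namespace: teslasync
spec:
  ingressClassName: nginx
  tls:
    - secretName: teslasync-tls
      hosts:
        - teslasync.example.com
  rules:
    - host: teslasync.example.com
      http:
        paths:
          - path: /.well-known
            pathType: Prefix
            backend:
              service:
                name: teslasync-web
                port:
                  number: 80
```

With this in place, an unauthenticated request to a path under /.well-known should return the file itself rather than a redirect or a 401.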
Auth — ForwardAuth
The chart doesn't deploy your auth proxy; it integrates with the one you operate. The two pieces that matter:
- The reverse proxy / ingress middleware authenticates the user and injects a header (X-Authentik-Username, Remote-User, etc.)
- TeslaSync reads FORWARD_AUTH_HEADER from its config and uses that header to resolve / create the User record
A correctly-configured chain produces requests where the API sees the authenticated user from the start of every request, without needing to validate session cookies or tokens itself.
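To make the first piece concrete, here is a sketch of a Traefik ForwardAuth Middleware that would back the authentik-auth reference used in the IngressRoute example. The outpost Service name and port are assumptions about a typical Authentik install, not something the chart creates. The apiVersion shown is the one used by recent Traefik releases; older releases use traefik.containo.us/v1alpha1.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: authentik-auth
  namespace: authentik
spec:
  forwardAuth:
    # Assumed outpost Service name and port; adjust to your Authentik install.
    address: http://authentik-outpost.authentik.svc.cluster.local:9000/outpost.goauthentik.io/auth/traefik
    trustForwardHeader: true
    # Headers copied from the auth response onto the request sent upstream.
    # This must include the header named in forwardAuthHeader.
    authResponseHeaders:
      - X-Authentik-Username
```

If the header listed under authResponseHeaders does not match forwardAuthHeader, the user authenticates successfully at the proxy but TeslaSync still rejects the request.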
If you want to delay rolling out auth, you can run with a fixed header in dev (X-Forwarded-User: admin) — but never expose such a deployment to the internet.
Helix AI configuration
Helix is off-by-default per feature, so the chart can install with no AI configuration and Helix is simply invisible. To enable the infrastructure, inject provider credentials via secrets and set the env vars on the API + workers:
secrets:
  openai:
    apiKey: <secretRef>
  anthropic:
    apiKey: <secretRef>
  azureOpenAI:
    endpoint: <secretRef>
    apiKey: <secretRef>
    deployment: <secretRef>
config:
  ai:
    provider: "ollama" # or openai / azure / anthropic
    dailyBudgetUsd: 5
    rateLimitPerMin: 60
    redactionEnabled: true
Don't put cloud API keys in values.yaml directly; reference cluster Secrets. Most teams maintain a separate secrets.yaml (encrypted with sops, sealed-secrets, or your secrets management tool of choice) and pass both files to Helm.
Full Helix env reference: Configuration → Helix AI settings.
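This page leaves the exact shape of <secretRef> to the chart. As a rough illustration of the cluster Secret such a reference might point at, the manifest below uses a hypothetical Secret name and key; check the chart's own values for what it actually expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: teslasync-ai-credentials   # hypothetical name
  namespace: teslasync
type: Opaque
stringData:
  # Hypothetical key name; use whatever the chart's secretRef expects.
  openai-api-key: "replace-with-real-key"
```

A manifest like this should itself be encrypted (sops, sealed-secrets, or similar) before it is committed anywhere.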
Storage
PostgreSQL with TimescaleDB and pgvector is non-negotiable — the platform's first migration installs both extensions. The chart's embedded postgres uses the timescale/timescaledb-ha:pg17 image which has them preinstalled. If you point at an external Postgres, ensure both extensions are available:
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS vector;
For production-grade Postgres, run the database with persistent volumes backed by your fastest available storage class. Telemetry writes are bursty; a slow disk turns into back-pressure that propagates to the ingest worker.
Scaling
Each layer scales independently:
| Layer | How to scale | Notes |
|---|---|---|
| API | api.replicas or HPA on CPU | Set LIVE_SIGNAL_STORE_MODE=hybrid so L2 + Pub/Sub fans state out |
| Workers | Per-worker replicas (notification, export, automation) | Queues are partitioned; multiple replicas drain in parallel |
| Web | web.replicas or HPA | Web is stateless; scale freely |
| Postgres | Vertical (more CPU / RAM / faster disk) — Timescale handles single-instance well | Read replicas possible but not required for typical fleet sizes |
| Redis | Vertical for typical loads; cluster mode if you have unusual scale | Pub/Sub is the bottleneck on extreme fanout |
For multi-replica API, the L2 Redis cache + Pub/Sub fanout is what makes it work. Without LIVE_SIGNAL_STORE_MODE=hybrid (the default in Helm), each replica has its own L1 view and they drift.
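The chart's optional HPA settings are not listed on this page, so as a sketch, here is the equivalent raw HorizontalPodAutoscaler for the API Deployment scaling on CPU. The replica bounds and the utilization target are illustrative values, not chart defaults.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: teslasync-api
  namespace: teslasync
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: teslasync-api
  minReplicas: 2
  maxReplicas: 6                  # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # illustrative target
```

CPU utilization targets are computed against the pods' CPU requests, so this only works once resource requests are set on the API pods.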
Verify after install
kubectl get pods -n teslasync
kubectl rollout status deployment/teslasync-api -n teslasync
kubectl rollout status deployment/teslasync-web -n teslasync
kubectl logs deployment/teslasync-api -n teslasync | grep -i migration
kubectl exec deployment/teslasync-api -n teslasync -- wget -qO- localhost:8080/healthz
kubectl exec deployment/teslasync-api -n teslasync -- wget -qO- localhost:8080/readyz
A healthy install logs that migrations were applied and that the API is listening on :8080, and /healthz and /readyz both return 200.
Upgrading
helm upgrade teslasync helm/teslasync -n teslasync -f values.yaml
kubectl rollout status deployment/teslasync-api -n teslasync
The platform never drops data on upgrade unless the release notes call out a destructive migration. Migrations run automatically on the next API pod startup; a rolling restart picks up the new version.
For zero-downtime rolling upgrades:
- Run with replicas: 2+ for the API
- Pod Disruption Budget set to require at least 1 available
- Migrations should be backward-compatible within a single release (the platform follows this rule); if a release breaks the contract it'll be flagged
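The Pod Disruption Budget item above can be sketched as a standard policy/v1 object. The selector label is an assumption about how the chart labels API pods; confirm it against the labels on your running pods before applying.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: teslasync-api
  namespace: teslasync
spec:
  # Voluntary disruptions (node drains, upgrades) must leave one API pod up.
  minAvailable: 1
  selector:
    matchLabels:
      # Assumed label; verify with `kubectl get pods --show-labels -n teslasync`.
      app.kubernetes.io/name: teslasync-api
```

With only one replica, minAvailable: 1 blocks node drains entirely, which is why this pairs with running two or more API replicas.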
Production checklist
Before pointing real users at the deployment:
- [ ] HTTPS enabled at the ingress with valid certificates
- [ ] /.well-known route exempt from auth and reachable publicly
- [ ] All app routes behind forward-auth
- [ ] forwardAuthHeader matches the proxy
- [ ] API and web Services are ClusterIP
- [ ] browserApiBase: "" for same-origin; otherwise CORS configured
- [ ] ENCRYPTION_KEY set in a cluster Secret (never plaintext in values.yaml)
- [ ] Tesla OAuth credentials in cluster Secrets
- [ ] Backups configured and at least one restore drill rehearsed
- [ ] Metrics scraped by an internal Prometheus; not exposed publicly without auth
- [ ] If Helix AI is enabled with a cloud provider, AI_DAILY_BUDGET_USD set to a sane value
- [ ] If you have signed-command vehicles, commandProxy.enabled: true (or commandProxy.external.url)
- [ ] Resource requests + limits set on API and worker pods
- [ ] Pod Disruption Budgets in place for any deployment running ≥2 replicas
- [ ] NetworkPolicies restricting east-west traffic to what's needed
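As an illustration of the NetworkPolicy item, the sketch below admits traffic to the API pods only from the web pods on port 8080. Both pod labels are assumptions about the chart's labeling. A real policy set would likely need further rules (for example for Prometheus scrapes or any workers that call the API), and it only takes effect on a CNI that enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: teslasync-api-ingress   # hypothetical name
  namespace: teslasync
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: teslasync-api       # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: teslasync-web   # assumed label
      ports:
        - protocol: TCP
          port: 8080
```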
When something goes wrong
The Troubleshooting playbook applies — the failure modes are the same; only the commands differ:
kubectl logs deployment/teslasync-api -n teslasync --tail=200 | grep -i error
kubectl describe pod <pod-name> -n teslasync
kubectl exec -it deployment/teslasync-api -n teslasync -- /bin/sh
The X-Request-Id header from a failed request is the most useful breadcrumb: every log line for that request will carry it.
Related
- Docker Compose — the simpler deployment when one host is enough
- Configuration — every env var, every default
- Architecture — the runtime view of how requests flow
- Helix AI — the off-by-default AI layer
- Backup & Restore — recovery procedures
- Troubleshooting — the symptom-driven playbook