System Architecture Overview
Last updated: 2026-05-21
Production domains: app.luckyplans.xyz (web), api.luckyplans.xyz (gateway), admin.luckyplans.xyz (admin surfaces), v0.api.luckyplans.xyz (legacy API via k3d to host port 9000).
Architecture Diagram
Keycloak (IdP — self-hosted, Helm subchart)
▲ │
│ │ ROPC grant (login) + Admin REST API (registration)
│ ▼
API Gateway (apps/api-gateway) ← Handles auth server-side
│ POST /auth/login → ROPC grant, create Redis session, set cookie
│ POST /auth/register → Admin API user creation, auto-login via ROPC
│ POST /auth/logout → clear Redis session + cookie
│
│ session_id cookie (HttpOnly, Secure, SameSite=Lax)
│
Browser ─── cookie sent automatically on every request ─────────┐
│ │
│ GraphQL (Apollo Client, credentials: 'include') │
▼ │
Next.js Frontend (apps/web) ◄────────────────────────────────────┘
│ Custom /login and /register pages (fetch POST to gateway)
│ middleware.ts: cookie presence check only
│
│ Next.js rewrites (locally), K8s ingress (prod)
│ routes /auth/* + /graphql → Gateway
│
▼
API Gateway
│ SessionGuard: looks up session in Redis, extracts { userId, roles }
│ Refreshes access token server-side if near expiry
│ NestJS + Apollo Server (code-first GraphQL)
│
│ Redis pub/sub — payload carries { userId, roles }, never the raw token
│
▼
Worker Edges (`apps/edge-agent`, external runtime)
▲ Internal edge APIs (`/internal/edges/*`)
│ - register + credential issuance
│ - connectivity heartbeat (`lastSeenAt`, version, targetVersion)
│ - task polling/results/complete/fail
│ - upgrade status reporting
▼
API Gateway + PostgreSQL/Prisma
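The diagram says Redis pub/sub messages carry `{ userId, roles }` and never the raw token. Below is a minimal TypeScript sketch of that boundary. It assumes a session record shaped like the hypothetical `StoredSession`, and a message envelope built by the hypothetical `buildMessage`. Neither name comes from the codebase.

```typescript
// Hypothetical session record held in Redis; field names are assumptions.
interface StoredSession {
  userId: string;
  roles: string[];
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Identity payload allowed onto the pub/sub channel: no tokens.
interface IdentityPayload {
  userId: string;
  roles: string[];
}

// Copy only identity fields (never spread the whole session),
// so adding a token field to the session cannot leak it into messages.
function toIdentityPayload(session: StoredSession): IdentityPayload {
  return { userId: session.userId, roles: [...session.roles] };
}

// Hypothetical envelope; `pattern` would come from the message
// pattern enums in packages/shared.
function buildMessage<T>(pattern: string, session: StoredSession, data: T): string {
  return JSON.stringify({ pattern, user: toIdentityPayload(session), data });
}
```

Copying fields explicitly, instead of deleting token fields from a copy, means the safe behavior is the default.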
Service Map
| Service | Purpose | Communication |
|---|---|---|
| Keycloak | Identity Provider — user management, roles (PostgreSQL-backed) | ROPC + Admin API (gateway ↔ Keycloak server-side) |
| `apps/web` | Next.js frontend, App Router, Apollo Client, custom auth pages, blog | GraphQL → API gateway (via Next.js rewrites / K8s ingress) |
| `apps/api-gateway` | GraphQL API, REST auth endpoints, edge control plane (worker registry, release metadata, upgrade campaigns), Redis session management | GraphQL + internal REST APIs |
| `apps/edge-agent` | External edge runtime: onboarding, registration, connectivity heartbeat, task execution, idle-only upgrade loop | Internal REST APIs ↔ gateway |
Architectural Decisions
Services are split by functionality, not by domain:
- `api-gateway` currently owns client-facing domain operations and orchestration logic
- edge runtimes (`apps/edge-agent`) execute distributed workloads and report state to the gateway
- new microservices are created only when workload or operational needs justify isolation (CPU-heavy work, cron/background jobs, independent scaling/SLO)
- domain types live in `packages/shared`, not in individual services
Auth Flow
- User hits a protected `(app)` route → Next.js middleware checks for the `session_id` cookie
- If there is no cookie → redirect to `/login` (custom Next.js page)
- User enters credentials → form submits `POST /auth/login` with `{ email, password }`
- Gateway authenticates with Keycloak via the ROPC grant (`grant_type=password`)
- Gateway creates a Redis session and sets an opaque `session_id` cookie (HttpOnly, Secure, SameSite=Lax)
- Frontend redirects to the original protected route
- All subsequent GraphQL requests include the cookie automatically (the browser handles this)
- Gateway `SessionGuard` looks up the session in Redis and refreshes the access token if it is near expiry
- `SessionGuard` extracts `{ userId, email, name, roles }` into the GraphQL context
- Redis microservice messages carry this identity payload, never the raw token
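The login steps can be sketched as small pure helpers:

- one builds the ROPC request the gateway sends to Keycloak,
- one mints an opaque session id,
- one builds the `Set-Cookie` value.

This sketch assumes Keycloak's standard OpenID Connect token endpoint (`/realms/{realm}/protocol/openid-connect/token`, without the legacy `/auth` prefix) and a confidential client. `KeycloakConfig`, the 32-byte id length and the max-age are placeholders, not project settings.

```typescript
import { randomBytes } from "node:crypto";

// Placeholder config; values are not taken from the project.
interface KeycloakConfig {
  baseUrl: string;
  realm: string;
  clientId: string;
  clientSecret: string;
}

// ROPC (grant_type=password) request against Keycloak's token endpoint.
function buildRopcRequest(cfg: KeycloakConfig, email: string, password: string) {
  const url = `${cfg.baseUrl}/realms/${encodeURIComponent(cfg.realm)}/protocol/openid-connect/token`;
  const body = new URLSearchParams({
    grant_type: "password",
    client_id: cfg.clientId,
    client_secret: cfg.clientSecret,
    username: email,
    password,
    scope: "openid",
  });
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Opaque session id: random bytes with no relation to any token.
function newSessionId(): string {
  return randomBytes(32).toString("base64url");
}

// Cookie attributes mirror the flow above: HttpOnly, Secure, SameSite=Lax.
function sessionCookie(sessionId: string, maxAgeSeconds: number): string {
  return [
    `session_id=${sessionId}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```

In use, the gateway would:

1. Call `fetch(req.url, req.init)`.
2. Read `access_token`, `refresh_token` and `expires_in` from the JSON response.
3. Store those values in Redis under the new session id.
4. Send only the cookie back to the browser.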
Registration: custom `/register` page → `POST /auth/register` → gateway creates the user via the Keycloak Admin REST API (service account with the `manage-users` role), then auto-logs in via ROPC.
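Registration involves two Keycloak calls:

1. A service-account token obtained with the `client_credentials` grant.
2. `POST /admin/realms/{realm}/users` with a user representation.

The sketch below only builds those two requests. `AdminClientConfig` is a hypothetical type, and using the email as the Keycloak username is an assumption rather than confirmed project behavior.

```typescript
// Placeholder config for a confidential client with a service account.
interface AdminClientConfig {
  baseUrl: string;
  realm: string;
  clientId: string;
  clientSecret: string;
}

// Step 1: service-account token via the client_credentials grant.
function buildServiceTokenRequest(cfg: AdminClientConfig) {
  return {
    url: `${cfg.baseUrl}/realms/${encodeURIComponent(cfg.realm)}/protocol/openid-connect/token`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        grant_type: "client_credentials",
        client_id: cfg.clientId,
        client_secret: cfg.clientSecret,
      }).toString(),
    },
  };
}

// Step 2: create the user with a non-temporary password credential.
function buildCreateUserRequest(cfg: AdminClientConfig, adminToken: string, email: string, password: string) {
  const user = {
    username: email, // assumption: email doubles as the username
    email,
    enabled: true,
    credentials: [{ type: "password", value: password, temporary: false }],
  };
  return {
    url: `${cfg.baseUrl}/admin/realms/${encodeURIComponent(cfg.realm)}/users`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${adminToken}` },
      body: JSON.stringify(user),
    },
  };
}
```

A `201 Created` response means the gateway can continue with the same ROPC login path used by `/auth/login`. A `409 Conflict` typically means the username or email is already taken.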
Data Flow (Request Lifecycle)
- Frontend sends GraphQL query/mutation via Apollo Client (cookie sent automatically)
- API Gateway `SessionGuard` looks up the session in Redis, validates it, and populates `req.user`
- Resolver executes gateway service logic (and edge orchestration when needed)
- Gateway persists/reads domain state via Prisma/PostgreSQL
- Gateway returns GraphQL response to frontend
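The `SessionGuard` step can be modeled as a function that performs four steps:

1. Read the `session_id` cookie.
2. Load the session from a store.
3. Refresh the token when it is near expiry.
4. Return only identity fields for `req.user`.

This is a framework-free sketch. `SessionStore` stands in for Redis, `RefreshFn` stands in for Keycloak's `refresh_token` grant, and the 60-second refresh window is a placeholder. All of these names and values are hypothetical, and `AuthUser` here is a local type rather than the one in `packages/shared`.

```typescript
// Hypothetical session record; field names are assumptions.
interface Session {
  userId: string;
  email: string;
  name: string;
  roles: string[];
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // access-token expiry, epoch ms
}

// Stand-in for the Redis-backed store.
interface SessionStore {
  get(id: string): Promise<Session | null>;
  set(id: string, session: Session): Promise<void>;
}

// Injected token refresher (e.g. Keycloak's refresh_token grant).
type RefreshFn = (refreshToken: string) => Promise<{ accessToken: string; refreshToken: string; expiresInSec: number }>;

interface AuthUser {
  userId: string;
  email: string;
  name: string;
  roles: string[];
}

const REFRESH_WINDOW_MS = 60_000; // placeholder "near expiry" threshold

function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return undefined;
}

// Returns the identity for the GraphQL context, or null (unauthenticated).
async function resolveUser(
  cookieHeader: string | undefined,
  store: SessionStore,
  refresh: RefreshFn,
  now: number = Date.now(),
): Promise<AuthUser | null> {
  const sessionId = readCookie(cookieHeader, "session_id");
  if (!sessionId) return null;
  let session = await store.get(sessionId);
  if (!session) return null;
  if (session.expiresAt - now < REFRESH_WINDOW_MS) {
    try {
      const t = await refresh(session.refreshToken);
      session = { ...session, accessToken: t.accessToken, refreshToken: t.refreshToken, expiresAt: now + t.expiresInSec * 1000 };
      await store.set(sessionId, session);
    } catch {
      return null; // refresh failed: treat the session as invalid
    }
  }
  // Only identity fields reach req.user; tokens stay in the store.
  const { userId, email, name, roles } = session;
  return { userId, email, name, roles };
}
```

Injecting the store and the refresher keeps the guard logic unit-testable without Redis or Keycloak.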
Edge Runtime Lifecycle (Current)
- Edge starts and loads local config; if missing and interactive, runs onboarding wizard.
- Wizard collects display name, server URL, and registration token; edge generates an identifier of the form `edge-<human-name>-<shortid>`.
- Gateway registers/upserts the worker by `deviceNumber` and issues a credential.
- Edge sends a connectivity heartbeat; gateway updates `lastSeenAt`, `version`, `targetVersion`, `upgradeStatus`.
- Edge polls tasks and reports heartbeat/results/terminal states.
- If `targetVersion` differs from the running version and the edge is idle, the upgrade flow runs and status transitions are reported.
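This sketch covers two lifecycle rules: identifier generation (`edge-<human-name>-<shortid>`) and the idle-only upgrade gate. The slug rules, the 6-hex-character short id, the `"edge"` fallback and the `EdgeState` shape are all assumptions made for illustration.

```typescript
import { randomBytes } from "node:crypto";

// Turn the wizard's display name into a URL-safe slug ("Café Node!" -> "cafe-node").
function humanSlug(displayName: string): string {
  const slug = displayName
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug || "edge"; // fallback when nothing usable remains (assumption)
}

// edge-<human-name>-<shortid>; 3 random bytes -> 6 hex chars (length is an assumption).
function generateEdgeName(displayName: string): string {
  return `edge-${humanSlug(displayName)}-${randomBytes(3).toString("hex")}`;
}

// Hypothetical view of the edge's local state.
interface EdgeState {
  version: string;
  targetVersion: string | null;
  runningTasks: number;
}

// Idle-only gate: a different target version AND no task in flight.
function shouldStartUpgrade(state: EdgeState): boolean {
  return state.targetVersion !== null && state.targetVersion !== state.version && state.runningTasks === 0;
}
```

Checking `runningTasks === 0` on every poll means a pending upgrade waits until the edge finishes its current work, rather than interrupting it.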
Shared Packages
| Package | Purpose |
|---|---|
| `packages/shared` | Entity interfaces (`User`, `AuthUser`), DTOs, message pattern enums, `ServiceResponse<T>`, utility functions (`getEnvVar`, `getRedisConfig`, `generateId`) |
| `packages/config` | Shared ESLint preset, TypeScript configs for NestJS and Next.js |
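To illustrate what the `packages/shared` utilities might look like, here is a sketch of `ServiceResponse<T>`, `getEnvVar` and `getRedisConfig`. Every shape and default below is hypothetical and guessed from the names, including the `REDIS_HOST`/`REDIS_PORT` variables. Nothing here is copied from the package.

```typescript
// Hypothetical discriminated union; the real ServiceResponse<T> may differ.
type ServiceResponse<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string } };

function ok<T>(data: T): ServiceResponse<T> {
  return { success: true, data };
}

function fail<T = never>(code: string, message: string): ServiceResponse<T> {
  return { success: false, error: { code, message } };
}

// Read an env var, fall back to a default, or fail fast at startup.
function getEnvVar(name: string, fallback?: string): string {
  const value = process.env[name];
  if (value !== undefined && value !== "") return value;
  if (fallback !== undefined) return fallback;
  throw new Error(`Missing required environment variable: ${name}`);
}

// Variable names and defaults are assumptions.
function getRedisConfig(): { host: string; port: number } {
  const port = Number(getEnvVar("REDIS_PORT", "6379"));
  if (!Number.isInteger(port) || port <= 0) throw new Error(`Invalid REDIS_PORT: ${port}`);
  return { host: getEnvVar("REDIS_HOST", "localhost"), port };
}
```

The `success` discriminant lets callers narrow to either `data` or `error` with a single check.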
Infrastructure
- Build: Turborepo for monorepo orchestration, pnpm workspaces
- Local dev: `docker compose up -d` (Redis, PostgreSQL, Keycloak), then `pnpm dev`
- Containers: Multi-stage Docker builds (Alpine, non-root, `turbo prune`)
- CI/CD: GitHub Actions → Docker build → ArgoCD sync
- Deployment: Helm charts on Kubernetes (k3s locally, any K8s in production)
- Auth: Keycloak (self-hosted, PostgreSQL-backed), gateway-managed sessions with Redis store
- Docs: Nextra v4 integrated at `/docs` within `apps/web`
- Inter-service/runtime: Redis session/cache + internal edge HTTP APIs for distributed worker orchestration
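Locally, `/auth/*` and `/graphql` reach the gateway through Next.js rewrites, and in production through the K8s ingress. A `next.config.ts` could express this with the standard `rewrites()` option, as sketched below. The `API_GATEWAY_URL` variable and the `http://localhost:4000` default are hypothetical. The config is left untyped to stay self-contained; a real file would annotate it with `NextConfig` from `next` and wrap it with the Nextra plugin.

```typescript
// next.config.ts (sketch): forward auth and GraphQL traffic to the gateway.
// API_GATEWAY_URL and its default are assumptions, not project settings.
const gatewayUrl = process.env.API_GATEWAY_URL ?? "http://localhost:4000";

const nextConfig = {
  async rewrites() {
    return [
      { source: "/graphql", destination: `${gatewayUrl}/graphql` },
      { source: "/auth/:path*", destination: `${gatewayUrl}/auth/:path*` },
    ];
  },
};

export default nextConfig;
```

Because the rewrite keeps the browser on the web origin, the `session_id` cookie is first-party for both the pages and the API calls.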
See deep dives:
Current State and Known Limitations
- Worker upgrade execution is policy-driven: edge upgrades are idle-only and rely on release metadata integrity (URL/checksum/signature).
- Connectivity is heartbeat-based: a stale `lastSeenAt` reveals a liveness problem but not its root cause.
- Keycloak + PostgreSQL are required: local development needs `docker compose up -d` before `pnpm dev`.
- The gateway is stateful: the session store depends on Redis, and Redis data loss invalidates all sessions.
- Blog pages are static placeholders: there is no CMS backend yet.
- Portfolio image uploads are not supported: `imageUrl` fields accept external URLs only, with no file upload.
- No portfolio search/discovery: public profiles are accessed directly via `/u/[userId]`, with no directory or search.
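Two of these limitations come down to simple checks: classifying liveness from `lastSeenAt`, and verifying an artifact against the checksum in release metadata. The sketch below covers both. The 2× and 10× heartbeat thresholds and the three-state `Connectivity` type are placeholders. Signature verification is not shown.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

type Connectivity = "online" | "stale" | "offline";

// Thresholds (2x and 10x the heartbeat interval) are placeholders.
// The result says heartbeats stopped, not why they stopped.
function classifyConnectivity(lastSeenAt: Date | null, now: Date, heartbeatIntervalMs: number): Connectivity {
  if (!lastSeenAt) return "offline";
  const ageMs = now.getTime() - lastSeenAt.getTime();
  if (ageMs <= 2 * heartbeatIntervalMs) return "online";
  if (ageMs <= 10 * heartbeatIntervalMs) return "stale";
  return "offline";
}

// Compare the artifact's SHA-256 digest with the hex checksum from release metadata.
function verifySha256(artifact: Buffer, expectedHex: string): boolean {
  const actual = createHash("sha256").update(artifact).digest();
  const expected = Buffer.from(expectedHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === actual.length && timingSafeEqual(actual, expected);
}
```

An edge would refuse to install when `verifySha256` returns `false`, and report that through the upgrade status instead.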