System Architecture Overview

Last updated: 2026-05-21

Production domains: app.luckyplans.xyz (web), api.luckyplans.xyz (gateway), admin.luckyplans.xyz (admin surfaces), v0.api.luckyplans.xyz (legacy API via k3d to host port 9000).

Architecture Diagram

Keycloak (IdP — self-hosted, Helm subchart)
  ▲   │
  │   │ ROPC grant (login) + Admin REST API (registration)
  │   ▼
API Gateway (apps/api-gateway)          ← Handles auth server-side
  │  POST /auth/login    → ROPC grant, create Redis session, set cookie
  │  POST /auth/register → Admin API user creation, auto-login via ROPC
  │  POST /auth/logout   → clear Redis session + cookie
  │
  │  session_id cookie (HttpOnly, Secure, SameSite=Lax)
  ▼
Browser ─── cookie sent automatically on every request ─────────┐
  │                                                               │
  │ GraphQL (Apollo Client, credentials: 'include')               │
  ▼                                                               │
Next.js Frontend (apps/web)  ◄────────────────────────────────────┘
  │  Custom /login and /register pages (fetch POST to gateway)
  │  middleware.ts: cookie presence check only
  │
  │ Next.js rewrites (locally), K8s ingress (prod)
  │ routes /auth/* + /graphql → Gateway
  ▼
API Gateway
  │  SessionGuard: looks up session in Redis, extracts { userId, roles }
  │  Refreshes access token server-side if near expiry
  │  NestJS + Apollo Server (code-first GraphQL)

  │ Redis pub/sub — payload carries { userId, roles }, never the raw token


Worker Edges (`apps/edge-agent`, external runtime)
  ▲  Internal edge APIs (`/internal/edges/*`)
  │  - register + credential issuance
  │  - connectivity heartbeat (`lastSeenAt`, version, targetVersion)
  │  - task polling/results/complete/fail
  │  - upgrade status reporting

API Gateway + PostgreSQL/Prisma
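
The "cookie presence check only" step in the diagram could look roughly like the sketch below. It is not the project's actual middleware.ts, which is not shown here. The helper names, the `protectedPrefixes` list, and the `next` query parameter are assumptions made for illustration. The check only tests whether a session_id cookie exists; validating the session is left to the gateway's SessionGuard.

```typescript
// Hypothetical helpers; the real middleware.ts may be structured differently.
const SESSION_COOKIE_NAME = "session_id";

/** True when the Cookie header carries a non-empty session_id value. */
function hasSessionCookie(cookieHeader: string | null | undefined): boolean {
  if (!cookieHeader) return false;
  return cookieHeader
    .split(";")
    .map((part) => part.trim())
    .some((part) => {
      const eq = part.indexOf("=");
      return (
        eq > 0 &&
        part.slice(0, eq) === SESSION_COOKIE_NAME &&
        part.length > eq + 1
      );
    });
}

/**
 * Returns the redirect target for a request, or null to let it through.
 * `protectedPrefixes` and the `next` parameter are illustrative assumptions.
 */
function redirectTarget(
  pathname: string,
  cookieHeader: string | null,
  protectedPrefixes: string[],
): string | null {
  const isProtected = protectedPrefixes.some((prefix) =>
    pathname.startsWith(prefix),
  );
  if (!isProtected || hasSessionCookie(cookieHeader)) return null;
  return `/login?next=${encodeURIComponent(pathname)}`;
}
```

Because the check never contacts Redis or Keycloak, a forged or expired cookie passes the middleware and is rejected later by the gateway.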

Service Map

| Service | Purpose | Communication |
| --- | --- | --- |
| Keycloak | Identity Provider: user management, roles (PostgreSQL-backed) | ROPC + Admin API (gateway ↔ Keycloak server-side) |
| apps/web | Next.js frontend, App Router, Apollo Client, custom auth pages, blog | GraphQL → API gateway (via Next.js rewrites / K8s ingress) |
| apps/api-gateway | GraphQL API, REST auth endpoints, edge control plane (worker registry, release metadata, upgrade campaigns), Redis session management | GraphQL + internal REST APIs |
| apps/edge-agent | External edge runtime: onboarding, registration, connectivity heartbeat, task execution, idle-only upgrade loop | Internal REST APIs ↔ gateway |

Architectural Decisions

Services are split by functionality, not by domain:

  • api-gateway currently owns client-facing domain operations and orchestration logic
  • edge runtimes (apps/edge-agent) execute distributed workloads and report state to gateway
  • new microservices are created only when workload/operations justify isolation (CPU-heavy, cron/background, independent scaling/SLO)
  • domain types live in packages/shared, not in individual services

Auth Flow

  1. User hits a protected (app) route → Next.js middleware checks for session_id cookie
  2. If no cookie → redirect to /login (custom Next.js page)
  3. User enters credentials → form submits POST /auth/login with { email, password }
  4. Gateway authenticates with Keycloak via ROPC grant (grant_type=password)
  5. Gateway creates Redis session, sets opaque session_id cookie (HttpOnly, Secure, SameSite=Lax)
  6. Frontend redirects to the original protected route
  7. All subsequent GraphQL requests include the cookie automatically (browser handles this)
  8. Gateway SessionGuard looks up session in Redis, refreshes token if near expiry
  9. SessionGuard extracts { userId, email, name, roles } into the GraphQL context
  10. Redis microservice messages carry this identity payload — never the raw token
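
Steps 4 and 5 above can be sketched as two pure functions: one that builds the ROPC token request for Keycloak, and one that serializes the session cookie. This is a minimal sketch, not the gateway's code. The function names, the base URL, realm, and client ID arguments, and the `Max-Age` attribute are assumptions. The token endpoint path and the `grant_type=password` form fields follow the standard OpenID Connect layout that Keycloak exposes.

```typescript
interface RopcRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

/** Builds the form-encoded ROPC request (step 4). All arguments are placeholders. */
function buildRopcRequest(
  keycloakBaseUrl: string,
  realm: string,
  clientId: string,
  email: string,
  password: string,
): RopcRequest {
  const fields: Array<[string, string]> = [
    ["grant_type", "password"],
    ["client_id", clientId],
    ["username", email],
    ["password", password],
  ];
  const body = fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    url: `${keycloakBaseUrl}/realms/${encodeURIComponent(realm)}/protocol/openid-connect/token`,
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

/** Serializes the opaque session cookie (step 5). Max-Age is an assumption. */
function buildSessionCookie(sessionId: string, maxAgeSeconds: number): string {
  return [
    `session_id=${encodeURIComponent(sessionId)}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```

The cookie value is only a lookup key for the Redis session, so the Keycloak tokens never reach the browser.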

Registration: custom /register page → POST /auth/register → gateway creates user via Keycloak Admin REST API (service account with manage-users role), then auto-logs in via ROPC.
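
The user-creation call in the registration flow might be built as follows. This is a sketch under stated assumptions: `buildCreateUserRequest` is a hypothetical helper, using the email as the Keycloak username is a guess, and `adminAccessToken` stands for a token already obtained for the service account. The `/admin/realms/{realm}/users` path and the `credentials` array with `temporary: false` follow Keycloak's Admin REST API user representation.

```typescript
interface KeycloakCredential {
  type: "password";
  value: string;
  temporary: boolean;
}

interface KeycloakNewUser {
  username: string;
  email: string;
  enabled: boolean;
  credentials: KeycloakCredential[];
}

interface AdminRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

/** Builds the Admin REST API request that creates a user. Arguments are placeholders. */
function buildCreateUserRequest(
  keycloakBaseUrl: string,
  realm: string,
  adminAccessToken: string,
  email: string,
  password: string,
): AdminRequest {
  const user: KeycloakNewUser = {
    username: email, // assumption: email doubles as the username
    email,
    enabled: true,
    // temporary: false so the user is not forced to reset on first login
    credentials: [{ type: "password", value: password, temporary: false }],
  };
  return {
    method: "POST",
    url: `${keycloakBaseUrl}/admin/realms/${encodeURIComponent(realm)}/users`,
    headers: {
      Authorization: `Bearer ${adminAccessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(user),
  };
}
```

After a successful response, the gateway would run the same ROPC login as for a returning user, which is what gives the auto-login behavior described above.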

Data Flow (Request Lifecycle)

  1. Frontend sends GraphQL query/mutation via Apollo Client (cookie sent automatically)
  2. API Gateway SessionGuard looks up session in Redis, validates, and populates req.user
  3. Resolver executes gateway service logic (and edge orchestration when needed)
  4. Gateway persists/reads domain state via Prisma/PostgreSQL
  5. Gateway returns GraphQL response to frontend
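
Step 2 of the lifecycle above can be illustrated with a small, framework-free sketch of the session lookup. It is not the actual SessionGuard. A `Map` stands in for Redis, the refresh call is synchronous for brevity (a real guard would await Redis and Keycloak), and `REFRESH_WINDOW_MS`, `StoredSession`, and `resolveIdentity` are hypothetical names. The returned identity deliberately omits the tokens, mirroring the rule that downstream payloads carry { userId, roles } and never the raw token.

```typescript
interface StoredSession {
  userId: string;
  roles: string[];
  accessToken: string;
  refreshToken: string;
  accessTokenExpiresAt: number; // epoch milliseconds
}

interface Identity {
  userId: string;
  roles: string[];
}

type RefreshFn = (refreshToken: string) => { accessToken: string; expiresAt: number };

// Assumption: refresh when fewer than 30 seconds of token lifetime remain.
const REFRESH_WINDOW_MS = 30000;

function resolveIdentity(
  store: Map<string, StoredSession>,
  sessionId: string | undefined,
  nowMs: number,
  refresh: RefreshFn,
): Identity | null {
  if (!sessionId) return null;
  const session = store.get(sessionId);
  if (!session) return null; // unknown or evicted session: treat as logged out

  if (session.accessTokenExpiresAt - nowMs <= REFRESH_WINDOW_MS) {
    // Near expiry: renew server-side and persist the new token in the store.
    const renewed = refresh(session.refreshToken);
    store.set(sessionId, {
      ...session,
      accessToken: renewed.accessToken,
      accessTokenExpiresAt: renewed.expiresAt,
    });
  }

  // Only identity fields leave this function; tokens stay in the store.
  return { userId: session.userId, roles: [...session.roles] };
}
```

A missing store entry returns null, which is also why losing Redis data logs every user out.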

Edge Runtime Lifecycle (Current)

  1. Edge starts and loads local config; if missing and interactive, runs onboarding wizard.
  2. Wizard collects display name, server URL, registration token; edge generates edge-<human-name>-<shortid>.
  3. Gateway registers/upserts worker by deviceNumber and issues credential.
  4. Edge sends connectivity heartbeat; gateway updates lastSeenAt, version, targetVersion, upgradeStatus.
  5. Edge polls tasks and reports heartbeat/results/terminal states.
  6. If targetVersion differs and edge is idle, upgrade flow runs and status transitions are reported.
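
Steps 2 and 6 can be sketched as two small functions: generating the edge-<human-name>-<shortid> identifier and deciding whether an upgrade may start. This is an illustration only. The slug rules, the `activeTaskCount` field as the idleness signal, and the decision labels are assumptions, and the agent's real logic may differ.

```typescript
/** Builds an identifier of the form edge-<human-name>-<shortid> (slug rules assumed). */
function generateEdgeName(displayName: string, shortId: string): string {
  const slug = displayName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `edge-${slug}-${shortId}`;
}

interface EdgeState {
  version: string;
  targetVersion: string | null;
  activeTaskCount: number; // assumption: zero active tasks means "idle"
}

type UpgradeDecision = "up-to-date" | "deferred-busy" | "upgrade";

/** Idle-only upgrade rule: upgrade only when versions differ and no task is running. */
function decideUpgrade(edge: EdgeState): UpgradeDecision {
  if (!edge.targetVersion || edge.targetVersion === edge.version) {
    return "up-to-date";
  }
  return edge.activeTaskCount > 0 ? "deferred-busy" : "upgrade";
}
```

For example, `decideUpgrade({ version: "1.2.0", targetVersion: "1.3.0", activeTaskCount: 2 })` returns "deferred-busy", so the edge keeps polling and retries the check once it is idle.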

Shared Packages

| Package | Purpose |
| --- | --- |
| packages/shared | Entity interfaces (User, AuthUser), DTOs, message pattern enums, ServiceResponse<T>, utility functions (getEnvVar, getRedisConfig, generateId) |
| packages/config | Shared ESLint preset, TypeScript configs for NestJS and Next.js |

Infrastructure

  • Build: Turborepo for monorepo orchestration, pnpm workspaces
  • Local dev: docker compose up -d (Redis, PostgreSQL, Keycloak), then pnpm dev
  • Containers: Multi-stage Docker builds (Alpine, non-root, turbo prune)
  • CI/CD: GitHub Actions → Docker build → ArgoCD sync
  • Deployment: Helm charts on Kubernetes (k3s locally, any K8s in production)
  • Auth: Keycloak (self-hosted, PostgreSQL-backed), gateway-managed sessions with Redis store
  • Docs: Nextra v4 integrated at /docs within apps/web
  • Inter-service/runtime: Redis session/cache + internal edge HTTP APIs for distributed worker orchestration

Current State and Known Limitations

  • Worker upgrade execution is policy-driven — edge upgrades are idle-only and rely on release metadata integrity (URL/checksum/signature).
  • Connectivity is heartbeat-based — stale lastSeenAt reflects liveness issues but not root cause.
  • Keycloak + PostgreSQL required — local development needs docker compose up -d before pnpm dev.
  • Gateway stateful — session store depends on Redis; Redis data loss invalidates all sessions.
  • Blog pages are static placeholders — no CMS backend yet
  • Portfolio image uploads not supported: imageUrl fields accept external URLs only, no file upload
  • No portfolio search/discovery — public profiles are accessed directly via /u/[userId], no directory or search
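
To make the heartbeat-based connectivity limitation listed above concrete, the sketch below shows a liveness classification derived from lastSeenAt alone. The function name, the labels, and the `staleAfterMs` threshold are assumptions and are not taken from the gateway code.

```typescript
type Liveness = "online" | "stale" | "never-seen";

/**
 * Classifies an edge from its last heartbeat timestamp.
 * `staleAfterMs` is a hypothetical threshold chosen by the operator.
 */
function classifyLiveness(
  lastSeenAt: Date | null,
  now: Date,
  staleAfterMs: number,
): Liveness {
  if (lastSeenAt === null) return "never-seen";
  const silentForMs = now.getTime() - lastSeenAt.getTime();
  return silentForMs > staleAfterMs ? "stale" : "online";
}
```

The only input is elapsed time, so a crashed agent, a network partition, and a revoked credential all produce the same "stale" result. Finding the root cause needs additional signals.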