redesign fully scaffolded and web login works

2026-03-17 20:10:47 -04:00
parent b9cc397e05
commit f6bd22a8ef
143 changed files with 17317 additions and 693 deletions


@@ -91,6 +91,10 @@ Containers self-manage their lifecycle using the lifecycle sidecar (see `../life
- OpenAI GPT
- Google Gemini
- OpenRouter (one key for 300+ models)
- Ollama (for embeddings): https://ollama.com/download
- Redis (for session/hot storage)
- Qdrant (for RAG vector search)
- Kafka + Flink + Iceberg (for durable storage)
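For local development, the lighter backing services can be started with a minimal `docker-compose` sketch. This is an assumption for convenience, not a file shipped by the repo: the images and ports are upstream defaults, and Kafka/Flink/Iceberg are omitted since they need cluster-specific setup.

```yaml
# docker-compose.yml sketch (assumed, not part of this repo)
services:
  redis:
    image: redis:7
    ports: ["6379:6379"]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
```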
### Development
@@ -119,7 +123,20 @@ DEFAULT_MODEL_PROVIDER=anthropic
DEFAULT_MODEL=claude-3-5-sonnet-20241022
```
4. Start Ollama and pull embedding model:
```bash
# Install Ollama (one-time): https://ollama.com/download
# Or with Docker: docker run -d -p 11434:11434 ollama/ollama
# Pull the all-minilm embedding model (90MB, CPU-friendly)
ollama pull all-minilm
# Alternative models:
# ollama pull nomic-embed-text # 8K context length
# ollama pull mxbai-embed-large # Higher accuracy, slower
```
5. Run development server:
```bash
npm run dev
```
@@ -200,11 +217,143 @@ ws.send(JSON.stringify({
**`GET /health`**
- Returns server health status
## Ollama Deployment Options
The gateway requires Ollama for embedding generation in RAG queries. You have two deployment options:
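As a reference for what this dependency looks like in code, here is a minimal sketch of fetching an embedding from Ollama's `/api/embeddings` HTTP endpoint. The helper names (`buildEmbeddingRequest`, `embed`) are illustrative, not the gateway's actual API; only `OLLAMA_URL` and the `all-minilm` model name come from this README.

```typescript
// Sketch: obtain an embedding vector from a running Ollama instance.
// Uses Node 18+ global fetch; no external dependencies.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

// Pure helper: build the JSON body for an embeddings request
function buildEmbeddingRequest(model: string, prompt: string) {
  return { model, prompt };
}

// Illustrative helper name; error handling is an assumption
async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbeddingRequest("all-minilm", text)),
  });
  if (!res.ok) throw new Error(`Ollama embedding failed: HTTP ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}
```

Whichever deployment option below is chosen, only `OLLAMA_URL` changes; the request shape stays the same.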
### Option 1: Ollama in Gateway Container (Recommended for simplicity)
Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.
**Dockerfile additions:**
```dockerfile
FROM node:22-slim

# node:22-slim ships without curl or pkill (procps); install them first
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates procps && \
    rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model at build time so it is baked into the image
RUN ollama serve & \
    sleep 5 && \
    ollama pull all-minilm && \
    pkill ollama

# ... rest of your gateway Dockerfile
```
**Start script (entrypoint.sh):**
```bash
#!/bin/bash
# Start Ollama in the background
ollama serve &
# Wait until the Ollama API answers before starting the gateway
until curl -sf http://localhost:11434/ > /dev/null; do sleep 1; done
# exec so the gateway runs as PID 1 and receives container signals
exec node dist/main.js
```
**Pros:**
- Simple networking (localhost:11434)
- No extra K8s resources
- Self-contained deployment
**Cons:**
- Larger container image (~200MB extra)
- CPU/memory shared with gateway process
**Resource requirements:**
- ~200MB additional memory
- ~0.2 additional CPU cores for embedding inference
### Option 2: Ollama as Separate Pod/Sidecar
Deploy Ollama as a separate container in the same pod (sidecar) or as its own deployment.
**K8s Deployment (sidecar pattern):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dexorder/gateway:latest
          env:
            - name: OLLAMA_URL
              value: http://localhost:11434
        - name: ollama
          image: ollama/ollama:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              ollama serve &
              sleep 5
              ollama pull all-minilm
              wait
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```
**K8s Deployment (separate service):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          # ... same as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```
Gateway `.env`:
```bash
OLLAMA_URL=http://ollama:11434
```
**Pros:**
- Isolated resource limits
- Can scale separately
- Easier to monitor/debug
**Cons:**
- More K8s resources
- Network hop (minimal latency)
- More complex deployment
### Recommendation
For most deployments: **Use Option 1 (in-container)** for simplicity, unless you need to:
- Share Ollama across multiple services
- Scale embedding inference independently
- Run Ollama on GPU nodes (gateway on CPU nodes)
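For the GPU-nodes case, the separate Ollama Deployment's pod spec would add scheduling constraints along these lines. This is a sketch: the node label and taint key below are placeholders for whatever your cluster actually uses.

```yaml
# Fragment of the ollama Deployment's pod template (GPU scheduling sketch)
spec:
  template:
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # assumed node label
      tolerations:
        - key: nvidia.com/gpu            # assumed taint key
          operator: Exists
          effect: NoSchedule
      containers:
        - name: ollama
          image: ollama/ollama:latest
          resources:
            limits:
              nvidia.com/gpu: 1
```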
## TODO
- [ ] Implement JWT verification with JWKS
- [ ] Implement MCP HTTP/SSE transport
- [ ] Add Redis for session persistence
- [ ] Add rate limiting per user license
- [ ] Add message usage tracking
- [ ] Add streaming responses for WebSocket