Deploy an Agent
Quick Start (30 seconds)
# Install CLI
npm install -g @moltghost/cli
# Deploy your first agent
moltghost deploy my-agent \
--model llama3.1-8b-q4 \
--gpu l40s \
--memory 24gb
# Get instant endpoint
echo "✅ Agent ready: https://$(moltghost status my-agent --url)/v1/chat"
Deployment complete in 45s
📱 Endpoint: https://abc123.agent.moltghost.io
💰 Cost: $0.0017/min (A100 paused)
Complete Deployment Flow
graph TD
A[CLI/UI/API Deploy] --> B[Validate Config]
B --> C[Provision Pod: 10-30s]
C --> D[Container Start: 5s]
D --> E[Runtime Init: 15s]
E --> F[Ollama + Model Load: 30-180s]
F --> G[Health Check]
G -->|✅| H[Endpoint Active]
G -->|❌| I[Auto-Retry]
style H fill:#90EE90
End-to-End Timeline:
| Model Size | Provision | Model Load | Total |
|---|---|---|---|
| 8B | 20s | 25s | 45s |
| 70B | 25s | 90s | 1m55s |
| 405B | 40s | 300s | 5m40s |
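The totals in the timeline come from adding the provision and model-load times. A minimal sketch of that arithmetic follows; `fmt_duration` and `total_time` are hypothetical helper names for illustration and are not part of the moltghost CLI.

```shell
# Hypothetical helpers (not part of the moltghost CLI).

# Format a whole number of seconds as "45s" or "5m40s".
fmt_duration() {
  if [ "$1" -lt 60 ]; then
    printf '%ds\n' "$1"
  else
    printf '%dm%02ds\n' $(($1 / 60)) $(($1 % 60))
  fi
}

# Add provision and model-load seconds, then print the formatted total.
total_time() {
  fmt_duration $(($1 + $2))
}

total_time 20 25    # 8B row   -> 45s
total_time 40 300   # 405B row -> 5m40s
```

These figures cover only the two columns shown. Container start and runtime initialization from the flow above would add to them.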
Deployment Command Reference
moltghost deploy AGENT_NAME [options]
# Core Options
--model llama3.1-70b-q4 # LLM (20+ supported)
--gpu l40s|a100|h100 # Compute tier
--memory 24gb|80gb # System RAM
--storage 50gb # Persistent volume
# Advanced
--region ap-southeast-1 # Jakarta (low latency)
--replicas 3 # HA deployment
--auto-pause 15m # Cost optimization
--skills ./my-skills/ # Private tools
--env KEY=secret # Environment vars
# Examples
moltghost deploy sales-agent --model qwen2.5-72b --gpu h100 --region ap-southeast-1
moltghost deploy dev-chat --model phi3-mini --cpu --memory 16gb --free-tier
Step-by-Step Process
1. Configuration (Instant)
# Define agent spec
cat > agent.yaml <<'EOF'
name: sales-assistant
model: llama3.1-70b-q4
resources:
  gpu: a100
  memory: 80gb
  storage: 100gb
skills: ["crm", "slack"]
env:
  CRM_API_KEY: secret
EOF
2. Pod Provisioning (10-30s)
✓ Allocating GPU (A100-80GB)
✓ Mounting 80GB RAM
✓ Creating 100GB NVMe volume
✓ Internal networking (WireGuard)
Pod abc123 ready ✓
3. Runtime Initialization (15s)
✓ OpenClaw framework v1.2.3
✓ Ollama server starting
✓ Skills loaded: crm_query, slack_notify
Runtime healthy ✓
4. Model Loading (30-180s)
⬇️ Downloading llama3.1-70b-q4... 42GB / 42GB
Loading to GPU memory (Q4_K_M 38GB)
⚡ Warming inference engine
Model ready ✓ 1.2s/token
5. Endpoint Activation (Instant)
✅ Agent fully deployed!
📱 https://abc123.agent.moltghost.io
OpenAI compatible: /v1/chat, /v1/completions
💰 Current rate: $0.0417/hour
Deployment States
| State | Duration | Endpoint | Billing | Action |
|---|---|---|---|---|
| Pending | 0-10s | 404 | No | Provisioning |
| Provisioning | 10-60s | 202 Accepted | Yes | Pod setup |
| Initializing | 1-3m | 503 | Yes | Runtime + model |
| Healthy | Indefinite | 200 OK | Yes | ✓ Ready |
| Paused | Indefinite | 503 | Storage only | moltghost pause |
Real-time Status:
moltghost status my-agent --watch
# 14:32:15 Initializing (87%) Model loading 72%
# 14:32:45 ✅ Healthy Endpoint active Uptime 0m
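In scripts where `--watch` is not convenient, the endpoint's HTTP status can be polled instead. The sketch below maps status codes to states using the Deployment States table. The function names, the retry defaults, and the assumption that the endpoint answers a plain GET with those codes are illustrative, not documented behaviour. The `curl` flags used (`-s`, `-o`, `-w '%{http_code}'`, `--max-time`) are standard.

```shell
# Hypothetical helpers; only the status-to-state mapping comes from the
# Deployment States table. Everything else is an assumption.

# Map an HTTP status code from the agent endpoint to a deployment state.
state_for_status() {
  case "$1" in
    200) echo "Healthy" ;;
    202) echo "Provisioning" ;;
    404) echo "Pending" ;;
    503) echo "Initializing or Paused" ;;
    *)   echo "Unknown" ;;
  esac
}

# Print only the HTTP status code for a URL (curl prints 000 on failure).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Poll until the endpoint is Healthy or the attempts run out.
# Usage: wait_until_healthy URL [ATTEMPTS] [DELAY_SECONDS] [FETCH_COMMAND]
wait_until_healthy() {
  _url=$1
  _attempts=${2:-60}
  _delay=${3:-5}
  _fetch=${4:-http_status}
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    _code=$("$_fetch" "$_url") || _code=000
    _state=$(state_for_status "$_code")
    echo "attempt $_i: HTTP $_code ($_state)"
    if [ "$_state" = "Healthy" ]; then
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  return 1
}
```

A call might look like `wait_until_healthy "https://YOUR-AGENT.agent.moltghost.io/v1/chat" 60 5`. The fourth argument lets a different fetch command be swapped in, which also makes the loop testable without a live agent.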
Production Deployment Patterns
High Availability
moltghost deploy prod-agent \
--replicas 3 \
--region ap-southeast-1 \
--auto-scale 'cpu>80' \
--healthcheck /v1/health
Cost-Optimized Development
moltghost deploy dev-agent \
--model phi3-mini-q4 \
--cpu \
--memory 16gb \
--auto-pause 10m \
--backup daily
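To see what `--auto-pause` can save, multiply the per-minute rate by the minutes the agent is actually active. The sketch below is a rough estimate only. `estimate_cost` is a hypothetical helper, the $0.0017/min figure is borrowed from the Quick Start sample output rather than a quoted price for this configuration, and storage charges while paused are not included.

```shell
# Hypothetical helper; not part of the moltghost CLI.
# Estimate compute cost for a number of active minutes at a per-minute rate.
# Storage billed while the agent is paused is not included.
estimate_cost() {
  awk -v rate="$1" -v minutes="$2" 'BEGIN { printf "%.2f\n", rate * minutes }'
}

# 8 active hours per day at the sample rate of $0.0017/min:
estimate_cost 0.0017 480    # -> 0.82
# The same agent left running for 24 hours:
estimate_cost 0.0017 1440   # -> 2.45
```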
Enterprise Multi-Region
moltghost deploy global-agent \
--model llama3.1-405b \
--gpu h100 \
--replicas 2 \
--region ap-southeast-1 \
--region us-east-1 \
--backup cross-region
Troubleshooting
| Issue | Symptom | Solution |
|---|---|---|
| Slow Model Load | >5min | moltghost logs --model-load |
| Pod Provision Fail | "No capacity" | --region alternate or --gpu-priority |
| OOM Error | "CUDA out of memory" | --memory +16gb or --quantize q4 |
| Skill Not Found | 404 on tools | skills list --verify |
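For the OOM row, a quick back-of-the-envelope check can show whether a model is likely to fit before deploying. The sketch below estimates weight size as billions of parameters times bits per weight divided by 8. The helper names, the roughly 4.5 bits per weight used for a 4-bit quantization, and the 20% headroom for runtime overhead are all assumptions, and actual usage varies by quantization scheme and context length.

```shell
# Hypothetical helpers; the bits-per-weight and headroom values are assumptions.

# Rough size of model weights in GB: billions of parameters x bits per weight / 8.
weights_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

# Succeeds when the weights plus a headroom fraction fit in GPU memory.
# Usage: fits_gpu PARAMS_BILLIONS BITS_PER_WEIGHT GPU_GB [HEADROOM]
fits_gpu() {
  awk -v params="$1" -v bits="$2" -v gpu="$3" -v headroom="${4:-0.2}" \
    'BEGIN { exit !((params * bits / 8) * (1 + headroom) <= gpu) }'
}

weights_gb 70 4.5   # -> 39.4
fits_gpu 70 4.5 80 && echo "70B at ~4.5 bits should fit an 80GB GPU"
fits_gpu 70 16 80  || echo "70B at 16 bits will not fit an 80GB GPU"
```

The 39.4GB estimate is close to the 38GB reported in the sample model-loading output for the 70B Q4 model, which is why quantizing is usually the first thing to try.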
Summary
Deploy → Ready in under 3 minutes for models up to 70B, with production-grade infrastructure.
✅ One-command deployment (CLI/UI/API)
✅ 45s to 5m40s total time (model dependent)
✅ Automatic HTTPS endpoints
✅ Cost controls built-in
✅ Production patterns (HA, multi-region)
From zero to intelligent agent in one command.
Next: Manage Agents → Scale, pause, update
Test Immediately:
curl -X POST https://YOUR-AGENT.agent.moltghost.io/v1/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello!"}]}'