
Overview

Run powerful AI coding agents completely locally - no API keys, no cloud services, no data leaving your machine. Perfect for privacy-sensitive work, learning, or unlimited free usage.
Privacy First: All code stays on your machine. No external API calls.

Supported Models

OpenCode

Open-source coding model
  • By: Open-source community
  • Size: 15B parameters
  • Context: 16K tokens
  • Requirements: 16GB+ RAM, GPU recommended
  • License: Apache 2.0
Best for: General coding, privacy-sensitive work

Qwen Code

Alibaba’s open-source coder
  • By: Alibaba DAMO Academy
  • Size: 7B, 14B, 32B parameters
  • Context: 32K tokens
  • Requirements: 8GB-64GB RAM depending on size
  • License: Apache 2.0
Best for: Cost-conscious development; competitive with commercial models

Quick Start with Ollama

Ollama makes running local models easy:

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download from https://ollama.ai

2. Start Ollama

ollama serve
The server listens on http://localhost:11434 by default.
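To verify the server is reachable before configuring Forge, you can query the local API (Ollama's /api/tags endpoint lists the models you have pulled):

curl http://localhost:11434/api/tags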

3. Pull Models

# OpenCode
ollama pull opencode

# Qwen Code (choose size)
ollama pull qwen2.5-coder:7b    # Small, fast
ollama pull qwen2.5-coder:14b   # Balanced
ollama pull qwen2.5-coder:32b   # Best quality

4. Configure Forge

Edit .forge/config.json:
{
  "llms": {
    "opencode": {
      "type": "ollama",
      "model": "opencode",
      "endpoint": "http://localhost:11434"
    },
    "qwen": {
      "type": "ollama",
      "model": "qwen2.5-coder:32b",
      "endpoint": "http://localhost:11434"
    }
  }
}

5. Test

forge task create \
  --title "Test local model" \
  --description "Print 'Hello from OpenCode!'" \
  --llm opencode

Hardware Requirements

Minimum Specs

| Model    | RAM  | GPU         | Storage | Speed  |
|----------|------|-------------|---------|--------|
| Qwen 7B  | 8GB  | Optional    | 5GB     | Good   |
| Qwen 14B | 16GB | Recommended | 10GB    | Better |
| OpenCode | 16GB | Recommended | 12GB    | Good   |
| Qwen 32B | 32GB | Required    | 25GB    | Best   |
For best experience:
CPU:  Modern multi-core (8+ cores)
RAM:  32GB+
GPU:  NVIDIA RTX 3060+ (12GB VRAM)
      or Apple Silicon M1/M2/M3

Storage: SSD with 50GB+ free space
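Not sure how much memory your machine has? A quick check (GPU checks are covered under Performance Optimization below):

free -h               # Linux: total and available RAM
sysctl hw.memsize     # macOS: total RAM in bytes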
Apple Silicon users: Metal acceleration makes M1/M2/M3 excellent for local models!

Configuration

Basic Ollama Setup

{
  "llms": {
    "opencode": {
      "type": "ollama",
      "model": "opencode",
      "endpoint": "http://localhost:11434"
    }
  }
}

Advanced Configuration

{
  "llms": {
    "qwen": {
      "type": "ollama",
      "model": "qwen2.5-coder:32b",
      "endpoint": "http://localhost:11434",
      "options": {
        "temperature": 0.7,
        "num_ctx": 32768,      // Context window
        "num_predict": 2048,   // Max output tokens
        "top_p": 0.9,
        "top_k": 40,
        "repeat_penalty": 1.1
      },
      "timeout": 120000  // 2 minutes
    }
  }
}

GPU Acceleration

Enable GPU support for faster inference:
{
  "llms": {
    "qwen": {
      "type": "ollama",
      "model": "qwen2.5-coder:32b",
      "options": {
        "num_gpu": 99,        // GPU layers to offload (99 = offload all layers)
        "main_gpu": 0,        // Primary GPU index
        "low_vram": false     // Set true if you have under 8GB VRAM
      }
    }
  }
}
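To confirm the model actually loaded onto the GPU, recent Ollama versions report placement per loaded model via ollama ps; on NVIDIA hardware, nvidia-smi is another quick check while a task runs:

# Shows each loaded model and whether it is running on GPU or CPU
ollama ps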

Strengths of Local Models

Complete Privacy

No Data Leakage

  • Code never leaves your machine
  • No API calls to external services
  • No telemetry or tracking
  • Perfect for sensitive codebases

Compliance-Ready

  • GDPR compliant by design
  • No third-party data sharing
  • Full audit trail
  • Meets enterprise security requirements
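If you need to demonstrate this in a security review, one simple check is that Ollama is listening only on the loopback interface (assuming you have not set OLLAMA_HOST to expose it on the network):

# Confirm the Ollama API is bound to localhost only
ss -tlnp | grep 11434            # Linux
lsof -iTCP:11434 -sTCP:LISTEN    # macOS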

No Usage Limits

# Unlimited usage!
forge task create "Task 1" --llm qwen
forge task create "Task 2" --llm qwen
forge task create "Task 3" --llm qwen
# ... as many as you want

# No rate limits
# No token costs
# No monthly fees

No Internet Required

Work anywhere:
  • ✈️ On airplanes
  • 🏔️ Remote locations
  • 🔌 During outages
  • 🔒 Air-gapped environments

Limitations

Lower Quality

Local models are less capable than cloud models:
| Task Type    | Local | Claude | GPT-4 |
|--------------|-------|--------|-------|
| Simple fixes | ⭐⭐⭐⭐  | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐⭐ |
| Architecture | ⭐⭐    | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐  |
| Bug fixing   | ⭐⭐⭐   | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐  |
| Testing      | ⭐⭐⭐⭐  | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐  |

Slower

# Speed comparison: "Add authentication"
Gemini Flash:  2m 15s
Claude Sonnet: 5m 22s
GPT-4 Turbo:   6m 01s
Qwen 32B:      12m 30s  (local)
OpenCode:      15m 45s  (local)

Hardware Intensive

  • Requires powerful machine
  • GPU strongly recommended
  • High RAM usage
  • Slower on CPU-only

Best Use Cases

Privacy-Sensitive Work

# Medical records processing
forge task create \
  --title "Process patient data" \
  --files "data/patients/*.csv" \
  --llm qwen  # Stays local!

Learning & Experimentation

# Unlimited free usage for learning
forge task create "Try approach 1" --llm opencode
forge task create "Try approach 2" --llm opencode
forge task create "Try approach 3" --llm opencode
# ... no cost!

Air-Gapped Environments

# Government, military, high-security environments
# No internet connection required
forge task create "Classified work" --llm qwen

Cost Reduction

# After initial hardware investment
# Zero ongoing costs
forge task create "Feature 1" --llm qwen  # $0
forge task create "Feature 2" --llm qwen  # $0
forge task create "Feature 3" --llm qwen  # $0

Model Comparison

OpenCode vs Qwen

| Feature  | OpenCode | Qwen 7B     | Qwen 14B | Qwen 32B |
|----------|----------|-------------|----------|----------|
| Quality  | ⭐⭐⭐      | ⭐⭐          | ⭐⭐⭐      | ⭐⭐⭐⭐     |
| Speed    | ⭐        | ⭐⭐⭐⭐⭐       | ⭐⭐⭐      | ⭐⭐       |
| RAM      | 16GB     | 8GB         | 16GB     | 32GB     |
| Best for | General  | Quick tasks | Balanced | Quality  |

Local vs Cloud

| Feature | Local (Qwen 32B)       | Claude Sonnet          | Gemini Pro            |
|---------|------------------------|------------------------|-----------------------|
| Privacy | ⭐⭐⭐⭐⭐                  | ⭐⭐                     | ⭐⭐                    |
| Quality | ⭐⭐⭐⭐                   | ⭐⭐⭐⭐⭐                  | ⭐⭐⭐⭐                  |
| Speed   | ⭐⭐                     | ⭐⭐⭐⭐                   | ⭐⭐⭐⭐⭐                 |
| Cost    | $$$ hardware, $0 usage | $0 hardware, $$$ usage | $0 hardware, $$ usage |

Performance Optimization

Use GPU

# Check GPU usage
nvidia-smi

# Or on Mac
system_profiler SPDisplaysDataType

Adjust Context Window

{
  "llms": {
    "qwen": {
      "options": {
        "num_ctx": 16384  // Reduce for speed
      }
    }
  }
}

Use Smaller Models for Simple Tasks

# Simple task → Qwen 7B (fast)
forge task create "Add comment" --llm qwen-7b

# Complex task → Qwen 32B (quality)
forge task create "Refactor architecture" --llm qwen-32b
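This assumes both sizes are registered as separate entries in .forge/config.json, along these lines:

{
  "llms": {
    "qwen-7b": {
      "type": "ollama",
      "model": "qwen2.5-coder:7b",
      "endpoint": "http://localhost:11434"
    },
    "qwen-32b": {
      "type": "ollama",
      "model": "qwen2.5-coder:32b",
      "endpoint": "http://localhost:11434"
    }
  }
}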

Troubleshooting

Error: “Connection refused to localhost:11434”
Solution:
# Start Ollama
ollama serve

# Or as background service (Linux)
systemctl start ollama

# macOS (via brew services)
brew services start ollama
Error: “Failed to allocate memory”
Solutions:
  • Use smaller model (7B instead of 32B)
  • Close other applications
  • Enable low VRAM mode:
    {
      "llms": {
        "qwen": {
          "options": {
            "low_vram": true
          }
        }
      }
    }
    
  • Upgrade RAM
Issue: Model taking forever
Solutions:
  • Enable GPU acceleration
  • Use smaller model
  • Reduce context window
  • Close background apps
  • Check CPU usage (should be high)
Error: “Model ‘qwen2.5-coder:32b’ not found”
Solution:
# Pull the model
ollama pull qwen2.5-coder:32b

# List available models
ollama list

# Check model size before pulling
ollama show qwen2.5-coder:32b

Cost Analysis

Hardware Investment

One-time costs:
├── Mid-range GPU (RTX 4060): $300
├── Additional RAM (32GB): $100
└── SSD storage (1TB): $80
────────────────────────────
Total: ~$480

Break-even vs Claude:
$480 / $50/month = ~10 months of usage

Ongoing Costs

Electricity:
├── GPU power: ~200W
├── Running 8hrs/day
└── Cost: ~$5-10/month
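As a rough sanity check on that figure (electricity prices vary): 0.2 kW × 8 h/day × 30 days ≈ 48 kWh/month, which at $0.12-0.20/kWh is roughly $6-10/month.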

vs Cloud APIs:
├── Claude: $50-200/month typical usage
├── GPT-4: $80-300/month
└── Gemini: $0-50/month (free tier)
Heavy users (>$100/month on APIs) break even quickly with a local setup!

Best Practices

Use for Sensitive Work

# Proprietary code
forge task create \
  --title "Refactor proprietary algorithm" \
  --llm qwen  # Stays local

Start Small

Begin with Qwen 7B:
ollama pull qwen2.5-coder:7b
Upgrade to 32B if needed
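The upgrade is just another pull; removing the 7B weights afterwards is optional and only reclaims disk space. Remember to update the "model" field in .forge/config.json:

ollama pull qwen2.5-coder:32b
ollama rm qwen2.5-coder:7b    # optional: frees roughly 5GB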

Monitor Resources

# Check resource usage
htop
nvidia-smi  # GPU

# Adjust if needed

Combine with Cloud

# Quick iteration: local
forge task create "Try A" --llm qwen

# Final polish: cloud
forge task fork 1 --llm claude
Best of both worlds!

Hybrid Strategy

Combine local and cloud models:

Strategy 1: Privacy Tiers

# Sensitive code → local
forge task create \
  --title "Process customer data" \
  --llm qwen

# Public code → cloud (faster, better)
forge task create \
  --title "Update README" \
  --llm gemini

Strategy 2: Cost Optimization

# Experimentation → local (free)
forge task create "Try approach A" --llm qwen
forge task create "Try approach B" --llm qwen
forge task create "Try approach C" --llm qwen

# Production → cloud (quality)
forge task fork <winner> --llm claude

Strategy 3: Network-Aware

# Pick the model based on connectivity
if ping -c 1 8.8.8.8 >/dev/null 2>&1; then
  LLM=gemini  # Online: use the fast cloud model
else
  LLM=qwen    # Offline: fall back to local
fi

forge task create "Task" --llm "$LLM"

Real-World Example

Setup for Privacy-First Development

# 1. Install Ollama
brew install ollama

# 2. Pull Qwen 32B (best quality)
ollama pull qwen2.5-coder:32b

# 3. Configure Forge
cat > .forge/config.json <<EOF
{
  "llms": {
    "qwen": {
      "type": "ollama",
      "model": "qwen2.5-coder:32b",
      "endpoint": "http://localhost:11434",
      "options": {
        "num_gpu": 1,
        "num_ctx": 32768
      }
    }
  }
}
EOF

# 4. Start building (all local!)
forge task create "Sensitive feature" --llm qwen

Next Steps