
The Framework Bloat Problem

As AI frameworks evolve, they face an insidious challenge: framework bloat. Well-intentioned additions accumulate, creating:

- Semantic Duplication: the same concept explained multiple ways under different names
- Token Waste: context windows filled with redundant instructions
- Cognitive Overload: too many overlapping guidelines that confuse both AI and humans
- Maintenance Burden: unclear which version is canonical, so updates become painful

Example: The Delegation Problem

# File 1: spells/delegate-dont-do.md
When a task is outside your expertise, delegate to a specialist.
Never try to do everything yourself.

# File 2: spells/orchestration-boundary.md
Maintain clear boundaries between orchestration and execution.
Coordinate specialists rather than performing all work directly.

# File 3: spells/know-your-limits.md
Understand your capabilities and limitations.
Hand off work outside your core competency.

# File 4: agents/implementor.md
The implementor should delegate testing to @tests,
git operations to @git, and debugging to @debug.
Problem: All four files teach the same concept, delegation. Similarity score: 0.89 (effectively identical).

Result:
  • 4x token usage for one concept
  • Inconsistent terminology (“delegate” vs “hand off” vs “coordinate”)
  • Maintenance nightmare (update one, must update all four)
  • AI confusion (which version is authoritative?)

The ACE Protocol Solution

ACE = Analyze, Count, Evidence. A systematic approach to detecting and eliminating bloat while preserving essential knowledge:

Three Steps

1. Analyze - Semantic Similarity Detection
   Use embeddings to find semantically similar content, even with different wording.

2. Count - Token Usage Tracking
   Measure exact token consumption to identify heavyweight files and wasteful redundancy.

3. Evidence - Data-Driven Decisions
   Make optimization decisions based on quantitative similarity and usage data, not gut feeling.

Step 1: Analyze with Embeddings

Semantic similarity detection using vector embeddings:

How It Works

// 1. Convert text to vector embeddings
const embedding1 = await generateEmbedding("Delegate to specialists");
const embedding2 = await generateEmbedding("Hand off to experts");

// 2. Calculate cosine similarity
const similarity = cosineSimilarity(embedding1, embedding2);
// Result: 0.87 (high similarity despite different words)

// 3. Categorize based on threshold
if (similarity > 0.85) {
  return "DUPLICATE";
} else if (similarity > 0.70) {
  return "OVERLAP";
} else {
  return "UNIQUE";
}

Using the Helper Command

# Analyze semantic similarity across files
genie helper embeddings --input genie-master/spells/universal/

# Output
Analyzing 24 spell files...

High Similarity (>0.85) - Likely Duplicates:
  delegate-dont-do.md ↔ orchestration-boundary.md (0.89)
  know-yourself.md ↔ understand-boundaries.md (0.86)
  investigate-before-commit.md ↔ safety-checks.md (0.87)

Medium Similarity (0.70-0.85) - Possible Overlap:
  learn.md ↔ continuous-improvement.md (0.78)
  context-critic.md ↔ challenge-assumptions.md (0.75)

Unique (<0.70):
 ace-protocol.md (baseline)
 blocker.md (0.42 avg similarity)
 run-in-parallel.md (0.38 avg similarity)

Recommendations:
  - Merge 3 duplicate pairs (save ~4,200 tokens)
  - Review 2 overlapping pairs (potential ~1,800 token savings)
  - Keep 19 unique spells

Understanding Similarity Scores

| Score Range | Interpretation         | Action                          |
|-------------|------------------------|---------------------------------|
| 0.95 - 1.00 | Near-identical content | Immediate merge required        |
| 0.85 - 0.94 | Clear duplicate        | Merge with confidence           |
| 0.70 - 0.84 | Significant overlap    | Review and consolidate          |
| 0.50 - 0.69 | Some shared concepts   | Keep separate, cross-reference  |
| 0.00 - 0.49 | Unique content         | Keep separate                   |
The 0.85 threshold is backed by testing: at this level, files teach the same core concept even with different examples and phrasing.
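The table above can be expressed as a small lookup, handy when scripting checks of your own. This is a sketch (the return labels are invented for illustration; only the band boundaries come from the table):

```typescript
// Map a cosine-similarity score to the recommended action from the table above.
function thresholdAction(similarity: number): string {
  if (similarity >= 0.95) return "immediate-merge";
  if (similarity >= 0.85) return "merge";
  if (similarity >= 0.70) return "consolidate";
  if (similarity >= 0.50) return "cross-reference";
  return "keep-separate";
}
```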

Embedding Implementation

import { OpenAI } from 'openai';

const openai = new OpenAI();

async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });

  return response.data[0].embedding;
}

function cosineSimilarity(vecA: number[], vecB: number[]): number {
  const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));

  return dotProduct / (magnitudeA * magnitudeB);
}

async function findSimilarFiles(
  directory: string,
  threshold: number = 0.85
): Promise<SimilarityPair[]> {
  // 1. Load all markdown files
  const files = await loadMarkdownFiles(directory);

  // 2. Generate embeddings for each
  const embeddings = await Promise.all(
    files.map(async file => ({
      file: file.path,
      embedding: await generateEmbedding(file.content)
    }))
  );

  // 3. Compare all pairs
  const pairs: SimilarityPair[] = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      const similarity = cosineSimilarity(
        embeddings[i].embedding,
        embeddings[j].embedding
      );

      if (similarity >= threshold) {
        pairs.push({
          fileA: embeddings[i].file,
          fileB: embeddings[j].file,
          similarity: similarity
        });
      }
    }
  }

  return pairs.sort((a, b) => b.similarity - a.similarity);
}

Step 2: Count Tokens

Precise token measurement to quantify bloat:

Token Counting Command

# Count tokens in entire framework
genie helper count-tokens --path genie-master/

# Output
Token Analysis Report
=====================

Total Tokens: 186,420

By Category:
┌─────────────┬─────────┬──────────┐
│ Category    │ Tokens  │ Percent  │
├─────────────┼─────────┼──────────┤
│ Agents      │  98,240 │  52.7%   │
│ Spells      │  45,180 │  24.2%   │
│ Workflows   │  28,000 │  15.0%   │
│ Neurons     │  10,500 │   5.6%   │
│ Other       │   4,500 │   2.4%   │
└─────────────┴─────────┴──────────┘

Top 10 Token Consumers:
1. collectives/code/implementor.md - 8,420 tokens
2. collectives/code/tests.md - 6,890 tokens
3. spells/universal/ace-protocol.md - 4,200 tokens
4. collectives/create/researcher.md - 3,850 tokens
5. spells/code/conventional-commits.md - 3,200 tokens
6. workflows/feature-development.md - 2,980 tokens
7. neurons/FORGE.md - 2,750 tokens
8. spells/universal/delegate-dont-do.md - 2,100 tokens
9. spells/universal/orchestration-boundary.md - 2,080 tokens
10. collectives/code/git.md - 1,950 tokens

Warnings:
 delegate-dont-do.md (2,100) and orchestration-boundary.md (2,080)
  have 0.89 similarity - potential 2,080 token savings

Recommendations:
- Review top 5 consumers for optimization opportunities
- Investigate similar file pairs for consolidation
- Current lazy-loading efficiency: 92% (only loading what's needed)

Per-File Token Analysis

# Analyze specific file
genie helper count-tokens --file genie-master/spells/universal/delegate-dont-do.md

# Output
File: delegate-dont-do.md
Total Tokens: 2,100

Section Breakdown:
┌──────────────────────┬─────────┬──────────┐
│ Section              │ Tokens  │ Percent  │
├──────────────────────┼─────────┼──────────┤
│ YAML Frontmatter     │      42 │    2.0%  │
│ Title & Overview     │     180 │    8.6%  │
│ Core Protocol        │     650 │   31.0%  │
│ Examples             │     890 │   42.4%  │
│ Integration          │     240 │   11.4%  │
│ Related Spells       │      98 │    4.7%  │
└──────────────────────┴─────────┴──────────┘

Density: 2.1 tokens/word (typical for technical content)
Readability: 12.4 grade level (appropriate for developers)

Bullet Counter for Redundancy

# Count bullet point redundancy across files
genie helper bullet-counter --input genie-master/spells/universal/

# Output
Bullet Point Analysis
=====================

Duplicate Concepts Across Files:
┌─────────────────────────────────────┬───────┬──────────────────────┐
│ Concept                             │ Count │ Files                │
├─────────────────────────────────────┼───────┼──────────────────────┤
│ "Delegate to specialists"           │   8   │ delegate-dont-do.md, │
│                                     │       │ orchestration-*.md,  │
│                                     │       │ implementor.md...    │
├─────────────────────────────────────┼───────┼──────────────────────┤
│ "Understand your boundaries"        │   6   │ know-yourself.md,    │
│                                     │       │ understand-*.md...   │
├─────────────────────────────────────┼───────┼──────────────────────┤
│ "Ask clarifying questions"          │   5   │ learn.md,            │
│                                     │       │ context-critic.md... │
└─────────────────────────────────────┴───────┴──────────────────────┘

Recommendations:
- "Delegate to specialists" appears 8x - consolidate to 1 canonical source
- Potential savings: ~3,200 tokens from bullet deduplication

Token Counting Implementation

import { encode } from 'gpt-tokenizer';
import * as fs from 'fs';

// gpt-tokenizer's default encoding (cl100k_base) matches GPT-4 and GPT-3.5.
function countTokens(text: string): number {
  const tokens = encode(text);
  return tokens.length;
}

function analyzeFile(filepath: string): TokenAnalysis {
  const content = fs.readFileSync(filepath, 'utf-8');

  // Split into sections
  const sections = parseMarkdownSections(content);

  // Count tokens per section
  const sectionCounts = sections.map(section => ({
    title: section.title,
    tokens: countTokens(section.content),
    percentage: 0  // Calculate after total
  }));

  const totalTokens = sectionCounts.reduce((sum, s) => sum + s.tokens, 0);

  // Calculate percentages
  sectionCounts.forEach(section => {
    section.percentage = (section.tokens / totalTokens) * 100;
  });

  return {
    filepath,
    totalTokens,
    sections: sectionCounts,
    density: totalTokens / content.split(/\s+/).length,
    readability: calculateReadability(content)
  };
}

function findDuplicateBullets(files: string[]): DuplicateBullet[] {
  const bulletPoints = new Map<string, string[]>();

  // Extract bullets from all files
  for (const file of files) {
    const content = fs.readFileSync(file, 'utf-8');
    const bullets = extractBulletPoints(content);

    for (const bullet of bullets) {
      // Normalize for comparison
      const normalized = normalizeBullet(bullet);

      if (!bulletPoints.has(normalized)) {
        bulletPoints.set(normalized, []);
      }
      bulletPoints.get(normalized)!.push(file);
    }
  }

  // Find duplicates
  return Array.from(bulletPoints.entries())
    .filter(([_, files]) => files.length > 1)
    .map(([bullet, files]) => ({
      concept: bullet,
      count: files.length,
      files: files
    }))
    .sort((a, b) => b.count - a.count);
}
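`extractBulletPoints` and `normalizeBullet` are referenced above without definitions. One plausible sketch (the normalization rules here are hypothetical; adjust to your corpus):

```typescript
// Pull out lines that start with a bullet marker (-, *, or •).
function extractBulletPoints(content: string): string[] {
  return content
    .split('\n')
    .map(line => line.trim())
    .filter(line => /^[-*•]\s+/.test(line))
    .map(line => line.replace(/^[-*•]\s+/, ''));
}

// Normalize for fuzzy matching: lowercase, strip punctuation, collapse whitespace.
function normalizeBullet(bullet: string): string {
  return bullet
    .toLowerCase()
    .replace(/[^\w\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}
```

With this normalization, "Delegate to Specialists!" and "delegate to specialists" count as the same concept; truly fuzzy matching would need embeddings, as in Step 1.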

Step 3: Evidence-Based Decisions

Use similarity and token data to make optimization choices:

Decision Matrix

interface OptimizationDecision {
  fileA: string;
  fileB: string;
  similarity: number;
  tokensA: number;
  tokensB: number;
  action: 'merge' | 'consolidate' | 'cross-reference' | 'keep-separate';
  reasoning: string;
  potentialSavings: number;
}

function decideAction(
  fileA: FileAnalysis,
  fileB: FileAnalysis,
  similarity: number
): OptimizationDecision {
  // Clear duplicate - merge
  if (similarity > 0.85) {
    return {
      fileA: fileA.path,
      fileB: fileB.path,
      similarity,
      tokensA: fileA.tokens,
      tokensB: fileB.tokens,
      action: 'merge',
      reasoning: 'High similarity indicates duplicate content',
      potentialSavings: Math.min(fileA.tokens, fileB.tokens)
    };
  }

  // Significant overlap - consolidate
  if (similarity > 0.70) {
    return {
      fileA: fileA.path,
      fileB: fileB.path,
      similarity,
      tokensA: fileA.tokens,
      tokensB: fileB.tokens,
      action: 'consolidate',
      reasoning: 'Moderate overlap can be consolidated with preserved unique content',
      potentialSavings: Math.floor((fileA.tokens + fileB.tokens) * 0.3)
    };
  }

  // Some overlap - cross-reference
  if (similarity > 0.50) {
    return {
      fileA: fileA.path,
      fileB: fileB.path,
      similarity,
      tokensA: fileA.tokens,
      tokensB: fileB.tokens,
      action: 'cross-reference',
      reasoning: 'Some shared concepts - add cross-references instead of merging',
      potentialSavings: 0
    };
  }

  // Unique - keep separate
  return {
    fileA: fileA.path,
    fileB: fileB.path,
    similarity,
    tokensA: fileA.tokens,
    tokensB: fileB.tokens,
    action: 'keep-separate',
    reasoning: 'Sufficiently different content warrants separate files',
    potentialSavings: 0
  };
}
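Once `decideAction` has run over every pair, the per-pair decisions can be rolled up into the kind of summary the report prints. A sketch (the interface is re-declared here in narrowed form so the snippet stands alone):

```typescript
interface DecisionSummaryInput {
  action: 'merge' | 'consolidate' | 'cross-reference' | 'keep-separate';
  potentialSavings: number;
}

// Group decisions by action and total the potential token savings per group.
function summarizeDecisions(
  decisions: DecisionSummaryInput[]
): Map<string, { count: number; savings: number }> {
  const summary = new Map<string, { count: number; savings: number }>();
  for (const d of decisions) {
    const entry = summary.get(d.action) ?? { count: 0, savings: 0 };
    entry.count += 1;
    entry.savings += d.potentialSavings;
    summary.set(d.action, entry);
  }
  return summary;
}
```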

Optimization Report

# Generate full optimization report
genie helper ace-report --input genie-master/

# Output
ACE Protocol Optimization Report
================================

Analyzed: 332 files
Total Tokens: 186,420

Optimization Opportunities:
┌─────────────────────────────────────────┬──────────────┬──────────┐
│ Action                                  │ Files        │ Savings  │
├─────────────────────────────────────────┼──────────────┼──────────┤
│ MERGE (similarity > 0.85)               │ 6 pairs      │ 12,400   │
│ CONSOLIDATE (similarity 0.70-0.85)      │ 4 pairs      │  5,200   │
│ CROSS-REFERENCE (similarity 0.50-0.70)  │ 8 pairs      │      0   │
│ KEEP SEPARATE (similarity < 0.50)       │ 308 files    │      0   │
└─────────────────────────────────────────┴──────────────┴──────────┘

Detailed Recommendations:

MERGE:
1. delegate-dont-do.md + orchestration-boundary.md
   Similarity: 0.89 | Savings: 2,080 tokens
 Keep: delegate-dont-do.md (more comprehensive)
 Merge in: unique examples from orchestration-boundary.md

2. know-yourself.md + understand-boundaries.md
   Similarity: 0.86 | Savings: 1,950 tokens
 Keep: know-yourself.md (better title)
 Merge in: boundary examples from understand-boundaries.md

3. investigate-before-commit.md + safety-checks.md
   Similarity: 0.87 | Savings: 1,840 tokens
 Keep: investigate-before-commit.md (more specific)
 Merge in: safety checklist from safety-checks.md

[... 3 more merge recommendations ...]

CONSOLIDATE:
1. learn.md + continuous-improvement.md
   Similarity: 0.78 | Savings: 1,200 tokens
 Create: learning-and-improvement.md
 Combine unique aspects from both

[... 3 more consolidation recommendations ...]

Total Potential Savings: 17,600 tokens (9.4% reduction)
Estimated Time: 4-6 hours
Risk Level: Low (high similarity = safe merge)

Automated Merge Suggestions

#!/bin/bash
# Generated by ACE Protocol

# Merge 1: delegate-dont-do + orchestration-boundary
echo "Merging delegation spells..."

# 1. Extract unique content from orchestration-boundary.md
genie helper extract-unique \
  --source orchestration-boundary.md \
  --compare delegate-dont-do.md \
  --output unique-orchestration.md

# 2. Append to delegate-dont-do.md
cat unique-orchestration.md >> delegate-dont-do.md

# 3. Add cross-reference in orchestration-boundary.md
cat > orchestration-boundary.md << EOF
---
deprecated: true
redirect: delegate-dont-do.md
---

# Orchestration Boundary

This spell has been merged into @spell/universal/delegate-dont-do
for better consolidation.

See: @spell/universal/delegate-dont-do
EOF

# 4. Update all references
genie helper update-references \
  --old "orchestration-boundary" \
  --new "delegate-dont-do" \
  --path genie-master/

echo "✓ Merge complete. Saved 2,080 tokens."

Similarity Thresholds

Research-backed thresholds for decision making:
| Similarity  | Interpretation      | Action            | Confidence |
|-------------|---------------------|-------------------|------------|
| > 0.95      | Near-identical      | Immediate merge   | Very High  |
| 0.85 - 0.95 | Clear duplicate     | Merge with review | High       |
| 0.70 - 0.85 | Significant overlap | Consolidate       | Medium     |
| 0.50 - 0.70 | Some overlap        | Cross-reference   | Low        |
| < 0.50      | Unique              | Keep separate     | N/A        |

Threshold Validation

# Test thresholds on known duplicates/uniques
genie helper validate-thresholds \
  --known-duplicates duplicates.json \
  --known-uniques uniques.json

# Output
Threshold Validation Results
============================

Testing threshold: 0.85

Known Duplicates (should be > 0.85):
 47/50 correctly identified (94% accuracy)
 3/50 below threshold (false negatives)

Known Uniques (should be < 0.85):
 92/100 correctly identified (92% accuracy)
 8/100 above threshold (false positives)

Overall Accuracy: 93%
F1 Score: 0.93

Recommendation: 0.85 threshold is optimal for this corpus.
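The accuracy figure above follows from standard confusion-matrix arithmetic, with the counts taken directly from the output:

```typescript
// Accuracy over the validation corpus: correct classifications / total examples.
function accuracy(tp: number, fn: number, tn: number, fp: number): number {
  return (tp + tn) / (tp + fn + tn + fp);
}

// 47 true positives, 3 false negatives, 92 true negatives, 8 false positives
const acc = accuracy(47, 3, 92, 8);  // 139 / 150 ≈ 0.93
```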

Integration with Git Hooks

Automatically enforce ACE protocol during development:

Pre-Commit Hook

#!/bin/bash
# .git/hooks/pre-commit

echo "Running ACE Protocol checks..."

# Check for new file additions
NEW_FILES=$(git diff --cached --name-only --diff-filter=A | grep '\.md$')

if [ -n "$NEW_FILES" ]; then
  echo "Checking new markdown files for duplicates..."

  # Run similarity check
  genie helper embeddings \
    --input genie-master/ \
    --new-files "$NEW_FILES" \
    --threshold 0.85 \
    --format json > /tmp/ace-check.json

  # Parse results
  DUPLICATES=$(jq '.duplicates | length' /tmp/ace-check.json)

  if [ "$DUPLICATES" -gt 0 ]; then
    echo "❌ ACE Protocol Violation: Duplicate content detected"
    echo ""
    jq -r '.duplicates[] | "  \(.fileA) ↔ \(.fileB): \(.similarity)"' /tmp/ace-check.json
    echo ""
    echo "Please consolidate duplicate content before committing."
    echo "Run: genie helper ace-report for detailed recommendations"
    exit 1
  fi

  echo "✓ No duplicates detected"
fi

# Check token budget
TOTAL_TOKENS=$(genie helper count-tokens --path genie-master/ --format json | jq '.total')
TOKEN_LIMIT=200000

if [ "$TOTAL_TOKENS" -gt "$TOKEN_LIMIT" ]; then
  echo "⚠ Warning: Token budget exceeded ($TOTAL_TOKENS / $TOKEN_LIMIT)"
  echo "Consider running ACE optimization: genie helper ace-report"
fi

echo "✓ ACE Protocol checks passed"

CI/CD Integration

# .github/workflows/ace-protocol.yml
name: ACE Protocol Check

on: [pull_request]

jobs:
  ace-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Genie
        run: npm install -g @automagik/genie

      - name: Run ACE Analysis
        run: |
          genie helper ace-report \
            --input genie-master/ \
            --format github \
            --output ace-report.md

      - name: Comment on PR
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('ace-report.md', 'utf8');

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });

      - name: Check Token Budget
        run: |
          TOKENS=$(genie helper count-tokens --path genie-master/ --format json | jq '.total')
          echo "Total tokens: $TOKENS"

          if [ "$TOKENS" -gt 200000 ]; then
            echo "::error::Token budget exceeded: $TOKENS / 200000"
            exit 1
          fi

Helper Commands Reference

embeddings

genie helper embeddings [options]

Analyze semantic similarity across markdown files

Options:
  --input <path>          Directory to analyze
  --threshold <number>    Similarity threshold (default: 0.85)
  --format <type>         Output format: text, json, csv (default: text)
  --output <file>         Save results to file
  --new-files <files>     Compare only these files against existing

Examples:
  genie helper embeddings --input genie-master/spells/
  genie helper embeddings --input . --threshold 0.70 --format json
  genie helper embeddings --new-files "new-spell.md" --input genie-master/

count-tokens

genie helper count-tokens [options]

Count tokens in markdown files

Options:
  --path <path>           File or directory to analyze
  --model <name>          Model for tokenization (default: gpt-4)
  --format <type>         Output format: text, json, csv (default: text)
  --breakdown             Show section-by-section breakdown

Examples:
  genie helper count-tokens --path genie-master/
  genie helper count-tokens --path agent.md --breakdown
  genie helper count-tokens --path . --format json > tokens.json

bullet-counter

genie helper bullet-counter [options]

Find duplicate bullet points across files

Options:
  --input <path>          Directory to analyze
  --threshold <number>    Minimum occurrences to report (default: 2)
  --format <type>         Output format: text, json, csv (default: text)

Examples:
  genie helper bullet-counter --input genie-master/spells/
  genie helper bullet-counter --input . --threshold 3

ace-report

genie helper ace-report [options]

Generate comprehensive ACE Protocol optimization report

Options:
  --input <path>          Directory to analyze
  --format <type>         Output format: text, json, markdown, github
  --output <file>         Save report to file
  --auto-fix              Generate merge scripts for high-confidence merges

Examples:
  genie helper ace-report --input genie-master/
  genie helper ace-report --input . --format markdown --output report.md
  genie helper ace-report --input . --auto-fix > merge-script.sh

Best Practices

- Regular Audits: run ACE analysis monthly, or after major additions, to catch bloat early.
- Pre-Commit Checks: use git hooks to prevent duplicates from entering the codebase.
- Document Decisions: keep a log of merge decisions and reasoning for future reference.
- Gradual Optimization: don't try to optimize everything at once. Start with high-confidence merges.

Optimization Workflow

1. Generate Report
   genie helper ace-report --input genie-master/ --output ace-report.md

2. Review High-Confidence Merges
   Start with similarity > 0.90 for the safest wins.

3. Manual Review
   Read both files to confirm semantic equivalence.

4. Merge Content
   Combine unique examples and perspectives from both files.

5. Update References
   Find and update all @references to the merged file.

6. Add Redirect
   Create a redirect file for the deprecated content.

7. Test
   Verify the system still works with the merged content.

8. Commit
   Document token savings and rationale in the commit message.

Real-World Results

From Genie’s own ACE Protocol optimization:

Before ACE

Total Files: 358
Total Tokens: 203,450
Duplicate Pairs: 14
Token Efficiency: 68%

After ACE

Total Files: 332 (-26)
Total Tokens: 186,420 (-17,030)
Duplicate Pairs: 2 (-12)
Token Efficiency: 92%

Impact

- 17,030 Tokens Saved: 8.4% reduction in total token usage
- 26 Files Merged: easier navigation and maintenance
- 92% Efficiency: only essential content loaded per task

Next Steps