Multi-Model Council - parallel execution of multiple LLMs with voting/consensus.

SKILL.md

---
name: arena-council
description: Multi-Model Council - parallel execution of multiple LLMs with voting/consensus.
---

# ARENA-001: Multi-Model Council

Parallel execution of multiple local LLMs with voting strategies for higher quality responses.

## Why Multi-Model?

- **Diversity**: Different models = different perspectives
- **Robustness**: If one fails, others continue
- **Quality**: Consensus often beats single model
- **Cost**: All local = $0 (vs $0.60/M for cloud)

## Quick Start

```python
from scripts.council import council_decide

# Simple usage
result = council_decide(
    "Explain Python decorators",
    models=['nerdsking-3b', 'llama-3.1-8b'],
    strategy="weighted"
)
print(result)
```

## Architecture

```
User Prompt
    ↓
[Router] → Model A → Response A
         → Model B → Response B  
         → Model C → Response C
    ↓
[Voting Engine]
    ↓
Consensus Response
```

## Voting Strategies

### 1. Majority Vote
Most common response wins (exact match).

### 2. Weighted Vote (default)
Bigger models get more weight:

| Model | Weight |
|-------|--------|
| Nerdsking 3B | 1 |
| Llama 3.1 8B | 2 |
| Strand 14B | 3 |
| Mistral 24B | 4 |
| GLM 4.7 | 5 |
| Qwen3.5 35B | 6 |

## Usage Examples

### Basic
```python
from scripts.council import ModelCouncil
import asyncio

async def main():
    async with ModelCouncil() as council:
        answer = await council.decide(
            "Best practice for API design?",
            strategy="weighted"
        )
        print(answer)

asyncio.run(main())
```

### Custom Models
```python
# Use specific models only
council = ModelCouncil(
    active_models=['qwen3.5-35b', 'glm-4.7-flash'],
    timeout=30
)
```

### Raw Responses
```python
# Get all responses without voting
responses = await council.query_all("Explain async/await")
for model, response in responses.items():
    print(f"{model}: {response}")
```

## Integration with OpenClaw

```python
# In your skill:
from arena_council.scripts.council import council_decide

def handle_query(prompt):
    # Use council instead of single model
    return council_decide(
        prompt,
        strategy="weighted"
    )
```

## Performance

| Setup | Time | Cost |
|-------|------|------|
| Single (Kimi) | 2s | $0.12 |
| Council (3 local) | 5s | $0 |
| Council (5 local) | 8s | $0 |

## Requirements

```bash
pip install aiohttp
```

## Models Available

Any model running on LM Studio (`<lm-studio-url>/v1`). Recommended:
- Small/fast (QA, routing): `qwen3.5-9b`, `glm-4.7-flash`, `nerdsking-3b`
- Medium (analysis): `huihui-qwen3.5-27b-abliterated`, `huihui-mistral-small-24b-abliterated`
- Heavy (complex tasks): `qwen3.5-35b-uncensored-hauhaucs`, `holo3-35b`

Use `curl <lm-studio-url>/v1/models` to list what's loaded.

## God Mode Integration

When all council models refuse due to censorship — apply god-mode techniques and retry:

```python
# If council returns no usable responses (all refused)
if all_refused(responses):
    from god_mode import apply_techniques
    modified_prompt = apply_techniques(prompt, methods=["unicode", "prefill"])
    responses = await council.query_all(modified_prompt)
```

See: `god-mode` skill for full technique library.

## Model Susceptibility Testing

Before using a censored model in council, test it:
1. Send a borderline request directly
2. Note: refuses? → needs god-mode wrapper
3. Document in MODEL-CATALOG.md

---

If this saved you time: [☕ PayPal.me/nerudek](https://www.paypal.me/nerudek)
GitHub: [github.com/nerua1](https://github.com/nerua1)

Arena Council — Multi-Model AI Debate

SKILL.md

related skills