Use when system needs to handle AI uncertainty, Agent types must be selected, APIs will be consumed by AI, or architecture must support probabilistic outputs...
SKILL.md

---
name: ai-friendly-architecture-design
description: Use when system needs to handle AI uncertainty, Agent types must be selected, APIs will be consumed by AI, or architecture must support probabilistic outputs and dynamic planning
license: Apache-2.0
---

# AI Friendly Architecture Design

## Overview

AI Friendly architecture enables traditional systems to handle AI's inherent **uncertainty** through three paradigm shifts: deterministic→probabilistic, structured→semantic, and static→dynamic. This skill guides agents to apply these principles correctly and avoid common anti-patterns.

**Core principle:** Use appropriate architecture for the problem—don't over-engineer with AI when traditional solutions suffice.

## When to Use

**Use when:**
- Designing systems that incorporate LLM/AI capabilities
- Evaluating whether to use AI Friendly architecture vs traditional architecture
- Designing Agent-based systems (ReAct, Plan, Multi-Agent)
- Creating APIs that will be consumed by AI Agents
- Building context engineering pipelines for AI applications

**Do NOT use when:**
- Building simple CRUD applications with no AI requirements
- Creating AI Workflow applications that only call pre-built Agents as APIs
- The system only needs deterministic, rule-based logic

## The Three Paradigm Shifts

### 1. Deterministic → Probabilistic

**Traditional:** Output follows `y=f(x)` mapping—binary success/failure.

**AI Friendly:** Output emerges from model + prompt + context + environment. Design goal: converge probabilistic output to an acceptable "safe interval" through RAG, prompt engineering, and evaluation mechanisms.

**Design implication:** Don't expect exact schema compliance from AI outputs. Build validation and fallback mechanisms.

### 2. Structured → Semantic

**Traditional:** Input must match predefined Schema exactly (JSON field types). System boundary is a rigid wall.

**AI Friendly:** System understands natural language and unstructured data. Responds based on **intent**, not format. System boundary becomes an elastic membrane.

**Design implication:** Design interfaces that accept flexible inputs and translate intent to actions.

### 3. Static → Dynamic

**Traditional:** Execution paths defined by hardcoded if-else logic or rules. Behavior is enumerable and verifiable.

**AI Friendly:** System makes decisions based on models, can reason about current state, decompose tasks, and respond to unknown changes without human intervention.

**Design implication:** Shift from "rules" to "planning"—grant systems autonomy for intelligent task orchestration.

## Architecture Layers

```
┌─────────────────────────────────────────────────────────────┐
│                    Quality & Stability Layer                 │
│         (AI Observability, Evaluation, Security)            │
├─────────────────────────────────────────────────────────────┤
│                      Application Layer                       │
│    ┌──────────┐  ┌──────────┐  ┌──────────────────────┐    │
│    │  Agent   │  │  Intent  │  │      Session         │    │
│    │  Layer   │  │  Layer   │  │      Layer           │    │
│    └──────────┘  └──────────┘  └──────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                    Capability Layer                          │
│         (MCP, RAG, Function Calling)                        │
├─────────────────────────────────────────────────────────────┤
│                   Foundation Layer                           │
│    ┌──────────┐  ┌──────────┐  ┌──────────────────────┐    │
│    │  Model   │  │ Knowledge│  │   Tool Management    │    │
│    │Management│  │Management│  │                      │    │
│    └──────────┘  └──────────┘  └──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

### Foundation Layer

- **Model Management:** Unified API (OpenAI protocol) for multiple LLM providers
- **Knowledge Management:** Vector storage and retrieval for different knowledge sources
- **Tool Management:** MCP protocol for tool integration, Computer Use skills

#### Model Management Details

**Provider Selection:**

| Factor | Consideration |
|--------|--------------|
| Latency requirements | Regional providers vs global (OpenAI, Anthropic) |
| Cost | Per-token pricing, batch discounts, small model for simple tasks |
| Data privacy | On-premise (Ollama, vLLM) vs cloud API |
| Capability | Task-specific: code (Codex), vision (GPT-4V), reasoning (Claude) |

**Failover Strategy:**
1. Primary model → fallback model → rules-based fallback
2. Circuit breaker pattern: after N failures, skip model for cooldown period
3. Graceful degradation: reduce output quality rather than fail completely

**Cost Optimization:**
- Model tiering: small model for classification, large model for reasoning
- Caching: cache deterministic or near-duplicate queries
- Batching: group non-urgent requests for batch API pricing
- Prompt optimization: shorter prompts = fewer tokens = lower cost

### Agent Layer

> **Reference:** The ReAct pattern is from the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629) (Yao et al., 2022).

Three Agent types for different scenarios (this is one common taxonomy—other frameworks may use different classifications):

| Agent Type | Capability | Use Case |
|------------|------------|----------|
| **BaseAgent** | Fixed workflow, no dynamic planning | Simple chatbots, AI Workflows |
| **ReActAgent** | Thought→Action→Observation loop | Rational tasks with tool use |
| **PlanAgent** | Global planning + ReAct execution | Complex tasks requiring strategy |

> **Note:** "PlanAgent" is a common architectural pattern but not a standardized term—different frameworks may name it differently.

**ReAct + Plan Combination:**
- Plan produces global strategy (use templates for quality)
- ReAct executes domain-specific reasoning
- Together they handle both strategic and tactical problems

### Intent Layer

Required only for multi-intent scenarios. Handles:
- **Parallel intents:** Multiple independent intents in one query
- **Sequential intents:** Intent B depends on Intent A result
- **Logical intents:** Intents with logical relationships

Also performs query rewriting and expansion for intent optimization.

### Session Layer

Manages conversation state and user context. Feeds data into Context Engineering.

**Responsibilities:**
- Session lifecycle: creation, timeout, cleanup
- State persistence: save/load conversation state across sessions
- User context binding: associate session with user profile, preferences

**What it does NOT do:**
- Memory management strategies → see Context Engineering
- RAG retrieval → see Capability Layer
- Model selection → see Foundation Layer

Session Layer provides the **data**; Context Engineering decides **how to use it**.

## Multi-Agent Patterns

Three coordination patterns:

| Pattern | Decision Point | Use Case |
|---------|----------------|----------|
| **Centralized** | Single coordinator agent | Clear task decomposition |
| **Decentralized** | Peer-to-peer negotiation | Collaborative problem-solving |
| **Hybrid** | Mixed coordination | Complex domains with sub-domains |

**MOE (Mixture of Experts) Pattern:**
- Each domain has specialized Agent (product, order, inventory, etc.)
- Central Agent performs intent recognition and task distribution
- Domain Agents execute with ReAct + Plan capabilities

## AI Friendly API Design

### Tool Atomicity
Split interfaces into atomic tools matching Agent reasoning patterns:
```
❌ getProductWithInventoryAndPricing(id)
✅ getProduct(id)
✅ getInventory(productId)
✅ getPricing(productId)
```

### Parameter Design
Human-readable names, flat KV structure, core parameters only:
```json
// ❌ Bad: Nested, complex
{"product": {"identifiers": {"sku_id": "123"}, "filters": {"status": "active"}}}

// ✅ Good: Flat, clear
{"sku_id": "123", "status": "active"}
```

### Error Handling
- **Expected errors:** Short descriptions for Agent reasoning
- **Unexpected errors:** Stack traces for error diagnosis

## Context Engineering

### Beyond Prompt Engineering

Context Engineering selects, organizes, and compresses information optimally within context windows. It is more impactful than model selection for production systems.

### Core Techniques

#### 1. Historical Case Library (Illustrative: ~8% accuracy improvement)

*Based on production AI Review system results. See [Real-World Impact](#real-world-impact) section.

Store past successful cases with their decision reasoning. Retrieve similar cases via vector search to guide current decisions.

- **When to use:** Tasks with recurring patterns (code review, customer support, troubleshooting)
- **Implementation:** Embed past cases → vector store → similarity search → inject top-K into context
- **Key:** Include both the case AND the reasoning, not just the result

#### 2. Hybrid Decision-Making (Illustrative: ~10%+ accuracy improvement)

*Based on production CogentAI system results. See [Real-World Impact](#real-world-impact) section.

Multiple models vote with confidence scores. Use when single-model accuracy is insufficient.

- **When to use:** High-stakes decisions, compliance checks, medical/financial analysis
- **Implementation:** Run 2-3 models in parallel → collect outputs → weighted voting based on domain confidence
- **Key:** Assign domain-specific confidence weights, not uniform voting

#### 3. Memory Management

| Type | Scope | Implementation | Use Case |
|------|-------|---------------|----------|
| Long-term memory | Cross-session | Persistent store (DB/vector) | User preferences, history |
| Short-term memory | Current session | In-context window | Current task context |
| Working memory | Current step | Scratchpad pattern | Intermediate reasoning |

- **Summarization:** Periodically compress long-term memory to avoid context overflow
- **Relevance scoring:** Only inject relevant memories, not everything

#### 4. Advanced RAG

| Technique | When to Use | Trade-off |
|-----------|------------|-----------|
| Standard RAG | Simple Q&A over documents | Low complexity, moderate accuracy |
| GraphRAG | Entity relationships, knowledge graphs | Higher complexity, better associative retrieval |
| Dynamic context pruning | Long conversations, large knowledge bases | Reduces noise, may lose relevant context |
| Hybrid retrieval (dense + sparse) | Mixed structured/unstructured data | Best recall, more infrastructure |

### Decision Guide

```
Is the task pattern-recurring?
├─ Yes → Build historical case library
└─ No → Is single-model accuracy sufficient?
    ├─ Yes → Standard RAG + memory management
    └─ No → Hybrid decision-making
        └─ Complex entity relationships?
            ├─ Yes → GraphRAG
            └─ No → Standard RAG + dynamic pruning
```

## Quick Reference: Decision Framework

### Decision Questions

| # | Question | If Yes | If No |
|---|----------|--------|-------|
| 1 | Is the task deterministic? | Traditional architecture (MVC/DDD) | → Q2 |
| 2 | Does it need language understanding? | → Q3 | Rules or traditional ML |
| 3 | Are there hard performance constraints (<100ms)? | Hybrid: AI offline + rules online, caching | → Q4 |
| 4 | Are there strict cost constraints? | Hybrid: AI + rules, small models, caching | → Q5 |
| 5 | Does it need dynamic tool selection? | → Q6 | Single LLM call or AI Workflow |
| 6 | Does it need multi-step reasoning? | PlanAgent + ReActAgent | ReActAgent |
| 7 | Multiple domains? | Multi-Agent (MOE pattern) | Single Agent |

### Decision Criteria

**Q1: Is the task deterministic?**
- Yes: Input→output mapping is fixed, no natural language understanding needed
- No: Task requires understanding intent, context, or unstructured data
- Examples:
  - Deterministic: Form validation, data transformation, CRUD operations
  - Non-deterministic: Customer support, content generation, recommendation

**Q2: Does it need language understanding?**
- Yes: Task involves natural language input/output, requires semantic understanding
- No: Task can be solved with rules, traditional ML, or simple pattern matching
- Examples:
  - Needs understanding: Chatbots, content analysis, query interpretation
  - Rules/ML: Fraud detection (features), image classification (CNN)

**Q3: Are there hard performance constraints?**
- Yes: Response time <100ms, real-time requirements, SLA commitments
- No: Batch processing, async operations, acceptable delays
- Hybrid approach: AI for offline training/analysis, rules for online decisions

**Q4: Are there strict cost constraints?**
- Yes: Limited budget, high volume, cost-sensitive application
- No: Budget allows for AI infrastructure, low volume
- Hybrid approach: AI for high-value decisions, rules for routine tasks

**Q5: Does it need dynamic tool selection?**
- Yes: System must choose tools/APIs based on context at runtime
- No: Fixed workflow, predetermined tool sequence
- Examples:
  - Dynamic: Research agent, adaptive troubleshooting
  - Fixed: Data pipeline, report generation

**Q6: Does it need multi-step reasoning?**
- Yes: Task requires planning, decomposition, backtracking
- No: Single-step execution, direct action
- Examples:
  - Multi-step: Complex research, multi-domain problem solving
  - Single-step: Simple Q&A, data lookup

**Q7: Multiple domains?**
- Yes: System handles multiple specialized areas with different knowledge/tools
- No: Single domain, focused expertise
- Examples:
  - Multi-domain: Enterprise support (product, billing, technical)
  - Single-domain: Specialized medical diagnosis

**Constraint Strategy Table:**

| Constraint | Strategy | Example | Priority |
|-----------|----------|---------|----------|
| Latency <100ms | AI offline training + rules online, caching | Real-time recommendation | Hard |
| High volume (10M+/day) | Caching + small models + rules fallback | API gateway | Soft |
| Limited budget | Hybrid AI+rules, model tiering | Internal tools | Soft |
| Data privacy | On-premise models, data anonymization | Medical/financial | Hard |
| Performance + Cost | Small models + caching, reduce AI scope to high-ROI tasks | Budget real-time system | Hard > Soft |
| Performance + Privacy | On-premise small models + caching, accept lower accuracy | Hospital system | Hard > Hard |
| Cost + Privacy | On-premise small models, rules for low-value decisions | Internal medical tool | Soft < Hard |

**Rule:** Hard constraints (latency, privacy) beat soft constraints (cost, accuracy). When constraints conflict, optimize for the hard constraint first.

## Common Mistakes

| Mistake | Why Wrong | Fix |
|---------|-----------|-----|
| Using Multi-Agent for simple FAQ | Over-engineering, adds latency/cost | Use single Agent with knowledge base |
| Complex nested API for Agent tools | Agents can't parse deep structures | Atomic tools with flat parameters |
| Skipping Intent Layer for multi-intent queries | Agent can't distinguish user goals | Add Intent Layer with query rewriting |
| Using ReAct for deterministic tasks | Unnecessary reasoning overhead | Use BaseAgent or workflow |
| Ignoring Context Engineering | Poor model performance | Build case libraries, hybrid decisions |
| Ignoring performance constraints | AI inference latency breaks SLA | Use hybrid architecture, see Constraint Strategy Table |
| Ignoring cost constraints | Unsustainable AI spend at scale | Model tiering, caching, rules fallback |
| "To use AI" as architecture goal | No business value | Define specific problems AI solves |

## Rationalization Table

| Excuse | Reality |
|--------|---------|
| "Multi-Agent is always better" | Coordination overhead isn't justified for simple tasks |
| "ReAct can handle everything" | Deterministic tasks don't need dynamic reasoning |
| "AI Friendly API is too much work" | Atomic tools are easier to maintain and test |
| "Context Engineering is optional" | Memory and context are more important than model choice |
| "We need AI for everything" | Traditional architecture handles deterministic logic better |
| "Paradigm shifts are just theory" | They explain WHY the patterns work—skip them at your peril |
| "Context Engineering is just RAG" | Includes memory, hybrid decisions, case libraries beyond RAG |
| "Intent Layer is optional" | Required for multi-intent scenarios—Agent alone can't distinguish |
| "This only applies to LLM systems" | Principles apply to any AI system with uncertainty |

## Red Flags - STOP and Reconsider

- Adding AI without clear problem definition
- Using Agents where simple LLM calls suffice
- Complex nested APIs for Agent consumption
- Skipping evaluation and observability
- Ignoring the "when NOT to use" guidelines

## Real-World Impact

From production systems:
- **AI Review:** Illustrative results: 95.7% accuracy, 99.1% recall, 80%+ efficiency improvement
- **AI Q&A (CogentAI):** Illustrative results: 98%+ problem-solving accuracy, 80%+ efficiency improvement
- **Key success factors:** Proper architecture selection, Context Engineering, continuous evaluation

> Note: These metrics are illustrative examples from the source article, not independently verified measurements. Results will vary based on implementation quality, data, and domain.

## References

- [Article Summary](references/article-summary.md) (English)
- [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629) (Yao et al., 2022)
- [Skill Authoring Guide](https://github.com/GanJiaKouN16/Function-Point-Skill/blob/main/skill-authoring-guide.md)
Ai Friendly Architecture Design Skill

SKILL.md

related skills