Run LLMs, embeddings, and image generation on Cloudflare's GPU network, with 14 new 2025 models, streaming support, and prevention of 7 documented errors.

- Supports 40+ models across text generation (Llama 4, Gemma 3, Mistral 3.1, GPT-OSS), embeddings (BGE, now 2x faster; EmbeddingGemma), image generation (Flux, Leonardo), vision, and audio (Deepgram, Whisper v3).
- Handles the critical 2025 breaking changes: context-window validation now counts tokens instead of characters, embeddings produced with the BGE pooling option "cls" are not backwards compatible with those produced with "mean", and max_tokens now correctly defaults to 256.
- Prevents 7 documented errors, including NSFW filter false positives, a missing num_steps parameter in image generation requests, Miniflare AI binding resolution issues, and neuron consumption discrepancies.
- Integrates with AI Gateway for per-request cache control (custom TTL, skip-cache headers), logging, cost tracking, and OpenAI-compatible API endpoints.

Cloudflare Workers AI

Status: Production Ready ✅
Last Updated: 2026-01-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: wrangler@4.58.0, @cloudflare/workers-types@4.20260109.0, workers-ai-provider@3.0.2

Recent Updates (2025):
- April 2025 - Performance: Llama 3.3 70B is 2-4x faster (speculative decoding, prefix caching); BGE embeddings are 2x faster.
- April 2025 - Breaking Changes: max_tokens now correctly defaults to 256 (previously the default was not respected); the BGE pooling option "cls" is not backwards compatible with "mean".
- 2025 - New Models (14): Mistral 3.1 24B (vision + tools), Gemma 3 12B (128K context), EmbeddingGemma 300M, Llama 4 Scout, GPT-OSS 120B/20B, Qwen models (QwQ 32B, Coder 32B), Leonardo image generation, Deepgram Aura 2, Whisper v3 Turbo, IBM Granite, Nova 3.
- 2025 - Platform: context-window API change (tokens, not characters), unit-based pricing with per-model granularity, workers-ai-provider v3.0.2 (AI SDK v5), LoRA rank up to 32 (was 8), 100 adapters per account.
- October 2025 - Model deprecations: use Llama 4 or GPT-OSS instead.

Quick Start (5 Minutes)
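A minimal quick-start sketch of a Worker that calls a chat model, with and without streaming. The model ID @cf/meta/llama-3.1-8b-instruct, the binding name AI, the hand-written AiBinding interface, and max_tokens: 1024 are illustrative assumptions; in a real project, take the binding type from @cloudflare/workers-types. Because max_tokens now defaults to 256, the sketch sets it explicitly so longer answers are not cut short.

```typescript
// Hypothetical minimal shape of the Workers AI binding, declared here so the
// sketch compiles on its own. In a real Worker, use the Ai type from
// @cloudflare/workers-types instead.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumes the binding is named "AI" in the Wrangler config
}

// Example text-generation model; any chat model from the catalog could be used.
const CHAT_MODEL = "@cf/meta/llama-3.1-8b-instruct";

// Non-streaming call: resolves to the full generated text.
async function chat(env: Env, prompt: string): Promise<string> {
  const result = (await env.AI.run(CHAT_MODEL, {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: prompt },
    ],
    // max_tokens defaults to 256; set it explicitly for longer answers.
    max_tokens: 1024,
  })) as { response?: string };
  return result.response ?? "";
}

// Streaming call: with stream: true the binding returns a ReadableStream of
// server-sent events, which is passed through unchanged as the response body.
async function chatStream(env: Env, prompt: string): Promise<Response> {
  const stream = (await env.AI.run(CHAT_MODEL, {
    messages: [{ role: "user", content: prompt }],
    max_tokens: 1024,
    stream: true,
  })) as ReadableStream<Uint8Array>;
  return new Response(stream, {
    headers: { "content-type": "text/event-stream" },
  });
}

// Worker entry point: ?stream=1 selects the streaming variant.
const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const prompt = url.searchParams.get("prompt") ?? "What is Cloudflare Workers AI?";
    if (url.searchParams.get("stream") === "1") {
      return chatStream(env, prompt);
    }
    const answer = await chat(env, prompt);
    return new Response(JSON.stringify({ answer }), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Request /?prompt=... for a JSON answer, or add &stream=1 to receive server-sent events as they are generated.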
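A sketch of handling the BGE pooling change, assuming the example model ID @cf/baai/bge-base-en-v1.5 and a response shaped as { data: number[][] }. Since "cls" vectors are not backwards compatible with "mean" vectors, the hypothetical TaggedEmbedding type records which pooling mode produced each vector, and the similarity helper refuses to compare across modes.

```typescript
// Hypothetical minimal shape of the Workers AI binding, declared here so the
// sketch compiles on its own (use the Ai type from @cloudflare/workers-types
// in a real Worker).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Example BGE embedding model ID.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

type Pooling = "mean" | "cls";

// Each vector remembers which pooling produced it, because "cls" vectors are
// not compatible with "mean" vectors and must not be compared with them.
interface TaggedEmbedding {
  pooling: Pooling;
  vector: number[];
}

// Embeds a batch of texts with an explicit pooling mode.
async function embed(ai: AiBinding, texts: string[], pooling: Pooling): Promise<TaggedEmbedding[]> {
  const result = (await ai.run(EMBEDDING_MODEL, { text: texts, pooling })) as {
    data: number[][];
  };
  return result.data.map((vector) => ({ pooling, vector }));
}

// Cosine similarity that rejects vectors from different pooling modes.
function similarity(a: TaggedEmbedding, b: TaggedEmbedding): number {
  if (a.pooling !== b.pooling) {
    throw new Error(`cannot compare "${a.pooling}" and "${b.pooling}" embeddings`);
  }
  if (a.vector.length !== b.vector.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.vector.length; i++) {
    dot += a.vector[i] * b.vector[i];
    normA += a.vector[i] ** 2;
    normB += b.vector[i] ** 2;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}
```

When switching an existing index from "mean" to "cls", re-embed every stored document rather than mixing old and new vectors.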
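A sketch of per-request AI Gateway cache control through the binding's optional third argument. The gateway name my-gateway, the one-hour TTL, and the cache-or-skip policy in gatewayFor are hypothetical; the field names id, skipCache, and cacheTtl reflect our reading of the AI Gateway documentation and should be checked against it. Over HTTP, the equivalent controls are request headers such as cf-aig-skip-cache and cf-aig-cache-ttl.

```typescript
// AI Gateway options for a single request (field names as we understand the
// binding's gateway option; verify against current Cloudflare docs).
interface GatewayOptions {
  id: string; // AI Gateway name; "my-gateway" below is a placeholder
  skipCache?: boolean; // bypass the gateway cache for this request
  cacheTtl?: number; // cache lifetime for this response, in seconds
}

// Hypothetical minimal shape of the binding, including the optional
// third argument that routes the call through AI Gateway.
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, unknown>,
    options?: { gateway?: GatewayOptions },
  ): Promise<unknown>;
}

// Hypothetical policy: cache generic prompts for an hour, never cache
// user-specific ones.
function gatewayFor(personalized: boolean): GatewayOptions {
  return personalized
    ? { id: "my-gateway", skipCache: true }
    : { id: "my-gateway", cacheTtl: 3600 };
}

async function ask(ai: AiBinding, prompt: string, personalized: boolean): Promise<string> {
  const result = (await ai.run(
    "@cf/meta/llama-3.1-8b-instruct", // example model ID
    { prompt, max_tokens: 512 },
    { gateway: gatewayFor(personalized) },
  )) as { response?: string };
  return result.response ?? "";
}
```

Deciding cache behavior per request keeps shared, repeatable prompts cheap while ensuring personalized answers are never served from another user's cached response.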
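A sketch that avoids the missing num_steps error by always sending the parameter. It assumes the Stable Diffusion XL model ID below accepts num_steps with an upper bound of 20 and returns PNG bytes as a stream; other image models (Flux, for example) may use different parameter names, so check each model's schema before reusing this.

```typescript
// Hypothetical minimal shape of the Workers AI binding, declared here so the
// sketch compiles on its own.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Example Stable Diffusion model that takes a num_steps parameter.
const IMAGE_MODEL = "@cf/stabilityai/stable-diffusion-xl-base-1.0";

// Assumed upper bound for this model; check the model page before relying on it.
const MAX_STEPS = 20;

async function generateImage(ai: AiBinding, prompt: string, steps = MAX_STEPS): Promise<Response> {
  if (prompt.trim() === "") throw new Error("prompt must not be empty");
  // Always send num_steps explicitly, clamped to the range 1..MAX_STEPS.
  const requested = Number.isFinite(steps) ? Math.round(steps) : MAX_STEPS;
  const num_steps = Math.min(Math.max(requested, 1), MAX_STEPS);
  const image = (await ai.run(IMAGE_MODEL, { prompt, num_steps })) as ReadableStream<Uint8Array>;
  return new Response(image, { headers: { "content-type": "image/png" } });
}
```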