serving-llms-vllm: Claude skill | implexa

skills.sh by @orchestra-research

serving-llms-vllm

Name: serving-llms-vllm
Author: orchestra-research

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference…

SKILL.md

vLLM - High-Performance LLM Serving

Quick start

vLLM achieves up to 24x higher throughput than Hugging Face Transformers through PagedAttention (a block-based KV cache) and continuous batching (mixing prefill and decode requests in the same batch).
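To make the "block-based KV cache" idea concrete, here is a toy sketch of the bookkeeping behind it. This is not vLLM's code or API: `ToyBlockAllocator`, `BLOCK_SIZE`, and the method names are hypothetical, and the block size of 16 tokens is only an illustrative value. The sketch assumes that each sequence owns a list of fixed-size blocks and takes a new block only when its token count crosses a block boundary.

```python
import math

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value, an assumption)


class ToyBlockAllocator:
    """Toy model of a paged KV cache; hypothetical, not part of vLLM."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # ids of unused blocks
        self.block_tables = {}  # seq_id -> list of block ids (need not be contiguous)
        self.seq_lens = {}      # seq_id -> number of tokens cached so far

    def append_tokens(self, seq_id, num_tokens):
        """Record new tokens for a sequence, taking blocks only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + num_tokens
        needed = math.ceil(new_len / BLOCK_SIZE) - len(table)
        if needed > len(self.free_blocks):
            raise MemoryError("no free KV-cache blocks")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = new_len

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = ToyBlockAllocator(num_blocks=8)
alloc.append_tokens("a", 20)  # prefill of 20 tokens needs 2 blocks
alloc.append_tokens("b", 5)   # a second request joins and needs 1 block
alloc.append_tokens("a", 1)   # one decode step: 21 tokens still fit in 2 blocks
alloc.free("b")               # "b" finishes; its block is reusable immediately
```

Because memory is handed out one block at a time instead of being reserved up front for a maximum sequence length, a finished request frees space that a newly arriving request can use right away, which is what lets a continuous-batching scheduler keep the batch full.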

Installation:

pip install vllm

Basic offline inference:

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)