Item: dflash-mlx-speculative-decoding
Rating: 5.2
Author: Implexa

dflash-mlx-speculative-decoding

Lossless DFlash speculative decoding for MLX on Apple Silicon — 1.7–4x faster LLM inference using block diffusion drafting with target model verification.

SkillRank score: 5.2 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-22

dflash-mlx-speculative-decoding accelerates inference on Apple Silicon: a small draft model generates candidate tokens in parallel, and a target model verifies them in one forward pass, yielding 1.7–4x speedups with lossless output guarantees.

structure: 4.0
trigger phrases: 3.0
procedure: 4.0
edge cases: 2.0
documentation: 6.0

SKILL.md

dflash-mlx Speculative Decoding

Skill by ara.so — Daily 2026 Skills collection.

DFlash implements lossless speculative decoding for MLX on Apple Silicon. A small draft model (~1B parameters) proposes a block of 16 tokens in parallel using block diffusion, and the target model verifies all 16 in a single forward pass. Tokens are emitted only after the target has verified them, so the output is lossless: every emitted token is the target model's greedy argmax.
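The draft-then-verify loop described above can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not the dflash-mlx API: `toy_target`, `toy_draft`, `verify_block`, and `generate` are hypothetical names, the "models" are toy arithmetic rules, and the per-position calls to the target stand in for what a real implementation would compute in one batched forward pass. Emitting one extra target token when the whole block is accepted is standard practice in speculative decoding and is assumed here rather than confirmed for DFlash.

```python
from typing import Callable, List

Token = int


def toy_target(context: List[Token]) -> Token:
    """Hypothetical stand-in for the target model's greedy argmax."""
    return (3 * context[-1] + len(context)) % 11


def toy_draft(context: List[Token], block_size: int) -> List[Token]:
    """Hypothetical draft model: mostly agrees with the target, with deliberate errors."""
    block: List[Token] = []
    current = list(context)
    for _ in range(block_size):
        token = toy_target(current)
        if len(current) % 5 == 0:
            token = (token + 1) % 11  # inject a wrong guess now and then
        block.append(token)
        current.append(token)
    return block


def verify_block(
    prefix: List[Token],
    draft: List[Token],
    target_next: Callable[[List[Token]], Token],
) -> List[Token]:
    """Return the tokens to emit: the accepted draft prefix plus one target token."""
    emitted: List[Token] = []
    context = list(prefix)
    for proposed in draft:
        expected = target_next(context)  # one batched pass in a real system
        emitted.append(expected)
        context.append(expected)
        if proposed != expected:
            return emitted  # mismatch: keep the target's token, drop the rest
    emitted.append(target_next(context))  # whole block accepted: bonus token
    return emitted


def generate(
    prompt: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    target_next: Callable[[List[Token]], Token],
    max_new_tokens: int,
    block_size: int = 16,
) -> List[Token]:
    """Speculative generation; output matches plain greedy decoding of the target."""
    output = list(prompt)
    while len(output) - len(prompt) < max_new_tokens:
        draft = draft_block(output, block_size)
        output.extend(verify_block(output, draft, target_next))
    return output[: len(prompt) + max_new_tokens]


def greedy(
    prompt: List[Token],
    target_next: Callable[[List[Token]], Token],
    max_new_tokens: int,
) -> List[Token]:
    """Baseline: one target call per emitted token."""
    output = list(prompt)
    for _ in range(max_new_tokens):
        output.append(target_next(output))
    return output
```

Because every emitted token comes from `target_next`, `generate([1, 2], toy_draft, toy_target, 40)` returns the same sequence as `greedy([1, 2], toy_target, 40)`. The draft model only affects how many tokens each verification step yields, never which tokens are emitted.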

Typical speedups: 1.7x–4.1x over baseline mlx_lm depending on model size and context length. Acceptance rates hover around 87–90% for Qwen3.5 models.
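As a rough back-of-envelope illustration of why acceptance rate matters, the sketch below estimates how many tokens one verification step yields. It rests on assumptions that are not from the source: each drafted token is accepted independently with the same probability `p`, acceptance stops at the first rejection, and the target always contributes one token per step. The function name `expected_tokens_per_step` is hypothetical, and the result is not a speedup prediction, since draft and verification costs are ignored.

```python
def expected_tokens_per_step(p: float, block_size: int = 16) -> float:
    """Expected tokens emitted per verification step under an i.i.d. acceptance model.

    Assumption: P(first k drafted tokens all accepted) = p ** k, so the expected
    accepted prefix length is the sum of p ** k for k = 1..block_size. The target
    then adds one token of its own (a correction or a bonus token).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    accepted = sum(p ** k for k in range(1, block_size + 1))
    return accepted + 1.0


if __name__ == "__main__":
    for rate in (0.87, 0.90):
        print(f"p={rate:.2f}: {expected_tokens_per_step(rate):.2f} tokens per step")
```

Under this simplified model, the estimate rises steeply as `p` approaches 1, which is why small changes in acceptance rate can noticeably change throughput.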

Installation

pip install dflash-mlx

# or isolated install
pipx install dflash-mlx

Requires Python 3.10+, MLX 0.31.1+, and an Apple Silicon Mac.
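The requirements above can be checked before installing with a short script. This is a sketch using only the Python standard library; `check_environment` and `version_tuple` are hypothetical helper names, and it assumes MLX is installed under the PyPI distribution name `mlx`.

```python
import itertools
import platform
import sys
from importlib import metadata
from typing import List, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Parse the first three dotted components of a version string into integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_environment() -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # Apple Silicon reports Darwin/arm64; a Rosetta-translated Python reports x86_64.
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("an Apple Silicon Mac (Darwin, arm64) is required")
    try:
        mlx_version = metadata.version("mlx")  # assumed distribution name
    except metadata.PackageNotFoundError:
        problems.append("mlx is not installed")
    else:
        if version_tuple(mlx_version) < (0, 31, 1):
            problems.append(f"MLX 0.31.1 or newer is required, found {mlx_version}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```

Returning a list instead of raising lets the script report every unmet requirement in one run.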
