clawhub

mcore-run-on-slurm

How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions, monitoring, and per-rank failure diagnosis.
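As an illustration of the environment-variable setup mentioned above, here is a minimal sketch, not taken from the skill itself, of how SLURM's variables could be mapped onto `torch.distributed.run` arguments. The helper names `first_host` and `build_torchrun_args`, the default port 29500, the 8 GPUs per node, and the `pretrain_gpt.py` entry point are all assumptions chosen for the example. The sketch does not set CUDA_DEVICE_MAX_CONNECTIONS, because this listing does not state the rules.

```python
import os
import subprocess
from typing import List, Mapping


def first_host(nodelist: str) -> str:
    """Return the first hostname in a SLURM nodelist (hypothetical helper).

    Tries `scontrol show hostnames`; if that is unavailable or fails, falls
    back to the text before the first comma, which is only correct for plain
    comma-separated lists without bracket ranges.
    """
    try:
        hosts = subprocess.run(
            ["scontrol", "show", "hostnames", nodelist],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        if hosts:
            return hosts[0]
    except (OSError, subprocess.CalledProcessError):
        pass
    return nodelist.split(",")[0]


def build_torchrun_args(
    env: Mapping[str, str],
    gpus_per_node: int,
    script: str,
    master_port: int = 29500,  # assumed default; pick any free port
) -> List[str]:
    """Build a per-node torch.distributed.run command from SLURM variables."""
    nnodes = int(env.get("SLURM_NNODES", "1"))
    node_rank = int(env.get("SLURM_NODEID", "0"))
    master_addr = first_host(env.get("SLURM_JOB_NODELIST", "localhost"))
    return [
        "python", "-m", "torch.distributed.run",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe anywhere.
    print(" ".join(build_torchrun_args(os.environ, gpus_per_node=8,
                                       script="pretrain_gpt.py")))
```

In a real job this would run once per node (for example under `srun` with one task per node), so that `SLURM_NODEID` differs on each node while the master address stays the same.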

full SKILL.md lives at the source

only the metadata for this skill is indexed here; the canonical SKILL.md body is on clawhub.

related skills

semantically similar in the cross-vendor index

smithery

60% match

training-llms-megatron

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on H100
