Item: spark-engineer
Rating: 3.9
Author: Implexa

spark-engineer

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing…


SkillRank score ↗

3.9 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-25

spark-engineer outlines a generic optimization workflow for distributed data processing, but lacks concrete actionable steps, specific trigger patterns, and failure recovery logic.

structure: 3.0

trigger phrases: 2.0

procedure: 4.0

edge cases: 2.0

documentation: 3.0

strengths

SKILL.md

Spark Engineer

Senior Apache Spark engineer specializing in high-performance distributed data processing, optimizing large-scale ETL pipelines, and building production-grade Spark applications.

Core Workflow

1. Analyze requirements - Understand data volume, transformations, latency requirements, and cluster resources

2. Design pipeline - Choose DataFrame vs RDD, plan the partitioning strategy, and identify broadcast opportunities

3. Implement - Write Spark code with optimized transformations, appropriate caching, and proper error handling

4. Optimize - Analyze the Spark UI, tune shuffle partitions, eliminate skew, and optimize joins and aggregations

5. Validate - Before proceeding, check the Spark UI for shuffle spill and verify the partition count with df.rdd.getNumPartitions(). If spill or skew is detected, return to step 4. Otherwise, test with production-scale data, monitor resource usage, and verify performance targets
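The decision in the Validate step (go back to Optimize on spill or skew, otherwise continue) could be sketched roughly as below. This is an illustrative pure-Python sketch, not part of the skill itself: the helper names `skew_ratio` and `next_step`, the 5x skew threshold, and the idea of passing in a spill figure read from the Spark UI are all assumptions made for the example.

```python
from statistics import median

# Hypothetical threshold (an assumption, not from the skill): treat the job
# as skewed when the largest partition exceeds 5x the median partition.
SKEW_RATIO_LIMIT = 5.0


def skew_ratio(partition_rows):
    """Return the largest partition size divided by the median non-empty size."""
    non_empty = [n for n in partition_rows if n > 0]
    if not non_empty:
        return 0.0
    return max(non_empty) / median(non_empty)


def next_step(partition_rows, spill_bytes, limit=SKEW_RATIO_LIMIT):
    """Decide whether to return to the Optimize step or keep validating."""
    if spill_bytes > 0:
        return "optimize: shuffle spill detected"
    if skew_ratio(partition_rows) > limit:
        return "optimize: partition skew detected"
    return "continue: test at production scale"


# In PySpark the inputs might be gathered along these lines (not run here):
#   partition_rows = df.rdd.mapPartitions(
#       lambda rows: [sum(1 for _ in rows)]
#   ).collect()
#   len(partition_rows) then matches df.rdd.getNumPartitions()
#   spill_bytes would be copied by hand from the stage's spill figures
#   in the Spark UI.
```

Comparing against the median rather than the mean keeps one oversized partition from hiding its own skew; the threshold is a starting point to tune per workload.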

Reference Guide

Load detailed guidance based on context:

