Use when working with "PDF", "Excel", "Word", "PowerPoint", "XLSX", "DOCX", "PPTX", "spreadsheets", "presentations", "extract text", "merge documents",…
Process, extract, and manipulate PDF, Excel, Word, and PowerPoint documents programmatically.

- Supports four major office formats (PDF, XLSX, DOCX, PPTX) with format-specific tools: pypdf and pdfplumber for PDFs, openpyxl and pandas for Excel, python-docx for Word, python-pptx for PowerPoint.
- Core operations include text and table extraction, document merging and splitting, format conversion, and OCR for scanned PDFs.
- Excel-specific guidance emphasizes writing formulas rather than static values for dynamic calculations, plus financial modeling conventions (color-coded text and fills).
- Word documents support tracked changes via XML editing for professional redlining; PowerPoint covers slide structure, speaker notes, and design principles for consistent layouts.

# Document Processing Guide

Work with office documents: PDF, Excel, Word, and PowerPoint.

## Format Overview

| Format | Extension | Structure | Best For |
|--------|-----------|-----------|----------|
| PDF | .pdf | Binary/text | Reports, forms, archives |
| Excel | .xlsx | XML in ZIP | Data, calculations, models |
| Word | .docx | XML in ZIP | Text documents, contracts |
| PowerPoint | .pptx | XML in ZIP | Presentations, slides |

Key concept: XLSX, DOCX, and PPTX are all ZIP archives containing XML files. You can unzip them to access raw content.
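As a minimal sketch of the key concept above, the standard-library `zipfile` module can open an XLSX, DOCX, or PPTX file directly and read the XML inside it. The helper names `list_parts` and `read_part` and the file name `report.docx` are hypothetical, chosen only for illustration; they are not part of any of the libraries named in this guide.

```python
import zipfile


def list_parts(source):
    """Return the member names inside a ZIP-based office file (.xlsx, .docx, .pptx).

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_part(source, part_name):
    """Return one member of the archive decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as package:
        return package.read(part_name).decode("utf-8")


if __name__ == "__main__":
    path = "report.docx"  # hypothetical file name; replace with a real document
    if zipfile.is_zipfile(path):
        for name in list_parts(path):
            print(name)
        # In a .docx file the main body is normally stored in word/document.xml.
        print(read_part(path, "word/document.xml")[:500])
    else:
        print(f"{path} is missing or is not a ZIP-based office file")
```

Reading the raw XML this way is useful for inspection and for edits the higher-level libraries do not expose. For routine reading and writing, the format-specific libraries listed above are usually the simpler choice.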