---
name: Content Humanizer Audit
description: AI content signature detector for social media posts. Measures 8 linguistic dimensions that LinkedIn's 360Brew and other platforms use to detect AI-generated content — lexical diversity, sentence length variance, transition word density, hedging ratio, contraction usage, personal pronoun density, question frequency, and specific data density. Not a humanizer that rewrites your text — an auditor that tells you exactly which signals are triggering detection so you fix only what's wrong. Research-backed (DivEye arXiv:2509.18880, LinkedIn 360Brew algorithm analysis, stylometric detection studies). Per-platform thresholds for LinkedIn (strictest), Reddit, Twitter/X, HackerNews. Zero external dependencies.
license: Apache-2.0
homepage: https://canlah.ai
metadata:
author: Canlah AI
version: "1.0.3"
tags:
- social-media
- content
- linkedin
- ai-detection
- writing
- marketing
- authenticity
- brand-voice
---
# phy-content-humanizer-audit — AI Content Signature Detector
LinkedIn's 360Brew algorithm penalizes AI-detected content with **30% less reach and 55% less engagement**. This tool tells you exactly which linguistic signals are triggering detection — so you fix only what's wrong instead of rewriting everything.
**Not a humanizer. An auditor.**
## The Problem
You draft a LinkedIn post (maybe with AI help), publish it, and reach tanks. Why?
LinkedIn's 360Brew uses an LLM to evaluate:
- **Lexical diversity** — AI repeats vocabulary patterns
- **Sentence rhythm** — AI maintains unnaturally consistent sentence lengths
- **Transition words** — AI overuses "Furthermore", "Moreover", "Additionally"
- **Hedging language** — AI says "arguably" and "it seems" instead of stating opinions
- **Formality** — AI avoids contractions ("do not" instead of "don't")
- **Impersonality** — AI rarely uses first-person pronouns
- **No questions** — AI makes statements, doesn't ask
- **Vagueness** — AI uses abstract language with no specific data
This tool measures all 8 dimensions, scores each 0-10, and reports your **AI Signature %**: a heuristic 0-100 score estimating how likely a platform algorithm is to flag your content as AI-generated.
## Quick Start
```bash
# Audit a LinkedIn post draft
echo "Your post text here" | python3 ~/.claude/skills/phy-content-humanizer-audit/scripts/content_humanizer_audit.py --platform linkedin
# Audit from file
python3 ~/.claude/skills/phy-content-humanizer-audit/scripts/content_humanizer_audit.py --file draft.txt --platform reddit
# Inline text
python3 ~/.claude/skills/phy-content-humanizer-audit/scripts/content_humanizer_audit.py --text "My post..." --platform twitter
# JSON output (for pipelines)
python3 ~/.claude/skills/phy-content-humanizer-audit/scripts/content_humanizer_audit.py --file draft.txt --format json
```
## The 8 Dimensions
| # | Dimension | What It Measures | Human Signal | AI Signal |
|---|-----------|-----------------|-------------|-----------|
| 1 | **Lexical Diversity (TTR)** | Vocabulary variety (type-token ratio) | TTR 0.55-0.80 | TTR 0.35-0.55 |
| 2 | **Sentence Length Variance** | Mix of short/long sentences (coefficient of variation) | CV > 0.4 | CV < 0.3 |
| 3 | **Transition Word Density** | "Furthermore", "Moreover" per 100 words | < 1.5/100w | > 3.0/100w |
| 4 | **Hedging Ratio** | "arguably", "it seems" per 100 words | < 1.0/100w | > 2.0/100w |
| 5 | **Contraction Usage** | "don't", "I've", "it's" per 100 words | > 1.5/100w | < 0.5/100w |
| 6 | **Personal Pronoun Density** | "I", "my", "we" per 100 words | > 3.0/100w | < 1.5/100w |
| 7 | **Question Frequency** | % of sentences that are questions | 10-25% | 0-5% |
| 8 | **Specific Data Density** | Numbers, dates, names per 100 words | > 2.0/100w | < 1.0/100w |
Each dimension scores 0-10 (10 = very human). Total /80, mapped to an **AI Signature %**.
## Platform Thresholds
| Platform | WARN above | FAIL above | Why |
|----------|-----------|-----------|-----|
| **LinkedIn** | 45% | 65% | 360Brew LLM actively detects AI. Strictest. |
| **HackerNews** | 50% | 70% | Technical audience spots AI quickly. |
| **Reddit** | 55% | 75% | Community policing + mod tools. Moderate. |
| **Twitter/X** | 60% | 80% | Short form = less surface for detection. |
## AI-Flagged Word List
The tool flags 37 words that are strong AI signals on social media:
> leverage, robust, crucial, delve, tapestry, holistic, synergy, paradigm, ecosystem, landscape, streamline, cutting-edge, game-changer, innovative, revolutionary, transformative, comprehensive, meticulous, nuanced, multifaceted, pivotal, seamless, foster, utilize, facilitate, endeavor, underscore, realm, navigate, embark, spearhead, harness, unveil, bolster, cornerstone, unparalleled, groundbreaking
Each flagged word found adds 3 percentage points to your AI Signature %.
## Example Output
### Human-written post (PASS)
```
==================================================================
phy-content-humanizer-audit — AI Signature Report
==================================================================
Platform : Linkedin
Words : 183
AI Sig : 10.5% ✅ PASS
Human : 74.0/80.0
Threshold: WARN >45%, FAIL >65%
==================================================================
📊 Dimension Scores (0-10, higher = more human)
Lexical Diversity (TTR) ██████████ 10.0/10
Sentence Length Variance ██████████ 10.0/10
Transition Word Density ██████████ 10.0/10
Hedging Ratio ██████████ 10.0/10
Contraction Usage ██████████ 10.0/10
Personal Pronoun Density █████░░░░░ 5.5/10
Question Frequency ████████░░ 8.5/10
Specific Data Density ██████████ 10.0/10
```
### AI-generated post (FAIL)
```
==================================================================
Platform : Linkedin
AI Sig : 100% 🔴 FAIL
Human : 22.0/80.0
==================================================================
Transition Word Density ██░░░░░░░░ 2.6/10 (3.4/100w)
Hedging Ratio █░░░░░░░░░ 1.1/10 (3.4/100w)
Contraction Usage ░░░░░░░░░░ 0.0/10 (0.0/100w)
Question Frequency █░░░░░░░░░ 1.0/10 (0%)
Specific Data Density ░░░░░░░░░░ 0.0/10 (0.0/100w)
🚫 14 AI-flagged words: comprehensive, crucial, cutting-edge,
ecosystem, facilitate, harness, holistic, innovative, landscape,
navigate, paradigm, revolutionary, robust, transformative
```
## How to Use the Fixes
The tool outputs your **top 3 fixes** ranked by impact:
```
💡 Top 3 Fixes to Lower AI Signature:
1. Add contractions: change 'do not' → 'don't', 'I have' → 'I've'
2. Add specific data: include a number, date, or tool name
3. Remove AI words: comprehensive, crucial — replace with plain terms
```
Fix just those 3 things and re-run. This usually lowers the AI signature by 20-30 percentage points.
## CI / Pre-publish Gate
```bash
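# Fail the job only on FAIL (AI signature > 65% on LinkedIn); WARN (exit 1) is let through
status=0
echo "$POST_TEXT" | python3 content_humanizer_audit.py --platform linkedin || status=$?
# Exit code: 0=PASS, 1=WARN, 2=FAIL
[ "$status" -lt 2 ] || exit "$status"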
```
## Research Basis
| Source | Key Finding | How We Use It |
|--------|------------|---------------|
| DivEye (arXiv:2509.18880) | Human text has richer variability in lexical/structural unpredictability | TTR + sentence variance scoring |
| LinkedIn 360Brew (2026) | LLM-based feed ranking detects AI via lexical patterns, profile alignment | Platform-specific thresholds |
| Stylometric detection studies | AI shows lower sentence length variance, higher transition density | 8-dimension framework |
| LinkedIn algorithm data | 30% reach drop, 55% engagement drop for AI content | WARN/FAIL calibration |
| Consumer research | 52% reduce engagement with suspected AI content | Motivation for the tool |
## Technical Notes
- **Zero external dependencies** — pure Python 3.7+ stdlib
- **Sentence splitting** — regex-based, handles abbreviations
- **Windowed TTR** — sliding window of 100 tokens to normalize for text length
- **Exit codes** — 0 (PASS), 1 (WARN), 2 (FAIL) for CI integration
- **JSON output** — `--format json` for pipeline integration
## Companion Skills
| Skill | Relationship |
|-------|-------------|
| `phy-brand-voice-guard` | Brand-specific content rules (this tool = platform-universal AI detection) |
| `phy-post-forensics` | Analyzes why posts worked/failed (this tool = pre-publish prevention) |
| `phy-platform-rules-engine` | Platform-specific invisible rules (this tool = AI signature specifically) |
---
## Author
**[Canlah AI](https://canlah.ai)** — Run performance marketing without breaking your brand.
- GitHub: [github.com/PHY041](https://github.com/PHY041)
- All Skills: [clawhub.ai/PHY041](https://clawhub.ai/PHY041)
Audit social media posts for the linguistic signatures that platform algorithms (LinkedIn 360Brew, Reddit moderation, HackerNews detection, Twitter/X analysis) use to flag AI-generated content. This skill does not rewrite or humanize text. It measures 8 dimensions (lexical diversity, sentence length variance, transition word density, hedging ratio, contraction usage, personal pronoun density, question frequency, specific data density), scores each 0-10, and outputs an AI Signature percentage that estimates whether your post will trigger algorithmic suppression. Use it before publishing to identify exactly which signals are firing, then fix only what matters instead of rewriting everything.
## Inputs
- **Content**: `--text` flag, or `--file` path
- **Platform selection**: `--platform` (linkedin, reddit, twitter, hackernews)
- **Output format**: `--format` (text or json)
- **Python environment**: `~/.claude/skills/phy-content-humanizer-audit/scripts/content_humanizer_audit.py`
- **Optional configuration file**: `~/.claude/skills/phy-content-humanizer-audit/config.yaml` (if it exists)
## Procedure
**Parse input source.** Read text from stdin (piped), the `--text` inline argument, or the `--file` path. Normalize whitespace and encoding (UTF-8). Validate that the text is at least 10 words; otherwise halt with exit code 3 and the error message "input text must be >= 10 words".
**Tokenize and clean.** Split the text into sentences using a regex that handles common abbreviations (Dr., Mr., etc.). Split sentences into words on whitespace and punctuation. Convert to lowercase for analysis. Track the original word count.
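A minimal sketch of this step, assuming a small hypothetical abbreviation list. The script's actual regex and list are not shown in this document, and `split_sentences` and `tokenize` are illustrative names.

```python
import re

# Hypothetical abbreviation list; the real script's list is not documented here.
ABBREVIATIONS = ("dr", "mr", "mrs", "ms", "prof", "etc", "vs")

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace, skipping known abbreviations."""
    # Protect abbreviation periods with a placeholder so they do not end a sentence.
    pattern = r"\b(" + "|".join(ABBREVIATIONS) + r")\."
    protected = re.sub(pattern, r"\1<DOT>", text, flags=re.IGNORECASE)
    parts = re.split(r"(?<=[.!?])\s+", protected.strip())
    return [p.replace("<DOT>", ".") for p in parts if p]

def tokenize(sentence):
    """Lowercase word tokens; straight apostrophes are kept so contractions stay whole."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())
```

One known limitation of this approach: an abbreviation that really does end a sentence (for example a trailing "etc.") will not produce a split.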
**Calculate lexical diversity (dimension 1).** Compute the type-token ratio (TTR) using a sliding window of 100 tokens: TTR = unique words / total words in each window, averaged across all windows. Map to a 0-10 score: a TTR of 0.80 or higher = 10, a TTR of 0.35-0.55 = 0, with linear interpolation in between. Output: individual window TTRs, final TTR score, human vs. AI signal threshold.
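The windowed TTR can be sketched as follows. The window step of 1 token and the single-window fallback for texts shorter than the window are assumptions, since the document does not specify either; `windowed_ttr` and `ttr_score` are illustrative names.

```python
def windowed_ttr(tokens, window=100):
    """Average type-token ratio over sliding windows of `window` tokens."""
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Assumption: short texts are treated as one window.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

def ttr_score(ttr, low=0.55, high=0.80):
    """Linear map: ttr <= low scores 0, ttr >= high scores 10."""
    return round(max(0.0, min(10.0, (ttr - low) / (high - low) * 10)), 1)
```

Windowing keeps a long post from being penalized simply because common words inevitably repeat as the text grows.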
**Calculate sentence length variance (dimension 2).** Record the word count of every sentence. Compute the mean and standard deviation, then the coefficient of variation (CV) = std / mean. Map to a 0-10 score: CV above 0.4 = 10 (human), CV below 0.3 = 0 (AI), linear in between. Output: CV value, score, narrative explanation.
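A sketch of the CV calculation, assuming the population standard deviation (the document does not say whether population or sample is used) and the neutral 5.0 fallback described under the decision points for undefined values. Function names are illustrative.

```python
import statistics

def sentence_length_cv(sentence_lengths):
    """Coefficient of variation (std / mean) of per-sentence word counts."""
    if len(sentence_lengths) < 2:
        return None  # undefined for fewer than two sentences
    mean = statistics.mean(sentence_lengths)
    if mean == 0:
        return None
    # Assumption: population standard deviation.
    return statistics.pstdev(sentence_lengths) / mean

def cv_score(cv, low=0.3, high=0.4):
    """cv <= low scores 0, cv >= high scores 10; undefined input scores a neutral 5.0."""
    if cv is None:
        return 5.0
    return round(max(0.0, min(10.0, (cv - low) / (high - low) * 10)), 1)
```

For example, sentence lengths of 5 and 15 words give a mean of 10, a standard deviation of 5, and a CV of 0.5, which scores 10.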
**Scan for transition words (dimension 3).** Check every word and phrase against a hardcoded list of 24 transitions (furthermore, moreover, additionally, consequently, subsequently, nevertheless, however, in addition, on the other hand, for instance, in particular, notably, specifically, generally, ultimately, in conclusion, as a result, meanwhile, similarly, conversely, undoubtedly, obviously, clearly, essentially). Count occurrences, then compute the per-100-word density as (occurrences / word count) * 100. Map score: below 1.5/100w = 10 (human), above 3.0/100w = 0 (AI), linear in between. Output: raw density, flagged words with positions, score.
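The density and score mapping for this step might look like the sketch below. It follows the thresholds stated in the step; the shipped script's scoring curve may differ, and `density_per_100` and `lower_is_human_score` are illustrative names. The same two helpers would also fit the hedging dimension with a different phrase list and thresholds.

```python
import re

TRANSITIONS = [
    "furthermore", "moreover", "additionally", "consequently", "subsequently",
    "nevertheless", "however", "in addition", "on the other hand", "for instance",
    "in particular", "notably", "specifically", "generally", "ultimately",
    "in conclusion", "as a result", "meanwhile", "similarly", "conversely",
    "undoubtedly", "obviously", "clearly", "essentially",
]

def density_per_100(text, phrases):
    """Occurrences of any phrase per 100 words, matched on word boundaries."""
    lowered = text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    if not words:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases
    )
    return hits / len(words) * 100

def lower_is_human_score(density, human_max, ai_min):
    """10 at or below human_max, 0 at or above ai_min, linear in between."""
    if density <= human_max:
        return 10.0
    if density >= ai_min:
        return 0.0
    return round((ai_min - density) / (ai_min - human_max) * 10, 1)
```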
**Scan for hedging language (dimension 4).** Check for 18 hedging words and phrases, including arguably, it seems, it appears, somewhat, perhaps, possibly, likely, seemingly, one might say, relatively, kind of, sort of, and in a sense ("arguably" is the strongest signal). Count per 100 words. Map score: below 1.0/100w = 10 (human), above 2.0/100w = 0 (AI), linear in between. Output: raw density, flagged words with positions, score.
**Count contractions (dimension 5).** Regex search for contractions (don't, can't, won't, I've, I'm, you're, it's, that's, there's, what's, who's, we've, they've, isn't, aren't, wasn't, weren't, haven't, hasn't, hadn't, etc.; minimum 24 patterns). Count occurrences per 100 words. Map score: above 1.5/100w = 10 (human), below 0.5/100w = 0 (AI), linear in between. Output: raw density, all contractions found with positions, score.
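One way to sketch the contraction search. The pattern below is an assumption, not the script's actual regex: it matches `n't`, `'ve`, `'re`, `'ll`, `'d`, `'m`, and restricts `'s` to a fixed set of stems so possessives such as "LinkedIn's" are not counted.

```python
import re

# Hypothetical pattern; the script's own list of 24+ patterns is not shown here.
CONTRACTION_RE = re.compile(
    r"\b(?:[a-z]+n't|[a-z]+'(?:ve|re|ll|d|m)"
    r"|(?:it|that|there|what|who|he|she|here|let)'s)\b",
    re.IGNORECASE,
)

def find_contractions(text):
    """Return contractions in order of appearance, lowercased."""
    normalized = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    return [m.group(0).lower() for m in CONTRACTION_RE.finditer(normalized)]

def contraction_score(text):
    """Below 0.5/100w scores 0, above 1.5/100w scores 10, linear in between."""
    words = text.split()
    if not words:
        return 5.0  # neutral fallback for empty input
    density = len(find_contractions(text)) / len(words) * 100
    return round(max(0.0, min(10.0, (density - 0.5) / (1.5 - 0.5) * 10)), 1)
```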
**Count personal pronouns (dimension 6).** Regex search for first-person (I, me, my, mine, we, us, our, ours) and second-person (you, your, yours, yourself, yourselves) pronouns. Count per 100 words. Map score: above 3.0/100w = 10 (human), below 1.5/100w = 0 (AI), linear in between. Output: raw density, breakdown by category (I/me/my vs. we/us/our vs. you), score.
**Calculate question frequency (dimension 7).** Count the sentences ending with "?", divide by the total sentence count, and express the result as a percentage. Map score: 10-25% = 10 (human), 0-5% = 0 (AI), linear in between. Output: raw percentage, total questions, total sentences, score.
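A sketch of the question-frequency calculation. The document does not say how percentages above 25% are scored, so this version assumes they are not penalized; function names are illustrative.

```python
def question_frequency(sentences):
    """Percentage of sentences that end with a question mark."""
    if not sentences:
        return None  # undefined when there are no sentences
    questions = sum(1 for s in sentences if s.rstrip().endswith("?"))
    return questions / len(sentences) * 100

def question_score(pct, ai_max=5.0, human_min=10.0):
    """pct <= ai_max scores 0, pct >= human_min scores 10, linear in between."""
    if pct is None:
        return 5.0  # neutral fallback, as for other undefined dimensions
    # Assumption: values above 25% still score 10.
    return round(max(0.0, min(10.0, (pct - ai_max) / (human_min - ai_max) * 10)), 1)
```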
**Count specific data (dimension 8).** Use regex to match numbers (integers, floats, percentages), dates (yyyy-mm-dd, mm/dd/yyyy, month day year), and proper nouns (a capitalized word that is not at the start of a sentence, unless it is the first word of the text and the text has 5 or more sentences). Count per 100 words. Map score: above 2.0/100w = 10 (human), below 1.0/100w = 0 (AI), linear in between. Output: raw density, breakdown (numbers/dates/names), flagged instances with positions, score.
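A simplified sketch of the specific-data counter. The patterns are assumptions rather than the script's own, dates are removed before numbers are counted so a date is not counted twice, and the first-word exception for texts of 5 or more sentences is left out for brevity.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)
DATE_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"
    r"|\b\d{1,2}/\d{1,2}/\d{4}\b"
    r"|\b(?:" + MONTHS + r")\s+\d{1,2},?\s+\d{4}\b"
)
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?%?")
NAME_RE = re.compile(r"[A-Z][a-z]+")

def count_specifics(sentences):
    """Count dates, numbers, and mid-sentence capitalized words per category."""
    counts = {"dates": 0, "numbers": 0, "names": 0}
    for sentence in sentences:
        counts["dates"] += len(DATE_RE.findall(sentence))
        remainder = DATE_RE.sub(" ", sentence)  # avoid re-counting date digits
        counts["numbers"] += len(NUMBER_RE.findall(remainder))
        words = remainder.split()
        # Skip the first word: sentence-initial capitals are not evidence of a name.
        counts["names"] += sum(1 for w in words[1:] if NAME_RE.match(w))
    return counts
```

Dividing the sum of the three counts by the word count and multiplying by 100 gives the per-100-word density used for the score.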
**Scan for the AI-flagged word list (meta signal, not a dimension).** Check the text against the hardcoded list of 37 AI-signature words (leverage, robust, crucial, delve, tapestry, holistic, synergy, paradigm, ecosystem, landscape, streamline, cutting-edge, game-changer, innovative, revolutionary, transformative, comprehensive, meticulous, nuanced, multifaceted, pivotal, seamless, foster, utilize, facilitate, endeavor, underscore, realm, navigate, embark, spearhead, harness, unveil, bolster, cornerstone, unparalleled, groundbreaking). Count unique matches and multiply by 3 to get the additive bonus to the AI signature. Output: flagged words with positions, total count, bonus points.
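The flagged-word scan can be sketched like this. It assumes exact whole-word matching, so inflected forms such as "leveraging" are not caught; whether the script matches inflections is not stated. `flagged_words` and `ai_word_bonus` are illustrative names.

```python
import re

AI_WORDS = [
    "leverage", "robust", "crucial", "delve", "tapestry", "holistic", "synergy",
    "paradigm", "ecosystem", "landscape", "streamline", "cutting-edge",
    "game-changer", "innovative", "revolutionary", "transformative",
    "comprehensive", "meticulous", "nuanced", "multifaceted", "pivotal",
    "seamless", "foster", "utilize", "facilitate", "endeavor", "underscore",
    "realm", "navigate", "embark", "spearhead", "harness", "unveil", "bolster",
    "cornerstone", "unparalleled", "groundbreaking",
]

def flagged_words(text):
    """Return the unique flagged words present in the text, sorted alphabetically."""
    lowered = text.lower()
    found = {
        w for w in AI_WORDS
        if re.search(r"\b" + re.escape(w) + r"\b", lowered)
    }
    return sorted(found)

def ai_word_bonus(text, per_word=3):
    """Additive bonus: 3 points per unique flagged word (before any cap)."""
    return per_word * len(flagged_words(text))
```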
**Aggregate scores and compute the AI signature percentage.** Sum all 8 dimension scores (0-80 total). Divide by 80 and multiply by 100 to get the "human score" percentage. Compute AI signature % = 100 - human score %. Add the AI-flagged word bonus (capped at 15 points). Clamp the final AI signature to 0-100. Output: all intermediate calculations, final AI signature %, human score.
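The aggregation arithmetic, sketched under the assumption that the bonus is 3 points per flagged word with the 15-point cap stated in this step. `ai_signature` is an illustrative name.

```python
def ai_signature(dimension_scores, flagged_count, per_word=3.0, bonus_cap=15.0):
    """Combine eight 0-10 scores and the flagged-word bonus into a 0-100 percentage."""
    human_total = sum(dimension_scores)      # 0-80
    human_pct = human_total / 80 * 100       # "human score" percentage
    bonus = min(flagged_count * per_word, bonus_cap)
    raw = 100 - human_pct + bonus
    return round(max(0.0, min(100.0, raw)), 1)
```

With the dimension scores from the PASS example (total 74.0/80) and one flagged word, this gives 100 - 92.5 + 3 = 10.5, matching the 10.5% shown there. With the FAIL example's 22.0/80 and 14 flagged words, the capped formula gives 87.5 rather than the 100% shown, so the shipped script may apply the bonus differently.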
**Apply platform thresholds.** Load the thresholds for the selected platform: LinkedIn (WARN > 45%, FAIL > 65%), Reddit (WARN > 55%, FAIL > 75%), Twitter (WARN > 60%, FAIL > 80%), HackerNews (WARN > 50%, FAIL > 70%). Compare the final AI signature to the thresholds. Assign status: pass (green checkmark), warn (yellow), fail (red x). Output: platform name, threshold values, status, emoji.
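A sketch of the threshold lookup, using the WARN/FAIL values and exit codes given in this document. The dictionary layout and the `classify` name are assumptions.

```python
# (warn_above, fail_above) per platform, from the Platform Thresholds table.
THRESHOLDS = {
    "linkedin": (45, 65),
    "hackernews": (50, 70),
    "reddit": (55, 75),
    "twitter": (60, 80),
}
EXIT_CODES = {"PASS": 0, "WARN": 1, "FAIL": 2}

def classify(ai_sig, platform):
    """Return PASS, WARN, or FAIL for an AI signature percentage."""
    if platform not in THRESHOLDS:
        raise ValueError(
            "unsupported platform. choose: linkedin, reddit, twitter, hackernews"
        )
    warn_above, fail_above = THRESHOLDS[platform]
    if ai_sig > fail_above:
        return "FAIL"
    if ai_sig > warn_above:
        return "WARN"
    return "PASS"
```

Because both comparisons are strict, a score exactly on a threshold (for example 45% on LinkedIn) stays in the lower band.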
**Rank the top 3 fixes by impact.** Sort candidate fixes by potential reduction in AI signature, in this priority order: (1) contractions gap (if dimension 5 < 5, add contractions), (2) specific data gap (if dimension 8 < 5, add data), (3) AI-flagged words (if count > 0, replace them), (4) hedging gap (if dimension 4 < 5, reduce hedging), (5) transition word gap (if dimension 3 < 5, cut transitions). Output: the top 3 as a numbered list with actionable suggestions.
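The priority ordering could be sketched as below. The dictionary keys (`contractions`, `specific_data`, `hedging`, `transitions`) and the wording of the last two suggestions are hypothetical; only the first three messages appear in this document's example output.

```python
def top_fixes(scores, flagged, limit=3):
    """Return up to `limit` suggestions in the fixed priority order of this step.

    `scores` maps hypothetical dimension keys to 0-10 scores;
    `flagged` is the list of AI-flagged words found in the text.
    """
    candidates = [
        (scores["contractions"] < 5,
         "Add contractions: change 'do not' -> 'don't', 'I have' -> 'I've'"),
        (scores["specific_data"] < 5,
         "Add specific data: include a number, date, or tool name"),
        (len(flagged) > 0,
         "Remove AI words: " + ", ".join(flagged[:2]) + " (replace with plain terms)"),
        (scores["hedging"] < 5,
         "Reduce hedging: state the opinion directly"),
        (scores["transitions"] < 5,
         "Cut transition words such as 'Furthermore' and 'Moreover'"),
    ]
    return [message for applies, message in candidates if applies][:limit]
```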
**Format and output.** If `--format text`, print a human-readable table with block-character bar charts (██░░) for each dimension, plus a summary with platform, words, AI signature, status, and threshold. If `--format json`, output a structured object with all scores, flagged words, thresholds, and status. Exit with code 0 (pass), 1 (warn), or 2 (fail).
## Decision Points
- If the input is empty or shorter than 10 words: halt with exit code 3 and the error message "input text must be >= 10 words". Do not proceed to analysis.
- If the platform argument is not one of linkedin, reddit, twitter, hackernews: halt with exit code 3 and the error message "unsupported platform. choose: linkedin, reddit, twitter, hackernews".
- If the output format is not text or json: default to text format and warn on stderr that the format argument was invalid.
- If the text length is between 10 and 50 words: proceed with the analysis but include the warning "text is very short; dimension scores may be less reliable" in the output.
- If any single dimension score is NaN or undefined (e.g., no sentences for the question frequency calculation): set that dimension to 5.0 (neutral) and log a warning to stderr.
- If the AI signature is at or below the platform's WARN threshold (45% on LinkedIn): output status "pass" (green).
- If the AI signature is above the WARN threshold and at or below the FAIL threshold (above 45% up to 65% on LinkedIn): output status "warn" (yellow).
- If the AI signature is above the platform's FAIL threshold (above 65% on LinkedIn): output status "fail" (red).
- If no AI-flagged words are found: the AI word bonus is 0. Output the message "no ai-signature words detected".
- If all 8 dimensions score 9-10: output the encouragement message "strong human signal across all dimensions".
- If a dimension scores below 3: highlight that dimension in red in the text output and suggest it as a top fix.
## Output
- Text format (default)
- JSON format
- File output (if the `--output` flag is provided)
## Outcome Signals
The user knows the skill worked when:
- The skill outputs a numeric AI signature percentage (0-100) with a clear pass/warn/fail status for the chosen platform. The user sees which of the 8 dimensions are dragging the score down (red bar chart items) and can identify exactly which 3 actions will improve the score most.
- A known human post (e.g., a personal anecdote with contractions, questions, and specific data) gets an AI signature below 15% and a green (pass) status. A known AI-generated post (e.g., a ChatGPT post with generic language, no contractions, and no questions) gets an AI signature above 80% and a red (fail) status.
- The user runs the skill again after fixing the top 3 items and sees the AI signature drop by 15-30 percentage points, confirming the fixes worked.
- The user integrates the skill into a CI pipeline with `--format json` and exit codes, and sees automated pre-publish gates working (the pull request fails if the AI signature is above 65% on LinkedIn and passes otherwise).
- The JSON output is valid and parseable by downstream tools (CI systems, dashboards). Timestamp and version fields allow an audit trail.
- The skill completes in under 500 ms for a typical 500-word post (no network calls, pure Python).
Credits: original skill by Canlah AI. Research basis: DivEye (arXiv:2509.18880), LinkedIn 360Brew algorithm analysis, stylometric detection studies.