---
name: scvi-tools
description: Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.
---

# scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, the leading framework for probabilistic models in single-cell genomics.

## How to Use This Skill

1. Identify the appropriate workflow from the model/workflow tables below
2. Read the corresponding reference file for detailed steps and code
3. Use scripts in `scripts/` to avoid rewriting common code
4. For installation or GPU issues, consult `references/environment_setup.md`
5. For debugging, consult `references/troubleshooting.md`

## When to Use This Skill

- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data

## Model Selection Guide

| Data Type | Model | Primary Use Case |
|-----------|-------|------------------|
| scRNA-seq | **scVI** | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | **scANVI** | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | **totalVI** | Multi-modal integration, protein denoising |
| scATAC-seq | **PeakVI** | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | **MultiVI** | Joint modality analysis |
| Spatial + scRNA reference | **DestVI** | Cell type deconvolution |
| RNA velocity | **veloVI** | Transcriptional dynamics |
| Cross-technology | **sysVI** | System-level batch correction |

## Workflow Reference Files

| Workflow | Reference File | Description |
|----------|---------------|-------------|
| Environment Setup | `references/environment_setup.md` | Installation, GPU, version info |
| Data Preparation | `references/data_preparation.md` | Formatting data for any model |
| scRNA Integration | `references/scrna_integration.md` | scVI/scANVI batch correction |
| ATAC-seq Analysis | `references/atac_peakvi.md` | PeakVI for accessibility |
| CITE-seq Analysis | `references/citeseq_totalvi.md` | totalVI for protein+RNA |
| Multiome Analysis | `references/multiome_multivi.md` | MultiVI for RNA+ATAC |
| Spatial Deconvolution | `references/spatial_deconvolution.md` | DestVI spatial analysis |
| Label Transfer | `references/label_transfer.md` | scANVI reference mapping |
| scArches Mapping | `references/scarches_mapping.md` | Query-to-reference mapping |
| Batch Correction | `references/batch_correction_sysvi.md` | Advanced batch methods |
| RNA Velocity | `references/rna_velocity_velovi.md` | veloVI dynamics |
| Troubleshooting | `references/troubleshooting.md` | Common issues and solutions |

## CLI Scripts

Modular scripts for common workflows. Chain them together or modify them as needed.

### Pipeline Scripts

| Script | Purpose | Usage |
|--------|---------|-------|
| `prepare_data.py` | QC, filter, HVG selection | `python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch` |
| `train_model.py` | Train any scvi-tools model | `python scripts/train_model.py prepared.h5ad results/ --model scvi` |
| `cluster_embed.py` | Neighbors, UMAP, Leiden | `python scripts/cluster_embed.py adata.h5ad results/` |
| `differential_expression.py` | DE analysis | `python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden` |
| `transfer_labels.py` | Label transfer with scANVI | `python scripts/transfer_labels.py ref_model/ query.h5ad results/` |
| `integrate_datasets.py` | Multi-dataset integration | `python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad` |
| `validate_adata.py` | Check data compatibility | `python scripts/validate_adata.py data.h5ad --batch-key batch` |

### Example Workflow

```bash
# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
```

### Python Utilities

`scripts/model_utils.py` provides importable functions for custom workflows:

| Function | Purpose |
|----------|---------|
| `prepare_adata()` | Data preparation (QC, HVG, layer setup) |
| `train_scvi()` | Train scVI or scANVI |
| `evaluate_integration()` | Compute integration metrics |
| `get_marker_genes()` | Extract DE markers |
| `save_results()` | Save model, data, plots |
| `auto_select_model()` | Suggest best model |
| `quick_clustering()` | Neighbors + UMAP + Leiden |

## Critical Requirements

1. **Raw counts required**: scvi-tools models require integer count data
   ```python
   import scvi

   # Keep a copy of the raw counts before adata.X is normalized
   adata.layers["counts"] = adata.X.copy()
   scvi.model.SCVI.setup_anndata(adata, layer="counts")
   ```
2. **HVG selection**: Use 2000-4000 highly variable genes
   ```python
   import scanpy as sc

   # flavor="seurat_v3" expects raw counts, hence layer="counts"
   sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
   adata = adata[:, adata.var["highly_variable"]].copy()
   ```
3. **Batch information**: Specify `batch_key` for integration
   ```python
   import scvi

   scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
   ```

## Quick Decision Tree

```
Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
```

## Key Resources

- [scvi-tools Documentation](https://docs.scvi-tools.org/)
- [scvi-tools Tutorials](https://docs.scvi-tools.org/en/stable/tutorials/index.html)
- [Model Hub](https://huggingface.co/scvi-tools)
- [GitHub Issues](https://github.com/scverse/scvi-tools/issues)
Expanded version: extends the original skill with an explicit intent statement, detailed inputs (environment and data requirements), a numbered procedure with edge cases and inputs/outputs for each step, six decision points (workflow selection, label transfer, hardware/environment, data preparation), a formal output contract with file locations and formats, and outcome-signal criteria for validating success.
Use this skill when working with single-cell RNA-seq, ATAC-seq, CITE-seq, multiome, or spatial transcriptomics data and you need probabilistic deep learning models for batch correction, data integration, multi-modal analysis, label transfer, or RNA velocity. scvi-tools is the framework to reach for when variational autoencoders (VAEs) or related methods are mentioned, when integrating data across multiple technologies or batches, when mapping query samples to reference models, or when extracting latent representations for downstream analysis.
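Python environment:

- scvi-tools installed (`pip install scvi-tools` or `conda install -c conda-forge scvi-tools`)

Data requirements:

- AnnData object with raw integer counts (stored in `adata.layers["counts"]`)
- Batch annotation column in `adata.obs` (e.g., `batch_key="batch"`)

External resources (optional):

Model selection context: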
**Step 1: Validate and prepare input data**
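The following is a minimal sketch of the raw-count check this step calls for. It is written in plain Python so it runs without scvi-tools installed. The helper names `looks_like_raw_counts` and `missing_obs_keys` are hypothetical and are not scvi-tools APIs.

The sketch assumes the matrix values arrive as a flat iterable:

- For a SciPy sparse matrix, pass the stored nonzero values in `adata.layers["counts"].data`.
- For a dense array, pass `np.asarray(adata.X).ravel()`.

Only a prefix of the values is inspected. This is a speed trade-off, not a guarantee.

```python
import math
from typing import Iterable, List


def looks_like_raw_counts(values: Iterable[float], max_check: int = 100_000) -> bool:
    """Return True if the first `max_check` values are non-negative whole numbers.

    Hypothetical helper: scvi-tools models expect raw integer counts, so
    normalized or log-transformed data (fractional values) fails this check.
    """
    for i, v in enumerate(values):
        if i >= max_check:
            break  # only a prefix is checked, for speed
        x = float(v)
        # NaN, infinite, negative, or fractional values cannot be raw counts
        if not math.isfinite(x) or x < 0 or not x.is_integer():
            return False
    return True


def missing_obs_keys(obs_columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required obs columns (e.g. the batch key) that are absent."""
    present = set(obs_columns)
    return [key for key in required if key not in present]


if __name__ == "__main__":
    print(looks_like_raw_counts([0, 3, 12, 1]))      # True
    print(looks_like_raw_counts([0.0, 1.37, 2.2]))   # False: looks normalized
    print(missing_obs_keys(["sample", "n_genes"], ["batch"]))  # ['batch']
```

If either check fails, stop and request raw counts or the missing annotation. Do not train on the data as-is.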
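**Step 2: Select highly variable genes (HVGs)**

- Run `sc.pp.highly_variable_genes()` with `flavor="seurat_v3"` and `n_top_genes=2000` (adjust within 2000-4000 based on dataset size and sparsity). The `seurat_v3` flavor expects raw counts, so pass `layer="counts"` if `adata.X` has already been normalized.
- Output: `adata.var["highly_variable"]` boolean column; AnnData filtered to the HVG subset

**Step 3: Prepare raw count layer**

- Ensure `adata.layers["counts"]` contains the original integer counts (not normalized, not log-transformed)
- Normalize and log-transform `adata.X` only if needed for other downstream tools
- Output: `adata.layers["counts"]` verified as raw counts, with shape `(n_obs, n_vars_hvg)`
- Edge case: if no counts layer exists, check whether `adata.X` is raw; if it is not, fail and request raw data

**Step 4: Select the appropriate scvi-tools model based on data type and task**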
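The Quick Decision Tree can be encoded as a small lookup, which makes the choice explicit and testable. The following is a sketch only. The function `suggest_model` and its modality strings (`"rna"`, `"protein"`, `"atac"`, `"spatial"`) are illustrative assumptions. So is the rule order: velocity first, then spatial, then reference mapping, then modality. None of this is an scvi-tools API, and it is not the behavior of `auto_select_model()`.

```python
from typing import Iterable, Tuple

# Model name -> reference file, as listed in the Quick Decision Tree
REFERENCE_FILES = {
    "scANVI": "references/label_transfer.md",
    "scVI": "references/scrna_integration.md",
    "totalVI": "references/citeseq_totalvi.md",
    "MultiVI": "references/multiome_multivi.md",
    "PeakVI": "references/atac_peakvi.md",
    "DestVI": "references/spatial_deconvolution.md",
    "scArches": "references/scarches_mapping.md",
    "veloVI": "references/rna_velocity_velovi.md",
    "sysVI": "references/batch_correction_sysvi.md",
}


def suggest_model(
    modalities: Iterable[str],
    task: str = "integration",
    has_labels: bool = False,
    has_reference_model: bool = False,
    strong_cross_tech_batch: bool = False,
) -> Tuple[str, str]:
    """Return (model_name, reference_file) for the given data and task.

    Hypothetical helper; rules are checked in a fixed, assumed order.
    """
    mods = {m.lower() for m in modalities}
    if task == "velocity":
        model = "veloVI"
    elif task == "deconvolution" or "spatial" in mods:
        model = "DestVI"
    elif has_reference_model:
        model = "scArches"
    elif mods == {"rna", "protein"}:
        model = "totalVI"
    elif mods == {"rna", "atac"}:
        model = "MultiVI"
    elif mods == {"atac"}:
        model = "PeakVI"
    elif mods == {"rna"}:
        if strong_cross_tech_batch:
            model = "sysVI"
        elif has_labels:
            model = "scANVI"
        else:
            model = "scVI"
    else:
        raise ValueError(f"No rule for modalities {sorted(mods)}")
    return model, REFERENCE_FILES[model]


if __name__ == "__main__":
    print(suggest_model(["rna"], has_labels=True))  # ('scANVI', 'references/label_transfer.md')
    print(suggest_model(["rna", "protein"]))        # ('totalVI', 'references/citeseq_totalvi.md')
```

The rule order matters. For example, a CITE-seq query with a pre-trained reference resolves to scArches here before modality is considered. Reorder the branches if your pipeline prioritizes differently.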
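**Step 5: Set up AnnData for the selected model**

- Call the model's `setup_anndata` method (e.g., `scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")`)
- Output: AnnData registered with the model, with setup metadata stored in `adata.uns` (the exact keys depend on the scvi-tools version)

**Step 6: Initialize and train the model**

- Initialize: `model = scvi.model.SCVI(adata, ...)`
- Train: `model.train(max_epochs=N, early_stopping=True, early_stopping_patience=20)`

**Step 7: Extract and visualize latent representations**

- Store the latent space: `adata.obsm["X_scvi"] = model.get_latent_representation()` (this is the key that `use_rep` reads below)
- Embed and cluster: `sc.pp.neighbors(adata, use_rep="X_scvi")`, `sc.tl.umap(adata)`, `sc.tl.leiden(adata, resolution=0.8)`
- Output: latent representation in `adata.obsm["X_scvi"]`, UMAP in `adata.obsm["X_umap"]`, cluster labels in `adata.obs["leiden"]`

**Step 8: Perform task-specific downstream analysis (differential expression, label transfer, deconvolution, etc.)**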
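For label transfer, a common pattern is to take the most probable label per cell and flag low-confidence calls. scANVI's `model.predict(soft=True)` returns per-cell class probabilities. The sketch below works on plain dicts so it runs without scvi-tools. The function name `assign_labels`, the 0.5 cutoff, and the `"Unknown"` placeholder are assumptions to tune per dataset.

```python
from typing import Dict, List, Sequence, Tuple


def assign_labels(
    probabilities: Sequence[Dict[str, float]],
    min_confidence: float = 0.5,
    unknown: str = "Unknown",
) -> Tuple[List[str], List[float]]:
    """Return (labels, confidences) from per-cell class probabilities.

    Each element of `probabilities` maps label -> probability for one cell.
    Cells whose top probability is below `min_confidence` get `unknown`.
    """
    labels: List[str] = []
    confidences: List[float] = []
    for cell_probs in probabilities:
        # Most probable label and its probability for this cell
        best_label, best_prob = max(cell_probs.items(), key=lambda kv: kv[1])
        confidences.append(best_prob)
        labels.append(best_label if best_prob >= min_confidence else unknown)
    return labels, confidences


if __name__ == "__main__":
    probs = [
        {"B cell": 0.92, "T cell": 0.05, "NK": 0.03},
        {"B cell": 0.40, "T cell": 0.35, "NK": 0.25},
    ]
    labels, conf = assign_labels(probs)
    print(labels)  # ['B cell', 'Unknown']
    print(conf)    # [0.92, 0.4]
```

Suppose the soft predictions are held in a pandas DataFrame with one row per cell and one column per label. Then `probs.idxmax(axis=1)` and `probs.max(axis=1)` give the same label and confidence columns. These could be stored as `adata.obs["transferred_labels"]` and `adata.obs["transfer_confidence"]` to match the output contract.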
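**Step 9: Validate integration quality and save results**

- Save: trained model (`model.save(dirpath)`), AnnData with results (`adata.write_h5ad(path)`), and plots (UMAP, DE volcano, etc.)

Workflow selection (data type + task):

- scRNA-seq with cell type labels → scANVI (`references/label_transfer.md`)
- scRNA-seq without labels → scVI (`references/scrna_integration.md`)
- CITE-seq (RNA + protein) → totalVI (`references/citeseq_totalvi.md`)
- Multiome (RNA + ATAC) → MultiVI (`references/multiome_multivi.md`)
- scATAC-seq only → PeakVI (`references/atac_peakvi.md`)
- Spatial data needing cell type deconvolution → DestVI (`references/spatial_deconvolution.md`)
- RNA velocity → veloVI (`references/rna_velocity_velovi.md`)
- Strong cross-technology batch effects → sysVI (`references/batch_correction_sysvi.md`)

Label transfer decision:

- Pre-trained reference model available → map the query onto it with scArches (`references/scarches_mapping.md`)

Hardware / environment decision:

- GPU issues: consult `references/environment_setup.md` for CUDA/cuDNN versions, reduce `batch_size`, or switch to CPU training
- Dependency or version conflicts: consult `references/troubleshooting.md` for version pinning and virtual environment setup

Data preparation edge case: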
Model checkpoint:

- Files: `model.pt`, `var_names.csv`, `setup_dict.json`
- Location: model output directory (e.g., `results/<model_name>_checkpoint/`)

Processed AnnData:

- File: `adata_processed.h5ad`
- Contents:
  - raw counts in `adata.layers["counts"]`
  - HVG-filtered genes
  - latent representation in `adata.obsm["X_scvi"]` (or model-specific key)
  - UMAP/clustering in `adata.obsm["X_umap"]` and `adata.obs["leiden"]`

Integration metrics (if multi-batch):

- File: `integration_metrics.csv`

Task-specific outputs:

- Label transfer: predicted labels in `adata.obs["transferred_labels"]`, confidence scores in `adata.obs["transfer_confidence"]`
- RNA velocity: per-gene velocities in `adata.layers["velocity"]`

Logs and diagnostics:

- `training.log` (loss curves, convergence info)
- `validation_report.txt` (data QC summary, model hyperparameters, warnings)
- AnnData outputs load with `sc.read_h5ad()`
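The output contract can be checked with a file-presence test after a run. The following is a minimal sketch. It assumes every output sits under one results directory, with the checkpoint in `<model_name>_checkpoint/`. The function name and layout are hypothetical.

The checkpoint file list is a parameter that defaults to `model.pt` only. This is because the files written by `model.save()` have changed across scvi-tools versions. Pass the full list from the contract if your pipeline writes it.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(
    results_dir: str,
    model_name: str,
    multi_batch: bool = True,
    checkpoint_files: Iterable[str] = ("model.pt",),
) -> List[str]:
    """List expected output paths (relative to results_dir) that do not exist."""
    root = Path(results_dir)
    expected = [f"{model_name}_checkpoint/{name}" for name in checkpoint_files]
    expected += ["adata_processed.h5ad", "training.log", "validation_report.txt"]
    if multi_batch:
        # Integration metrics are only expected when there is more than one batch
        expected.append("integration_metrics.csv")
    return [rel for rel in expected if not (root / rel).exists()]
```

For example, `missing_outputs("results", "scvi", multi_batch=False)` returns an empty list when every expected file is present. Any entries it returns point to the step that did not complete.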