Configure Ollama as the embedding provider for GrepAI. Use this skill for local, private embedding generation.
GrepAI Embeddings with Ollama
This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.
When to Use This Skill
Setting up private, local embeddings
Choosing the right Ollama model
Optimizing Ollama performance
Troubleshooting Ollama connection issues
Why Ollama?
| Advantage | Description |
|-----------|-------------|
| Privacy | Code never leaves your machine |
| Free | No API costs or usage limits |
| Speed | No network latency |
| Offline | Works without internet |
| Control | Choose your model |
Prerequisites
Ollama installed and running
An embedding model downloaded
# Install Ollama
brew install ollama # macOS
# or
curl -fsSL https://ollama.com/install.sh | sh # Linux
# Start Ollama (runs in the foreground; leave it running)
ollama serve
# In another terminal, download an embedding model
ollama pull nomic-embed-text
Configuration
Basic Configuration
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
With Custom Endpoint
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434  # Remote Ollama server
With Explicit Dimensions
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768  # Usually auto-detected
Available Models
Recommended: nomic-embed-text
ollama pull nomic-embed-text
| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |
Configuration:
embedder:
  provider: ollama
  model: nomic-embed-text
Multilingual: nomic-embed-text-v2-moe
ollama pull nomic-embed-text-v2-moe
| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |
Best for codebases with non-English comments/documentation.
Configuration:
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
High Quality: bge-m3
ollama pull bge-m3
| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |
Best for large, complex codebases where accuracy is critical.
Configuration:
embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024
Maximum Quality: mxbai-embed-large
ollama pull mxbai-embed-large
| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |
Configuration:
embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024
Model Comparison
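| Model | Dimensions | Size | Speed | Quality | Use Case |
|-------|------------|------|-------|---------|----------|
| nomic-embed-text | 768 | ~274 MB | ⚡⚡⚡ | ⭐⭐⭐ | General use |
| nomic-embed-text-v2-moe | 768 | ~500 MB | ⚡⚡ | ⭐⭐⭐⭐ | Multilingual |
| bge-m3 | 1024 | ~1.2 GB | ⚡ | ⭐⭐⭐⭐⭐ | Large codebases |
| mxbai-embed-large | 1024 | ~670 MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy |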
Performance Optimization
Memory Management
Models load into RAM. Ensure sufficient memory:
| Model | RAM Required |
|-------|--------------|
| nomic-embed-text | ~500 MB |
| nomic-embed-text-v2-moe | ~800 MB |
| bge-m3 | ~1.5 GB |
| mxbai-embed-large | ~1 GB |
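The comparison and memory tables above can be folded into a small selection helper. The following is a minimal sketch, not part of GrepAI or Ollama: the `EmbeddingModel` type and the `pick_model` function are hypothetical names, and the RAM and quality figures are simply copied from the tables in this document, so treat them as estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingModel:
    name: str
    dimensions: int
    ram_mb: int         # approximate RAM figure from the memory table
    quality: int        # rating (3 to 5) read off the comparison table
    multilingual: bool


# Values copied from this document's tables; they are estimates, not measurements.
MODELS = [
    EmbeddingModel("nomic-embed-text", 768, 500, 3, False),
    EmbeddingModel("nomic-embed-text-v2-moe", 768, 800, 4, True),
    EmbeddingModel("bge-m3", 1024, 1500, 5, True),
    EmbeddingModel("mxbai-embed-large", 1024, 1000, 5, False),
]


def pick_model(available_ram_mb: int,
               need_multilingual: bool = False) -> Optional[EmbeddingModel]:
    """Return the highest-rated model that fits in RAM, or None if none fits.

    Ties on quality are broken in favour of the smaller memory footprint.
    """
    candidates = [
        m for m in MODELS
        if m.ram_mb <= available_ram_mb
        and (m.multilingual or not need_multilingual)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (m.quality, -m.ram_mb))
```

For example, `pick_model(900, need_multilingual=True)` selects nomic-embed-text-v2-moe, because bge-m3 needs roughly 1.5 GB according to the table.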
GPU Acceleration
Ollama automatically uses:
macOS: Metal (Apple Silicon)
Linux/Windows: CUDA (NVIDIA GPUs)
Check GPU usage (the PROCESSOR column shows whether a loaded model is running on GPU or CPU):
ollama ps
Keeping Model Loaded
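By default, Ollama unloads a model after 5 minutes of inactivity. To keep it loaded, send a request with keep_alive set to -1:
# Keep the model loaded indefinitely.
# An embedding request is used because embedding-only models may reject /api/generate.
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "warmup",
  "keep_alive": -1
}'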
Verifying Connection
Check Ollama is Running
curl http://localhost:11434/api/tags
List Available Models
ollama list
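The same check can be scripted against the `/api/tags` response. This is a rough sketch that assumes the response body has the shape `{"models": [{"name": "nomic-embed-text:latest", ...}]}`; the helper names `installed_models` and `model_is_available` are made up for illustration.

```python
import json


def installed_models(tags_body: str) -> set:
    """Extract model names from an /api/tags JSON response body."""
    data = json.loads(tags_body)
    return {entry["name"] for entry in data.get("models", [])}


def model_is_available(tags_body: str, model: str) -> bool:
    """True if `model` is installed; a bare name is matched against its :latest tag."""
    names = installed_models(tags_body)
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in names
```

The body can be fetched with `urllib.request.urlopen("http://localhost:11434/api/tags").read().decode()` while Ollama is running. If `model_is_available` returns False for the model named in `.grepai/config.yaml`, pull it before indexing.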
Test Embedding
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function authenticate(user, password)"
}'
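For scripted checks, the curl call above could be reproduced with the Python standard library. This sketch assumes the endpoint answers with a JSON body of the form `{"embedding": [...]}`; the function names are illustrative and are not part of GrepAI.

```python
import json
import urllib.request


def build_embedding_request(prompt: str,
                            model: str = "nomic-embed-text",
                            endpoint: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl example."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_from_response(body: bytes) -> list:
    """Pull the vector out of a response body shaped like {"embedding": [...]}."""
    vector = json.loads(body)["embedding"]
    if not vector:
        raise ValueError("response contained an empty embedding")
    return vector


def embed(prompt: str, **kwargs) -> list:
    """Send the request to a running Ollama server and return the vector."""
    request = build_embedding_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return embedding_from_response(resp.read())
```

With Ollama running, `len(embed("function authenticate(user, password)"))` should match the `dimensions` value in the configuration, which this document lists as 768 for nomic-embed-text.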
Running Ollama as a Service
macOS (launchd)
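The Ollama desktop app starts automatically at login. If you installed Ollama with Homebrew instead, start it as a background service with brew services start ollama.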
Linux (systemd)
# Enable service
sudo systemctl enable ollama
# Start service
sudo systemctl start ollama
# Check status
sudo systemctl status ollama
Manual Background
nohup ollama serve > /dev/null 2>&1 &
Remote Ollama Server
Run Ollama on a more powerful server and connect to it over the network. Ollama has no built-in authentication, so expose it only on a trusted network:
On the Server
# Allow remote connections
OLLAMA_HOST=0.0.0.0 ollama serve
On the Client
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434  # Replace server-ip with the server's address
Common Issues
❌ Problem: Connection refused
✅ Solution:
# Start Ollama
ollama serve
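"Connection refused" usually means nothing is listening at the configured endpoint. Before restarting anything, it can help to confirm which host and port the `endpoint` setting actually points to and whether a TCP connection succeeds. The sketch below uses only the standard library; `endpoint_host_port` and `is_reachable` are hypothetical helper names.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint: str) -> tuple:
    """Split an endpoint URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(endpoint)
    if parts.hostname is None:
        raise ValueError(f"not a valid endpoint URL: {endpoint!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the endpoint's host and port succeeds."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If `is_reachable("http://localhost:11434")` returns False, Ollama is either not running or listening on a different address. A successful TCP connection only shows that something is listening; the `/api/tags` check confirms that it is Ollama.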
❌ Problem: Model not found
✅ Solution:
# Pull the model
ollama pull nomic-embed-text
❌ Problem: Slow embedding generation
✅ Solutions:
Use a smaller model (nomic-embed-text)
Ensure GPU is being used (ollama ps)
Close memory-intensive applications
Consider a remote server with better hardware
❌ Problem: Out of memory
✅ Solutions:
Use a smaller model
Close other applications
Upgrade RAM
Use remote Ollama server
❌ Problem: Embeddings differ after model update
✅ Solution: Re-index after model updates:
rm .grepai/index.gob
grepai watch
Best Practices
Start with nomic-embed-text: Best balance of speed/quality
Keep Ollama running: Background service recommended
Match dimensions: Don't mix models with different dimensions
Re-index on model change: Delete index and re-run watch
Monitor memory: Embedding models use significant RAM
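To see why dimensions must match, consider how two embedding vectors are compared. This document does not state which similarity metric GrepAI uses; cosine similarity is assumed here only because it is a common choice for embedding search. The function below is an illustrative sketch.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; "
            "re-index so every vector comes from the same model"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A 768-dimensional query vector cannot be compared with a 1024-dimensional index entry at all. The length check does not catch every problem, though: bge-m3 and mxbai-embed-large both produce 1024 dimensions, yet their vectors live in different spaces, which is why re-indexing after any model change is recommended.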
Output Format
Successful Ollama configuration:
✅ Ollama Embedding Provider Configured
Provider: Ollama
Model: nomic-embed-text
Endpoint: http://localhost:11434
Dimensions: 768 (auto-detected)
Status: Connected
Model Info:
- Size: 274 MB
- Loaded: Yes
- GPU: Apple Metal