Ollama
Quick reference for Ollama: run large language models locally with CLI commands, the REST API, Modelfiles, vision models, tool calling, and OpenAI-compatible endpoints.
Installation & Setup
Install Ollama on macOS, Linux, or Windows, then start the server.
Install Ollama
Install Ollama on your platform using the official installer or package manager.
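A quick sketch per platform; check https://ollama.com/download for the current installers:

```sh
# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh

# macOS: Homebrew, or download the app from ollama.com/download
brew install ollama

# Windows: run the installer from ollama.com/download

# Start the server (the desktop apps start it automatically)
ollama serve
```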
Environment Configuration
Configure Ollama behavior with environment variables.
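Commonly used variables, set in the server's environment before `ollama serve`; the values below are illustrative:

```sh
export OLLAMA_HOST=0.0.0.0:11434          # bind address (default 127.0.0.1:11434)
export OLLAMA_MODELS=/data/ollama/models  # where model blobs are stored
export OLLAMA_KEEP_ALIVE=10m              # how long a model stays loaded after a request
export OLLAMA_NUM_PARALLEL=4              # concurrent requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=2         # models kept in memory at once
export OLLAMA_ORIGINS="https://app.example.com"  # extra allowed CORS origins
ollama serve
```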
Running Models
Run and interact with models from the command line.
Run a Model
Start an interactive chat session with a model.
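For example, with `llama3.2` (any installed model name works):

```sh
# Interactive session; pulls the model on first use
ollama run llama3.2

# One-shot prompt: print the response and exit
ollama run llama3.2 "Explain the CAP theorem in one paragraph."
```

Inside a session, `/?` lists the available commands and `/bye` exits.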
Model Management
Pull, list, copy, and remove models.
Pull & List Models
Download models from the registry and view installed models.
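The core management commands, using `llama3.2` as a stand-in:

```sh
ollama pull llama3.2       # download from the registry
ollama pull llama3.2:1b    # a specific tag (size/quantization)
ollama list                # installed models
ollama ps                  # models currently loaded in memory
ollama show llama3.2       # details: parameters, template, license
```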
Copy & Remove Models
Create model aliases and delete models from local storage.
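`my-llama` here is an arbitrary example name:

```sh
ollama cp llama3.2 my-llama   # copy under a new name (alias)
ollama rm my-llama            # delete from local storage
```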
Modelfile
Create custom models with system prompts, parameters, and adapters using Modelfiles.
Create a Custom Model
Define a Modelfile with a base model, system prompt, and parameters, then build it.
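A minimal sketch; the system prompt and parameter values are placeholders:

```
# Modelfile
FROM llama3.2
SYSTEM "You are a concise assistant. Answer in plain English."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

```sh
ollama create my-assistant -f Modelfile
ollama run my-assistant
```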
Modelfile Parameters
Reference for common model parameters and their effects.
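The most common PARAMETER lines, with illustrative values (defaults vary by model):

```
PARAMETER temperature 0.8     # higher = more creative, lower = more deterministic
PARAMETER top_p 0.9           # nucleus sampling cutoff
PARAMETER top_k 40            # sample only from the top-k tokens
PARAMETER num_ctx 4096        # context window size in tokens
PARAMETER repeat_penalty 1.1  # penalty for repeating tokens
PARAMETER num_predict 256     # max tokens to generate (-1 = unlimited)
PARAMETER seed 42             # fixed seed for reproducible output
PARAMETER stop "<|user|>"     # stop sequence; may be given multiple times
```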
REST API
Interact with models programmatically using the Ollama REST API.
Generate Completions
Generate text completions using the /api/generate endpoint.
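A minimal non-streaming request (streaming NDJSON is the default when "stream" is omitted):

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```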
Chat Completions
Use the /api/chat endpoint for multi-turn conversations.
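The messages array carries the conversation history; roles are system, user, assistant (and tool):

```sh
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about the sea."}
  ],
  "stream": false
}'
```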
Embeddings
Generate vector embeddings for text using embedding models.
Generate Embeddings
Use the /api/embed endpoint to create text embeddings for RAG and similarity search.
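With an embedding model such as `nomic-embed-text`; "input" takes a string or an array of strings:

```sh
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": ["ocean currents", "tidal patterns"]
}'
```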
Vision & Multimodal
Use vision models to analyze images alongside text prompts.
Image Analysis
Send images to vision-capable models via CLI or API.
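A sketch with the `llava` vision model; the API expects base64-encoded image data in an "images" array:

```sh
# CLI: include the image path in the prompt
ollama run llava "What is in this image? ./photo.png"

# API: base64-encode the image (tr strips the line wraps GNU base64 adds)
IMG=$(base64 < photo.png | tr -d '\n')
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"$IMG\"],
  \"stream\": false
}"
```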
Structured Output
Force models to respond with structured JSON or schema-based output.
JSON Mode & Schema
Use the format parameter to get structured JSON responses.
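Setting "format": "json" guarantees syntactically valid JSON; recent versions also accept a JSON Schema object as the format value to pin the exact shape, as sketched here:

```sh
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Tell me about Canada."}],
  "format": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "capital": {"type": "string"},
      "languages": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["name", "capital", "languages"]
  },
  "stream": false
}'
```

Asking for JSON in the prompt as well, alongside the format parameter, tends to improve results.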
Tool Calling
Let models call external functions and tools to augment their capabilities.
Function Calling with Tools
Define tools that models can invoke, then process the tool calls.
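A sketch with a hypothetical get_weather tool; the model does not run anything itself, it returns message.tool_calls for you to execute:

```sh
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```

Run the requested function yourself, append its result as a {"role": "tool", ...} message, and call /api/chat again for the final answer.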
OpenAI Compatibility
Use Ollama's OpenAI-compatible API as a drop-in local replacement for the OpenAI API.
OpenAI SDK Drop-in
Point the OpenAI SDK at Ollama for local inference by changing only the base URL and API key.
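A minimal Python sketch; the API key can be any non-empty string, since Ollama ignores it but the SDK requires one:

```python
from openai import OpenAI

# Same SDK, local server: only base_url and api_key differ from OpenAI usage
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(response.choices[0].message.content)
```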
Official SDKs
Use the official Python and JavaScript libraries for a native Ollama experience.
Python SDK
Install and use the official Ollama Python library.
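Install with `pip install ollama`, then:

```python
import ollama

# Single response
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])

# Streaming: iterate over chunks as they arrive
stream = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```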
JavaScript SDK
Install and use the official Ollama JavaScript/TypeScript library.
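Install with `npm install ollama`, then:

```js
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
```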
GPU & Performance
Configure GPU acceleration and optimize model performance.
GPU Configuration
Configure GPU layers, parallel requests, and memory management.
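Server-side knobs are environment variables; per-request offload is the num_gpu option (the values here are illustrative, not tuned recommendations):

```sh
# Before starting the server
export OLLAMA_NUM_PARALLEL=4        # parallel requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=2   # models resident in memory at once
export OLLAMA_KEEP_ALIVE=30m        # keep models loaded between requests
export CUDA_VISIBLE_DEVICES=0       # pin Ollama to one NVIDIA GPU
ollama serve

# Per request: num_gpu = number of layers offloaded to the GPU
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello",
  "options": {"num_gpu": 33},
  "stream": false
}'
```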