Model Comparison
Schatzi AI provides access to 27 active state-of-the-art AI models. All models run exclusively on Swiss infrastructure, so your data never leaves Switzerland and is processed in full compliance with Swiss data protection regulations.
This guide helps you choose the right model for your specific needs, understand pricing, and optimize costs while maintaining complete data sovereignty.
Complete Model Reference
Swiss LLM (AI Act Compliant)
Apertus Swiss LLM - Large
- Context Window: 65,536 tokens
- Streaming: Supported
Capabilities:
- Data and methods documented for unprecedented transparency
- Compliant with the AI Act and respectful of privacy and intellectual property
- 70B version delivering performance on a par with current market leaders
- Ideal for multilingual services, government agencies, and R&D teams
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Apertus Swiss LLM - Large (API only)
- Context Window: 65,536 tokens
- Streaming: Supported
Capabilities:
- Data and methods documented for unprecedented transparency
- Compliant with the AI Act and respectful of privacy and intellectual property
- 70B version delivering performance on a par with current market leaders
- Ideal for multilingual services, government agencies, and R&D teams
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Apertus Swiss LLM - Small (API only)
- Context Window: 65,536 tokens
- Streaming: Supported
Capabilities:
- Optimized for multilingual dialogue use cases
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Vision & Document Analysis
Document Analysis - Medium (API only)
- Context Window: 32,768 tokens
- Vision: Supported
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Optimized as a compact and efficient vision-language model
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Document Analysis - Small (API only)
- Context Window: 32,768 tokens
- Vision: Supported
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Optimized for multilingual dialogue use cases
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Document Analysis - Small (API only)
- Context Window: 32,000 tokens
- Vision: Supported
- Streaming: Supported
Capabilities:
- Optimized for handling text and image input and generating text output
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Document Analysis - Xtra Small (API only)
- Context Window: 16,384 tokens
- Vision: Supported
- Streaming: Supported
Capabilities:
- Optimized as a compact and efficient vision-language model
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Llama 4 Maverick multi modal - Small (API only)
- Context Window: 32,768 tokens
- Vision: Supported
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Optimized for text and multimodal experiences
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Document Analysis & OCR - Small (DeepSeek OCR) (API only)
- Context Window: 8,192 tokens
- Vision: Supported
- Streaming: Supported
Capabilities:
- Specialized for optical character recognition and document understanding
- Excels at converting documents to structured text/markdown
- High proficiency in table extraction and mathematical content recognition
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
inference-miner-u25 (API only)
- Context Window: Variable
- Vision: Not Supported
- Streaming: Supported
Capabilities:
- Vision-language model optimized for document analysis and parsing
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem-Solving
Fast Reasoning & Instruction Following - Small (API only)
- Context Window: 32,768 tokens
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Optimized for reasoning and instruction-following capabilities
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem Solving - Small (API only)
- Context Window: 32,768 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Optimized for thinking and reasoning
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Agent tasks - Large (API only)
- Context Window: 65,536 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Optimized for powerful reasoning and agentic tasks
- Versatile developer use cases
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem Solving - Medium (API only)
- Context Window: 32,768 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Optimized for thinking and reasoning
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem Solving - Small (API only)
- Context Window: 32,768 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Optimized for reasoning chat completions
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem Solving - Xtra Large (API only)
- Context Window: 65,536 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Optimized for reasoning chat completions
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Problem Solving - Xtra Large (API only)
- Context Window: Variable
- Streaming: Supported
Capabilities:
- Optimized for reasoning chat completions
- Dedicated reasoning model
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Reasoning & Tool Use - Large (GLM-4.5 Air) (API only)
- Context Window: 131,072 tokens
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Mixture-of-Experts architecture
- Hybrid reasoning with configurable thinking mode
- Strong tool/function calling and code generation capabilities
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat & Document Analysis & Reasoning - Large (API only)
- Context Window: Variable
- Vision: Supported
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Large-scale model delivering frontier-level performance across complex tasks
- Advanced multilingual capabilities
- Reasoning mode for dynamic response tailoring based on query complexity
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat, Document Analysis, Coding & Reasoning - Xtra Large
- Context Window: 1,000,000 tokens
- Vision: Supported
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Multi-modal model optimized for chat, document analysis, coding, and reasoning
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat, Vision, Document Analysis & Reasoning - Medium
- Context Window: 256,000 tokens
- Vision: Supported
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Best-in-class multi-modal model
- Optimized for chat, vision, document analysis, coding, and reasoning
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat, Document Analysis & Agent tasks - Xtra Large
- Context Window: 250,000 tokens
- Vision: Supported
- Function Calling: Supported
- Reasoning: Supported
- Streaming: Supported
Capabilities:
- Very large-scale model delivering frontier-level performance across complex tasks
- Advanced multilingual capabilities
- Reasoning mode for dynamic response tailoring
- Optimized for powerful reasoning, agentic tasks, and versatile developer use cases
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Multilingual
Llama 3.3 Multi-lingual - Medium (API only)
- Context Window: 131,072 tokens
- Streaming: Supported
Capabilities:
- Optimized for multilingual dialogue use cases
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat & Function Calling - Small (Granite 3.1) (API only)
- Context Window: 131,072 tokens
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Long-context model optimized for instruction following, RAG, summarization, and text extraction
- Supports 12 languages including English, German, French, Italian, and Dutch
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat & Document Analysis - Xtra Xtra Large (API only)
- Context Window: Variable
- Streaming: Supported
Capabilities:
- Optimized for multilingual dialogue use cases
- Note: This model will be deprecated soon
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat, Multi-lingual, Coding & function calling - Small
- Context Window: 128,000 tokens
- Function Calling: Supported
- Streaming: Supported
Capabilities:
- Versatile small model optimized for chat, coding, and multilingual tasks
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Chat & General Purpose
Search, Chat & Analysis - Small (API only)
- Context Window: Variable
- Vision: Supported
- Streaming: Supported
Capabilities:
- Optimized for web search and chat
- Suitable for artists and content creation, including storytelling
Pricing:
- Input: ... per million tokens
- Output: ... per million tokens
Complete Pricing Overview
Pricing is subject to change at our discretion.
Cost Estimation
Typical Task Types
| Task Type | Token Usage | Recommended Model |
|---|---|---|
| Email response | 500 input + 300 output | Chat, Multi-lingual, Coding & function calling - Small |
| 10-page document summary | 10K input + 1K output | Chat, Vision, Document Analysis & Reasoning - Medium |
| Contract analysis (30 pages) | 30K input + 2K output | Apertus Swiss LLM - Large |
| Complex reasoning task | 5K input + 3K output | Reasoning & Agent tasks - Large |
| Multilingual exchange | 15K input + 10K output | Chat, Document Analysis & Agent tasks - Xtra Large |
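Because prices are quoted per million tokens, the cost of a single task is the input tokens times the input price plus the output tokens times the output price, each divided by one million. The sketch below applies that arithmetic to the token counts from the table above. The function name `estimate_cost` and both example prices are assumptions made for illustration; they are not Schatzi AI's actual rates, which are listed per model in the pricing overview.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the cost of one request, given prices per million tokens."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(input_price)
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(output_price)
    return input_cost + output_cost


# Token counts (input, output) taken from the "Typical Task Types" table.
TASKS = {
    "Email response": (500, 300),
    "10-page document summary": (10_000, 1_000),
    "Contract analysis (30 pages)": (30_000, 2_000),
    "Complex reasoning task": (5_000, 3_000),
    "Multilingual exchange": (15_000, 10_000),
}

# HYPOTHETICAL prices per million tokens, for illustration only.
EXAMPLE_INPUT_PRICE = "1.00"
EXAMPLE_OUTPUT_PRICE = "3.00"

if __name__ == "__main__":
    for task, (tokens_in, tokens_out) in TASKS.items():
        cost = estimate_cost(
            tokens_in, tokens_out, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE
        )
        print(f"{task}: {cost:.6f}")
```

Replace the two example prices with the current input and output rates of the model you plan to use. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.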
Model Availability
Schatzi AI offers models across two channels:
- Chat UI (via chat.schatziai.ch / OpenWebUI): Access to models with "all" availability.
- API (via REST API with API keys): Access to all models, including those marked as API only.
Chat UI Models
The following models are available directly in the Chat interface:
- Apertus Swiss LLM - Large
- Chat, Multi-lingual, Coding & function calling - Small
- Chat, Document Analysis, Coding & Reasoning - Xtra Large
- Chat, Vision, Document Analysis & Reasoning - Medium
- Chat, Document Analysis & Agent tasks - Xtra Large
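Choosing among the Chat UI models comes down to matching a task's requirements against each model's listed specifications. The sketch below encodes the five Chat UI models with the context windows and capabilities stated in this guide and filters them by requirement. The `Model` class and `find_models` function are hypothetical helpers, not part of any Schatzi AI SDK, and a capability that is not listed for a model is assumed to be unsupported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_window: int
    vision: bool = False
    function_calling: bool = False
    reasoning: bool = False


# Chat UI models with the specifications listed in this guide.
# Assumption: a capability not listed for a model is treated as unsupported.
CHAT_UI_MODELS = [
    Model("Apertus Swiss LLM - Large", 65_536),
    Model("Chat, Multi-lingual, Coding & function calling - Small", 128_000,
          function_calling=True),
    Model("Chat, Document Analysis, Coding & Reasoning - Xtra Large", 1_000_000,
          vision=True, function_calling=True, reasoning=True),
    Model("Chat, Vision, Document Analysis & Reasoning - Medium", 256_000,
          vision=True, function_calling=True, reasoning=True),
    Model("Chat, Document Analysis & Agent tasks - Xtra Large", 250_000,
          vision=True, function_calling=True, reasoning=True),
]


def find_models(models, min_context=0, vision=False,
                function_calling=False, reasoning=False):
    """Return the names of models that meet every requested requirement."""
    return [
        m.name
        for m in models
        if m.context_window >= min_context
        and (m.vision or not vision)
        and (m.function_calling or not function_calling)
        and (m.reasoning or not reasoning)
    ]


if __name__ == "__main__":
    # Example: a vision-capable model that can hold roughly 300K tokens.
    print(find_models(CHAT_UI_MODELS, min_context=300_000, vision=True))
```

With these inputs, only the Xtra Large coding and reasoning model qualifies, since it is the only Chat UI model whose context window exceeds 300,000 tokens.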
API-Only Models
These models are only accessible via API key and are not visible in the Chat UI:
- Apertus Swiss LLM - Large (API only)
- Apertus Swiss LLM - Small
- Document Analysis - Medium
- Document Analysis - Small (Multiple versions)
- Document Analysis - Xtra Small
- Llama 4 Maverick multi modal - Small
- Document Analysis & OCR - Small (DeepSeek OCR)
- inference-miner-u25
- Fast Reasoning & Instruction Following - Small
- Reasoning & Problem Solving - Small (Multiple versions)
- Reasoning & Agent tasks - Large
- Reasoning & Problem Solving - Medium
- Reasoning & Problem Solving - Xtra Large (Multiple versions)
- Reasoning & Tool Use - Large (GLM-4.5 Air)
- Chat & Document Analysis & Reasoning - Large
- Llama 3.3 Multi-lingual - Medium
- Chat & Function Calling - Small (Granite 3.1)
- Chat & Document Analysis - Xtra Xtra Large
- Search, Chat & Analysis - Small
For the full technical reference, please visit Available API Models.
FAQ
Q: Can I switch models mid-conversation?
A: Yes. You can change models at any time. The conversation context carries over, though very long contexts may be truncated if the new model has a smaller context window.
Q: Which model is best for Swiss legal documents?
A: Apertus Swiss LLM - Large is specifically designed for this use case. It is AI Act compliant and optimized for Swiss multilingual requirements.
Q: What is the most cost-effective way to process documents?
A: For simple tasks, use Chat, Multi-lingual, Coding & function calling - Small. For specialized OCR, use Document Analysis & OCR - Small (DeepSeek OCR) via API.
Q: Do all models support document upload?
A: Yes, all models support text-based document analysis. Models with vision capabilities (e.g., Chat, Vision, Document Analysis & Reasoning - Medium) can also analyze images and visual layouts.
Q: How do I track my costs?
A: Your usage dashboard provides a detailed breakdown of token consumption and costs per model.
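The first answer above notes that a long conversation may be truncated when you switch to a model with a smaller context window. This guide does not document how the platform truncates, so the sketch below only illustrates one common strategy: dropping the oldest messages until the remainder fits. The function `fit_to_context` is hypothetical, and the per-message token counts are made-up values that you would normally obtain from a tokenizer.

```python
def fit_to_context(message_tokens, context_window, reserved_for_output=0):
    """Drop the oldest messages until the rest fits the context window.

    message_tokens: token count per message, oldest first.
    Returns the index of the first message that is kept.
    """
    budget = context_window - reserved_for_output
    total = sum(message_tokens)
    first_kept = 0
    # Remove messages from the front (oldest) while over budget.
    while first_kept < len(message_tokens) and total > budget:
        total -= message_tokens[first_kept]
        first_kept += 1
    return first_kept


if __name__ == "__main__":
    # A conversation of four messages totalling 100,000 tokens (made-up counts).
    history = [40_000, 30_000, 20_000, 10_000]
    # Context windows taken from this guide: 128,000 and 65,536 tokens.
    print(fit_to_context(history, 128_000))  # 0: nothing is dropped
    print(fit_to_context(history, 65_536))   # 1: the oldest message is dropped
```

Running a check like this before switching shows how much history a smaller model could keep, which helps decide whether to summarize the conversation first.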
Get Started
- Log in to your Schatzi AI account.
- Start a new chat in the interface or generate an API key.
- Select your model based on the capabilities listed in this guide.
- Execute your task with the assurance of Swiss data sovereignty.
Need help? Contact Support | View Pricing Plans