Skip to main content

Model Comparison

Schatzi AI provides access to 27 active state-of-the-art AI models. All models run exclusively on Swiss infrastructure, ensuring your data never leaves Switzerland and remains fully compliant with Swiss data protection regulations.

This guide helps you choose the right model for your specific needs, understand pricing, and optimize costs while maintaining complete data sovereignty.


Complete Model Reference

Swiss LLM (AI Act Compliant)

Apertus Swiss LLM - Large

  • Context Window: 65,536 tokens
  • Streaming: Supported

Capabilities:

  • Data and methods documented for unprecedented transparency
  • Compliant with the AI Act and respectful of privacy and intellectual property
  • 70B version delivering performance on a par with current market leaders
  • Ideal for multilingual services, government agencies, and R&D teams

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Apertus Swiss LLM - Large (API only)

  • Context Window: 65,536 tokens
  • Streaming: Supported

Capabilities:

  • Data and methods documented for unprecedented transparency
  • Compliant with the AI Act and respectful of privacy and intellectual property
  • 70B version delivering performance on a par with current market leaders
  • Ideal for multilingual services, government agencies, and R&D teams

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Apertus Swiss LLM - Small (API only)

  • Context Window: 65,536 tokens
  • Streaming: Supported

Capabilities:

  • Optimized for multilingual dialogue use cases

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Vision & Document Analysis

Document Analysis - Medium (API only)

  • Context Window: 32,768 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Optimized as a compact and efficient vision-language model

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Document Analysis - Small (API only)

  • Context Window: 32,768 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for multilingual dialogue use cases

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Document Analysis - Small (API only)

  • Context Window: 32,000 tokens
  • Vision: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for handling text and image input and generating text output

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Document Analysis - Xtra Small (API only)

  • Context Window: 16,384 tokens
  • Vision: Supported
  • Streaming: Supported

Capabilities:

  • Optimized as a compact and efficient vision-language model

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Llama 4 Maverick multi modal - Small (API only)

  • Context Window: 32,768 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for text and multimodal experiences

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Document Analysis & OCR - Small (DeepSeek OCR) (API only)

  • Context Window: 8,192 tokens
  • Vision: Supported
  • Streaming: Supported

Capabilities:

  • Specialized for optical character recognition and document understanding
  • Excels at converting documents to structured text/markdown
  • High proficiency in table extraction and mathematical content recognition

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

inference-miner-u25 (API only)

  • Context Window: Variable
  • Vision: Not Supported
  • Streaming: Supported

Capabilities:

  • Vision-language model optimized for document analysis and parsing

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem-Solving

Fast Reasoning & Instruction Following - Small (API only)

  • Context Window: 32,768 tokens
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for reasoning and instruction-following capabilities

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem Solving - Small (API only)

  • Context Window: 32,768 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for thinking and reasoning

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Agent tasks - Large (API only)

  • Context Window: 65,536 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for powerful reasoning and agentic tasks
  • Versatile developer use cases

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem Solving - Medium (API only)

  • Context Window: 32,768 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for thinking and reasoning

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem Solving - Small (API only)

  • Context Window: 32,768 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for reasoning chat completions

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem Solving - Xtra Large (API only)

  • Context Window: 65,536 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for reasoning chat completions

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Problem Solving - Xtra Large (API only)

  • Context Window: Variable
  • Streaming: Supported

Capabilities:

  • Optimized for reasoning chat completions
  • Dedicated reasoning model

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Reasoning & Tool Use - Large (GLM-4.5 Air) (API only)

  • Context Window: 131,072 tokens
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Mixture-of-Experts architecture
  • Hybrid reasoning with configurable thinking mode
  • Strong tool/function calling and code generation capabilities

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat & Document Analysis & Reasoning - Large (API only)

  • Context Window: Variable
  • Vision: Supported
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Large-scale model delivering frontier-level performance across complex tasks
  • Advanced multilingual capabilities
  • Reasoning mode for dynamic response tailoring based on query complexity

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat, Document Analysis, Coding & Reasoning - Xtra Large

  • Context Window: 1,000,000 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Multi-modal model optimized for chat, document analysis, coding, and reasoning

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat, Vision, Document Analysis & Reasoning - Medium

  • Context Window: 256,000 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Best-in-class multi-modal model
  • Optimized for chat, vision, document analysis, coding, and reasoning

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat, Document Analysis & Agent tasks - Xtra Large

  • Context Window: 250,000 tokens
  • Vision: Supported
  • Function Calling: Supported
  • Reasoning: Supported
  • Streaming: Supported

Capabilities:

  • Very large-scale model delivering frontier-level performance across complex tasks
  • Advanced multilingual capabilities
  • Reasoning mode for dynamic response tailoring
  • Optimized for powerful reasoning, agentic tasks, and versatile developer use cases

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Multilingual

Llama 3.3 Multi-lingual - Medium (API only)

  • Context Window: 131,072 tokens
  • Streaming: Supported

Capabilities:

  • Optimized for multilingual dialogue use cases

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat & Function Calling - Small (Granite 3.1) (API only)

  • Context Window: 131,072 tokens
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Long-context model optimized for instruction following, RAG, summarization, and text extraction
  • Supports 12 languages including English, German, French, Italian, and Dutch

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat & Document Analysis - Xtra Xtra Large (API only)

  • Context Window: Variable
  • Streaming: Supported

Capabilities:

  • Optimized for multilingual dialogue use cases
  • Note: This model will be deprecated soon

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat, Multi-lingual, Coding & function calling - Small

  • Context Window: 128,000 tokens
  • Function Calling: Supported
  • Streaming: Supported

Capabilities:

  • Versatile small model optimized for chat, coding, and multilingual tasks

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Chat & General Purpose

Search, Chat & Analysis - Small (API only)

  • Context Window: Variable
  • Vision: Supported
  • Streaming: Supported

Capabilities:

  • Optimized for web search and chat
  • Suitable for artists and content creation, including storytelling

Pricing:

  • Input: ... per million tokens
  • Output: ... per million tokens

Complete Pricing Overview

Loading prices...

Pricing Notice

Pricing is subject to change at our discretion.


Cost Estimation

Typical Task Types

Task TypeToken UsageRecommended Model
Email response500 input + 300 outputChat, Multi-lingual, Coding & function calling - Small
10-page document summary10K input + 1K outputChat, Vision, Document Analysis & Reasoning - Medium
Contract analysis (30 pages)30K input + 2K outputApertus Swiss LLM - Large
Complex reasoning task5K input + 3K outputReasoning & Agent tasks - Large
Multilingual exchange15K input + 10K outputChat, Document Analysis & Agent tasks - Xtra Large

Model Availability

Schatzi AI offers models across two channels:

  • Chat UI (via chat.schatziai.ch / OpenWebUI): Access to models with all availability.
  • API (via REST API with API keys): Access to all models, including those marked as api only.

Chat UI Models

The following models are available directly in the Chat interface:

  • Apertus Swiss LLM - Large
  • Chat, Multi-lingual, Coding & function calling - Small
  • Chat, Document Analysis, Coding & Reasoning - Xtra Large
  • Chat, Vision, Document Analysis & Reasoning - Medium
  • Chat, Document Analysis & Agent tasks - Xtra Large

API-Only Models

These models are only accessible via API key and are not visible in the Chat UI:

  • Apertus Swiss LLM - Large (API only)
  • Apertus Swiss LLM - Small
  • Document Analysis - Medium
  • Document Analysis - Small (Multiple versions)
  • Document Analysis - Xtra Small
  • Llama 4 Maverick multi modal - Small
  • Document Analysis & OCR - Small (DeepSeek OCR)
  • inference-miner-u25
  • Fast Reasoning & Instruction Following - Small
  • Reasoning & Problem Solving - Small (Multiple versions)
  • Reasoning & Agent tasks - Large
  • Reasoning & Problem Solving - Medium
  • Reasoning & Problem Solving - Xtra Large (Multiple versions)
  • Reasoning & Tool Use - Large (GLM-4.5 Air)
  • Chat & Document Analysis & Reasoning - Large
  • Llama 3.3 Multi-lingual - Medium
  • Chat & Function Calling - Small (Granite 3.1)
  • Chat & Document Analysis - Xtra Xtra Large
  • Search, Chat & Analysis - Small

For the full technical reference, please visit Available API Models.



FAQ

Q: Can I switch models mid-conversation? A: Yes. You can change models at any time. The conversation context carries over, though very long contexts may be truncated if the new model has a smaller context window.

Q: Which model is best for Swiss legal documents? A: Apertus Swiss LLM - Large is specifically designed for this use case. It is AI Act compliant and optimized for Swiss multilingual requirements.

Q: What is the most cost-effective way to process documents? A: For simple tasks, use Chat, Multi-lingual, Coding & function calling - Small. For specialized OCR, use Document Analysis & OCR - Small (DeepSeek OCR) via API.

Q: Do all models support document upload? A: Yes, all models support text-based document analysis. Models with vision capabilities (e.g., Chat, Vision, Document Analysis & Reasoning - Medium) can also analyze images and visual layouts.

Q: How do I track my costs? A: Your usage dashboard provides a detailed breakdown of token consumption and costs per model.


Get Started

  1. Log in to your Schatzi AI account.
  2. Start a new chat in the interface or generate an API key.
  3. Select your model based on the capabilities listed in this guide.
  4. Execute your task with the assurance of Swiss data sovereignty.

Need help? Contact Support | View Pricing Plans