Model Profiles
Schatzi AI hosts all models exclusively on Swiss infrastructure, ensuring your data never leaves Switzerland. This page provides detailed technical specifications, capabilities, and pricing for each available model to help you select the optimal solution for your business requirements.
For guidance on selecting the right model for your use case, see /ai-models/choosing-model. To understand how token pricing works, visit /subscription-billing/understanding-tokens.
Apertus Swiss LLM - Large
Description
Designed for organizations requiring maximum transparency and regulatory compliance, this 70B parameter model serves as a reliable foundation for multilingual services, government applications, and research initiatives. With fully documented training methodologies and strict adherence to EU AI Act requirements, it prioritizes data privacy and intellectual property protection while delivering frontier-level performance.
Specifications
- Context window: 65,536 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: Chat UI & API
Ideal Use Cases
- Government service automation and citizen support
- Regulatory compliance documentation and reporting
- Academic research and R&D documentation analysis
- Multilingual public sector chatbots
- Legal document review with transparency requirements
- Cross-border administrative processes
Strengths
- Complete training transparency and methodology documentation
- Full EU AI Act compliance for regulated industries
- Robust multilingual capabilities for European languages
- 70B parameter scale delivering high performance
- Guaranteed Swiss data sovereignty
Limitations
- No vision or image analysis capabilities
- No function calling support for tool integration
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
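Because prices are quoted per million tokens, the cost of a single request can be estimated from its input and output token counts. The sketch below shows only the arithmetic. The function name `estimate_cost` and the example prices are placeholders chosen for illustration, not the actual rates for any model on this page.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one request from per-million-token prices.

    Prices are parameters because they are not stated here; pass in the
    rates from your own price list.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices (hypothetical, for illustration only).
cost = estimate_cost(input_tokens=2_000_000, output_tokens=500_000,
                     input_price_per_million=1.0,
                     output_price_per_million=4.0)
print(cost)  # 2 * 1.0 + 0.5 * 4.0 = 4.0
```

Input and output are priced separately, so workloads that generate long answers can cost noticeably more than workloads that only read long inputs.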
When to Use
Choose this model when regulatory compliance, training transparency, and data sovereignty are non-negotiable requirements. It is specifically engineered for government agencies, research institutions, and organizations handling sensitive information that must remain within Swiss jurisdiction while maintaining full auditability of AI decision-making processes.
When to Choose a Different Model
For applications requiring vision capabilities or image analysis, use Chat & Document Analysis & Reasoning - Large or Chat, Document Analysis & Agent tasks - Xtra Large. If your workflow requires function calling or autonomous agent capabilities, select Reasoning & Agent tasks - Large or Chat, Document Analysis & Agent tasks - Xtra Large instead.
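The routing advice above amounts to filtering the catalog by required capabilities. The following is a minimal sketch of that idea, using a few entries copied from the Specifications lists on this page. The `ModelSpec` class and `candidates` function are illustrative names, and the display names stand in for whatever model identifiers the API actually expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    vision: bool
    reasoning: bool
    function_calling: bool


# A few entries copied from the Specifications lists on this page.
CATALOG = [
    ModelSpec("Apertus Swiss LLM - Large", False, False, False),
    ModelSpec("Document Analysis - Medium", True, False, True),
    ModelSpec("Reasoning & Agent tasks - Large", False, True, True),
    ModelSpec("Chat & Document Analysis & Reasoning - Large", True, True, True),
]


def candidates(vision=False, reasoning=False, function_calling=False):
    """Return names of models that offer every requested capability."""
    return [
        m.name for m in CATALOG
        if (m.vision or not vision)
        and (m.reasoning or not reasoning)
        and (m.function_calling or not function_calling)
    ]


print(candidates(vision=True))
print(candidates(reasoning=True, function_calling=True))
```

A capability filter only narrows the list. Compliance requirements, context window, and price still decide between the remaining candidates.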
Apertus Swiss LLM - Small
Description
A streamlined version of the Swiss LLM series, this model is specifically optimized for multilingual dialogue use cases. It provides a cost-effective way to implement conversational AI that remains compliant with Swiss data standards while maintaining high efficiency in dialogue management.
Specifications
- Context window: 65,536 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Lightweight multilingual customer service bots
- Basic conversational interfaces for Swiss SMEs
- Fast-response multilingual chat applications
- Preliminary text screening in multiple languages
- Simple FAQ automation
- Multilingual input classification
Strengths
- Highly cost-efficient for high-volume dialogue
- Maintains Swiss data sovereignty and compliance
- Optimized for conversational flow
- Fast inference speeds
- Reliable multilingual performance
Limitations
- Reduced reasoning depth compared to Apertus Swiss LLM - Large
- No vision or multimodal support
- No function calling for external tool integration
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model for high-volume, simple conversational tasks where cost efficiency is a priority but Swiss data residency and regulatory compliance remain mandatory. It is ideal for basic chat interfaces that do not require complex reasoning or external tool access.
When to Choose a Different Model
If your application requires deeper analytical capabilities or handles highly complex queries, upgrade to Apertus Swiss LLM - Large. For tasks requiring vision or document analysis, consider Document Analysis - Medium.
Document Analysis - Medium
Description
A compact and efficient vision-language model designed to bridge the gap between text and visual data. This model is optimized for analyzing documents that contain both text and imagery, providing a balanced approach to performance and resource consumption.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Automated invoice and receipt processing
- Visual data extraction from business forms
- Analysis of technical diagrams and manuals
- Multimodal chat for document support
- Image-to-text conversion for structured data
- Visual quality assurance checks
Strengths
- Integrated vision and language processing
- Supports function calling for structured data output
- Efficient processing of visual documents
- Balanced performance for medium-complexity tasks
- Fast streaming for real-time analysis
Limitations
- Smaller context window than dedicated text models
- Not optimized for long-form creative writing
- Limited deep reasoning capabilities
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model when you need to extract structured information from images or documents and potentially trigger external actions via function calling. It is the ideal choice for automated document processing pipelines where visual understanding is required.
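On the application side, triggering an external action usually means mapping the function call requested by the model to a local handler. The sketch below illustrates that step only. The payload shape (a JSON object with `name` and `arguments` keys) is an assumption, and `record_invoice` is a hypothetical handler; consult the API reference for the actual response format.

```python
import json


# Hypothetical handler that a document pipeline might expose to the model.
def record_invoice(invoice_number: str, total: float) -> str:
    return f"stored invoice {invoice_number} with total {total:.2f}"


# Registry of functions the model is allowed to call.
HANDLERS = {"record_invoice": record_invoice}


def dispatch(call_json: str) -> str:
    """Run the handler named in a function-call payload.

    Assumes the payload is a JSON object with "name" and "arguments" keys.
    """
    call = json.loads(call_json)
    name = call["name"]
    if name not in HANDLERS:
        raise KeyError(f"unknown function: {name}")
    return HANDLERS[name](**call["arguments"])


payload = ('{"name": "record_invoice", '
           '"arguments": {"invoice_number": "A-17", "total": 120.5}}')
print(dispatch(payload))  # stored invoice A-17 with total 120.50
```

Keeping an explicit registry means the model can only trigger functions that the pipeline has deliberately exposed.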
When to Choose a Different Model
For extremely high-volume, simple visual tasks, Document Analysis - Xtra Small may be more cost-effective. For complex reasoning combined with vision, select Chat, Document Analysis & Agent tasks - Xtra Large.
Document Analysis - Small
Description
Optimized for handling text and image input to generate precise text output, this model is designed for multilingual dialogue and document understanding. It provides a versatile solution for businesses needing to process visual information without the overhead of larger models.
Specifications
- Context window: 32,000 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Multilingual visual chat interfaces
- Basic OCR and document summarization
- Image captioning for accessibility
- Visual content tagging
- Simple document-based Q&A
- Multilingual image-to-text translation
Strengths
- Strong multimodal input handling
- Optimized for multilingual dialogue
- Efficient token usage for visual tasks
- Fast response times
- Reliable text generation from visual cues
Limitations
- No function calling support
- Limited context window for very long documents
- No advanced reasoning capabilities
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for straightforward multimodal tasks where you need to describe images or answer questions about documents in multiple languages, but do not need to integrate with external APIs via function calling.
When to Choose a Different Model
If you require function calling to send extracted data to another system, use Document Analysis - Medium. For high-precision OCR of complex tables, use Document Analysis & OCR - Small (DeepSeek OCR).
Document Analysis - Xtra Small
Description
The most lightweight vision-language model in the series, optimized for maximum efficiency. It is designed for compact applications that require basic visual understanding and text generation with minimal latency and cost.
Specifications
- Context window: 16,384 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- High-speed image classification
- Basic visual tagging for large datasets
- Simple OCR for short snippets of text
- Low-latency visual chat triggers
- Mobile-optimized visual analysis
- Basic document verification
Strengths
- Lowest cost for vision-enabled tasks
- Extremely fast inference and streaming
- Low resource footprint
- Efficient for simple, repetitive visual tasks
- High throughput for batch processing
Limitations
- Very limited context window
- Lowest reasoning capability in the vision suite
- Not suitable for complex document analysis
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model for high-volume, low-complexity visual tasks where speed and cost are the primary drivers and the input data is relatively small.
When to Choose a Different Model
For any task requiring complex analysis of a full page or multi-page document, move up to Document Analysis - Small or Document Analysis - Medium.
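One way to apply this guidance is to pick the smallest model in the Document Analysis series whose context window holds the request. The sketch below uses the context windows listed on this page. The `reserved_output_tokens` margin is an assumed safety buffer, and how images count toward the token total is not specified here, so the input estimate must come from your own measurements.

```python
# Context windows copied from the Specifications lists on this page,
# ordered from smallest to largest.
VISION_MODELS = [
    ("Document Analysis - Xtra Small", 16_384),
    ("Document Analysis - Small", 32_000),
    ("Document Analysis - Medium", 32_768),
]


def smallest_fitting_model(input_tokens: int,
                           reserved_output_tokens: int = 1_024):
    """Return the first model whose window holds input plus reserved output.

    reserved_output_tokens is an assumed margin, not a documented limit.
    Returns None when no listed model is large enough.
    """
    needed = input_tokens + reserved_output_tokens
    for name, window in VISION_MODELS:
        if needed <= window:
            return name
    return None


print(smallest_fitting_model(4_000))   # Document Analysis - Xtra Small
print(smallest_fitting_model(20_000))  # Document Analysis - Small
print(smallest_fitting_model(40_000))  # None
```

Size is only one criterion: of these three models, only Document Analysis - Medium supports function calling.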
Fast Reasoning & Instruction Following - Small
Description
A specialized model optimized for high-speed reasoning and strict adherence to complex instructions. It is designed for developers who need a model that can follow precise formatting rules and logical constraints without the latency of larger reasoning models.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Structured data extraction (JSON/XML)
- Strict template-based content generation
- Fast logical validation of text
- Instruction-heavy automation tasks
- API response formatting
- Rapid data analysis and categorization
Strengths
- Exceptional instruction-following accuracy
- Fast reasoning for simple to medium tasks
- Full function calling support
- Reliable structured output
- High efficiency for developer workflows
Limitations
- No vision capabilities
- Not designed for long-form creative writing
- Limited deep "thinking" for highly abstract problems
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model when your primary requirement is that the AI follows a specific set of rules or a strict format perfectly and quickly. It is ideal for the "glue" in an automation pipeline where reliability of format is critical.
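Even with a model that follows formats reliably, a pipeline typically verifies each reply before passing it downstream. The following is a minimal sketch of such a check for JSON output. The function name and the example reply are hypothetical; only the Python standard library is used.

```python
import json


def parse_structured_reply(reply: str, required_keys: set) -> dict:
    """Parse a model reply as JSON and check that required keys are present.

    Raises ValueError if the reply is not a JSON object or lacks a key, so
    the calling pipeline can retry or reject it instead of passing it on.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical reply to a ticket-categorization prompt.
reply = '{"category": "billing", "priority": "high"}'
print(parse_structured_reply(reply, {"category", "priority"}))
```

Raising a single exception type for every failure keeps the retry logic in the surrounding pipeline simple.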
When to Choose a Different Model
For tasks requiring deep, multi-step logical deduction, use Reasoning & Problem Solving - Medium. For multimodal tasks, use Document Analysis - Medium.
Reasoning & Problem Solving - Small
Description
An entry-level reasoning model optimized for "thinking" and logical problem solving. It utilizes a reasoning process to work through problems step-by-step, providing more reliable answers for logical queries than standard chat models.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Basic mathematical problem solving
- Logical puzzle resolution
- Simple code debugging
- Step-by-step instructional generation
- Basic analytical reasoning
- Logical consistency checking
Strengths
- Native reasoning capabilities
- Cost-effective entry point for "thinking" models
- Supports function calling for tool integration
- Higher accuracy on logical tasks than standard LLMs
- Efficient streaming of reasoning steps
Limitations
- Limited capacity for extremely complex architectural problems
- No vision support
- Smaller context window (32,768 tokens) than the long-context chat models on this page
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for tasks that require a basic level of logical deduction or step-by-step thinking where a standard chat model might hallucinate or skip steps, but where the highest level of reasoning is not required.
When to Choose a Different Model
For highly complex scientific or mathematical problems, upgrade to Reasoning & Problem Solving - Xtra Large. For agentic tasks, use Reasoning & Agent tasks - Large.
Llama 3.3 Multi-lingual - Medium
Description
A powerful, balanced model optimized for high-performance multilingual dialogue. It excels at maintaining conversational context across a wide array of languages, making it a versatile choice for global business communications.
Specifications
- Context window: 131,072 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Global customer support automation
- Multilingual content moderation
- Cross-lingual translation and adaptation
- Large-scale conversational AI for diverse markets
- Multilingual knowledge base interaction
- International business correspondence
Strengths
- Massive 131K context window for long conversations
- Strong performance across multiple languages
- High reliability in dialogue management
- Efficient processing of long text inputs
- Stable and predictable output
Limitations
- No vision or multimodal capabilities
- No function calling for tool integration
- No dedicated reasoning mode
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model when you need a reliable, multilingual conversationalist that can handle very long conversation histories or large documents without losing context.
When to Choose a Different Model
If you need the model to interact with external APIs, use Chat & Function Calling - Small (Granite 3.1). For vision tasks, use Document Analysis - Medium.
Llama 4 Maverick multi modal - Small
Description
A cutting-edge multimodal model optimized for seamless experiences across text and visual inputs. It is designed to understand the relationship between images and text, providing a fluid interface for multimodal applications.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Multimodal AI assistants
- Visual content analysis for social media
- Image-based product support
- Interactive visual storytelling
- Multimodal data entry automation
- Visual Q&A for e-commerce
Strengths
- Native multimodal integration
- Supports function calling for action-oriented tasks
- Fast and responsive streaming
- Strong alignment between visual and textual understanding
- Versatile across a variety of lightweight multimodal tasks
Limitations
- Limited context window compared to text-only models
- Not optimized for deep logical reasoning
- Higher output cost than some basic vision models
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for modern, interactive applications where the AI needs to "see" and "talk" simultaneously and potentially trigger actions in other software via function calling.
When to Choose a Different Model
For heavy-duty document analysis, use Document Analysis - Medium. For pure reasoning tasks, use Reasoning & Problem Solving - Small.
Reasoning & Agent tasks - Large
Description
A powerhouse for developers building autonomous systems. This model is optimized for powerful reasoning, agentic tasks, and versatile developer use cases, allowing it to plan, execute, and refine complex workflows independently.
Specifications
- Context window: 65,536 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Autonomous AI agent development
- Complex software engineering tasks
- Multi-step business process automation
- Advanced data analysis and synthesis
- Tool-use orchestration
- Complex logical planning and execution
Strengths
- Advanced reasoning for agentic behavior
- Robust function calling for external tool use
- High reliability in multi-step task execution
- Optimized for developer-centric workflows
- Strong analytical capabilities
Limitations
- No vision support
- Higher cost than basic chat models
- Not optimized for creative, long-form prose
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Select this model when building AI agents that must operate autonomously, use tools, and reason through complex problems to reach a goal. It is the premier choice for agentic AI.
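An agent of this kind is often structured as a loop that alternates between asking the model for the next action and running the requested tool, with a step budget as a safeguard. The sketch below illustrates the control flow only. `next_action` stands in for a call to the model, and the action shapes (`{"tool": ..., "input": ...}` and `{"final": ...}`) are assumptions made for this example, not the API's format.

```python
from typing import Callable, Dict, List

# A tool takes a string argument and returns a string observation.
Tool = Callable[[str], str]


def run_agent(next_action: Callable[[List[str]], Dict[str, str]],
              tools: Dict[str, Tool], max_steps: int = 5) -> str:
    """Alternate between requesting an action and running the named tool.

    next_action receives the observations so far and returns either
    {"tool": name, "input": text} or {"final": answer}.
    """
    history: List[str] = []
    for _ in range(max_steps):
        action = next_action(history)
        if "final" in action:
            return action["final"]
        observation = tools[action["tool"]](action["input"])
        history.append(observation)
    raise RuntimeError("step budget exhausted without a final answer")


# Scripted stand-in for the model: look up a value, then answer with it.
def scripted_model(history: List[str]) -> Dict[str, str]:
    if not history:
        return {"tool": "lookup", "input": "order-42"}
    return {"final": f"Status: {history[-1]}"}


tools = {"lookup": lambda key: "shipped" if key == "order-42" else "unknown"}
print(run_agent(scripted_model, tools))  # Status: shipped
```

The step budget prevents a misbehaving agent from looping indefinitely and consuming tokens without producing a result.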
When to Choose a Different Model
If your agent needs to process images or documents, use Chat, Document Analysis & Agent tasks - Xtra Large. For simple chat, Llama 3.3 Multi-lingual - Medium is more efficient.
Reasoning & Problem Solving - Medium
Description
A mid-tier reasoning model that provides a significant boost in logical depth over the Small version. It is optimized for thinking and reasoning, making it suitable for professional-grade analytical tasks.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Professional financial analysis
- Complex logical auditing
- Mid-level software architecture planning
- Detailed technical troubleshooting
- Advanced mathematical reasoning
- Strategic planning assistance
Strengths
- Stronger logical deduction than Small reasoning models
- Full function calling support
- Reliable step-by-step thinking
- Balanced speed and depth
- High accuracy on complex logical queries
Limitations
- No vision support
- Limited context window (32K)
- Higher cost than the Small reasoning model
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model for professional analytical tasks where accuracy and logical rigor are paramount, but the extreme scale of the Xtra Large model is not required.
When to Choose a Different Model
For the highest possible reasoning performance, use Reasoning & Problem Solving - Xtra Large. For agentic workflows, use Reasoning & Agent tasks - Large.
Reasoning & Problem Solving - Small (Reasoning Chat)
Description
Optimized specifically for reasoning-based chat completions. This model brings "thinking" capabilities to a smaller, faster footprint, allowing for logical interactions without the latency of larger models.
Specifications
- Context window: 32,768 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Logic-based customer support
- Interactive tutoring and educational tools
- Simple code explanation and debugging
- Logical consistency checks in chat
- Fast analytical responses
- Step-by-step guide generation
Strengths
- Fast reasoning-enabled chat
- Cost-effective logical processing
- Supports function calling
- Better at logic than standard small models
- Efficient streaming
Limitations
- Not suitable for highly complex architectural reasoning
- No vision support
- Limited context window
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for chat applications where users expect logically sound, step-by-step answers but the tasks are of moderate complexity.
When to Choose a Different Model
For deeper analysis, use Reasoning & Problem Solving - Medium. For multimodal reasoning, use Chat, Document Analysis & Agent tasks - Xtra Large.
Reasoning & Problem Solving - Xtra Large
Description
The pinnacle of logical deduction in the portfolio. This model is optimized for the most demanding reasoning chat completions, capable of handling abstract problems and complex logical chains with extreme precision.
Specifications
- Context window: 65,536 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Advanced scientific research analysis
- Complex legal reasoning and case analysis
- High-level mathematical proofs
- Deep architectural software design
- Strategic corporate planning
- Complex logic-based auditing
Strengths
- Highest level of logical reasoning available
- Capable of handling extremely abstract problems
- High precision in complex deductions
- Robust function calling for tool integration
- Large context window for reasoning tasks
Limitations
- Highest latency among reasoning models
- Premium pricing
- No vision support
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model for mission-critical analytical tasks where a failure in logic is unacceptable and the complexity of the problem requires the maximum available "thinking" capacity.
When to Choose a Different Model
If you need vision capabilities alongside reasoning, use Chat, Document Analysis & Agent tasks - Xtra Large. For faster, simpler logic, use Reasoning & Problem Solving - Small.
Chat & Function Calling - Small (Granite 3.1)
Description
Based on the IBM Granite 3.1 8B Instruct architecture, this is a long-context model optimized for instruction following, RAG, and function calling. It is highly efficient and supports 12 languages, including English, German, French, Italian, and Dutch.
Specifications
- Context window: 131,072 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- RAG (Retrieval Augmented Generation) pipelines
- Multilingual text extraction and summarization
- API-driven automation
- High-volume instruction following
- Multilingual customer service bots
- Structured data generation
Strengths
- Excellent long-context handling (131K)
- Strong function calling capabilities
- Optimized for RAG workflows
- Broad European language support
- Very cost-effective
Limitations
- No vision support
- No dedicated reasoning mode
- Not designed for complex creative writing
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model for RAG applications or any workflow that requires processing large amounts of text and then calling a function to act on that data, especially in a multilingual European context.
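In a RAG pipeline, the retrieved passages must fit in the context window together with the question and the space reserved for the answer. The following is a rough sketch of that budgeting step. The word-based token estimate and the `reserved_output_tokens` margin are assumptions; a real pipeline should count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 131_072  # from the Specifications list above


def rough_token_count(text: str) -> int:
    """Very rough token estimate (whitespace-separated words); an assumption.

    Real tokenizers usually produce more tokens than this, so keep a margin.
    """
    return len(text.split())


def pack_chunks(chunks, question, reserved_output_tokens=2_048,
                context_window=CONTEXT_WINDOW):
    """Keep retrieved chunks, in ranked order, until the budget is used up."""
    budget = (context_window - reserved_output_tokens
              - rough_token_count(question))
    selected = []
    for chunk in chunks:
        cost = rough_token_count(chunk)
        if cost > budget:
            break
        selected.append(chunk)
        budget -= cost
    return selected


# Tiny window so the effect is visible: budget = 12 - 2 - 3 = 7 tokens.
chunks = ["alpha beta gamma", "delta epsilon zeta eta", "theta iota"]
print(pack_chunks(chunks, "what is alpha?", reserved_output_tokens=2,
                  context_window=12))
```

Stopping at the first chunk that does not fit preserves the retriever's ranking; skipping it and trying smaller chunks is an alternative design that trades ranking fidelity for fuller use of the window.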
When to Choose a Different Model
For tasks requiring deep reasoning, use Reasoning & Agent tasks - Large. For vision tasks, use Document Analysis - Medium.
Reasoning & Tool Use - Large (GLM-4.5 Air)
Description
A Mixture-of-Experts (MoE) model featuring 106B total parameters. It offers hybrid reasoning with a configurable thinking mode, strong tool/function calling, and exceptional code generation capabilities, all within a generous context window.
Specifications
- Context window: 131,072 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Advanced code generation and refactoring
- Complex tool-use orchestration
- Hybrid reasoning tasks (fast vs. deep)
- Large-scale technical documentation analysis
- Developer productivity tools
- Complex API integration workflows
Strengths
- Efficient MoE architecture
- Configurable thinking mode for flexibility
- Strong coding and technical capabilities
- Large 131,072-token context window
- Robust function calling
Limitations
- No vision support
- Higher output cost than basic chat models
- Complexity in configuring thinking modes
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for technical and developer-centric tasks, particularly those involving coding or complex tool orchestration where a large context window and flexible reasoning are required.
When to Choose a Different Model
For pure reasoning without the MoE complexity, use Reasoning & Problem Solving - Medium. For vision-based agent tasks, use Chat, Document Analysis & Agent tasks - Xtra Large.
Chat & Document Analysis - Xtra Xtra Large
Description
A very large-scale model optimized for multilingual dialogue and document analysis. This model is slated for deprecation soon; migrate existing workloads to a newer alternative.
Specifications
- Context window: Not specified
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Legacy multilingual chat systems
- Large-scale text analysis (legacy)
- Multilingual dialogue management
- General purpose chat in multiple languages
- Basic document summarization
- High-capacity text processing
Strengths
- Massive scale for general tasks
- Strong multilingual fluency
- Reliable for standard chat interactions
- High throughput for text
Limitations
- Scheduled for deprecation soon
- No vision support
- No function calling or reasoning mode
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Only use this model for maintaining legacy systems that have not yet been migrated. For all new projects, please select a current model.
When to Choose a Different Model
For all new implementations, use Llama 3.3 Multi-lingual - Medium for chat or Chat & Document Analysis & Reasoning - Large for advanced analysis.
Search, Chat & Analysis - Small
Description
A multimodal model optimized for web search and conversational AI. It is particularly suited for creative professionals, artists, and content creators who need a blend of search capabilities, visual understanding, and storytelling fluency.
Specifications
- Context window: Not specified
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Creative storytelling and narrative design
- Web-based research and synthesis
- Visual content inspiration and brainstorming
- Artistic project planning
- Content creation for marketing
- Image-based research queries
Strengths
- Integrated web search capabilities
- Vision support for visual research
- High creativity and fluency in prose
- Fast streaming for interactive sessions
- Versatile for non-technical creative work
Limitations
- No function calling support
- No dedicated reasoning mode
- Context window not specified
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for creative workflows, storytelling, or research tasks that require a combination of web access and visual understanding.
When to Choose a Different Model
For structured business automation, use Chat & Function Calling - Small (Granite 3.1). For deep logical analysis, use Reasoning & Problem Solving - Small.
Chat & Document Analysis & Reasoning - Large
Description
A large-scale model delivering frontier-level performance across a broad range of complex tasks. It combines advanced multilingual capabilities with a reasoning mode that can be enabled to dynamically tailor responses based on the complexity of the query.
Specifications
- Context window: Not specified
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Complex enterprise document analysis
- High-stakes multilingual business communication
- Advanced reasoning for business strategy
- Visual document auditing
- Complex content validation
- Multimodal executive reporting
Strengths
- Frontier-level performance on complex tasks
- Dynamic reasoning mode
- Integrated vision and function calling
- Exceptional multilingual capabilities
- Versatile across modalities
Limitations
- Higher cost than medium-tier models
- Context window not specified
- Higher latency when reasoning mode is active
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model for high-complexity business tasks that require a blend of vision, reasoning, and multilingual fluency, where the highest possible quality of output is required.
When to Choose a Different Model
For autonomous agent workflows, use Chat, Document Analysis & Agent tasks - Xtra Large. For simple, fast chat, use Llama 3.3 Multi-lingual - Medium.
Document Analysis & OCR - Small (DeepSeek OCR)
Description
A specialized 3B parameter vision-language model engineered specifically for optical character recognition (OCR) and document understanding. It excels at converting complex documents into structured text or markdown, including table extraction and mathematical notation.
Specifications
- Context window: 8,192 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- Converting PDFs/Images to Markdown
- Complex table extraction from documents
- Mathematical formula recognition
- Digitizing handwritten notes
- Structured data extraction from forms
- High-precision OCR for archives
Strengths
- Specialized in OCR and document structure
- Exceptional table and math recognition
- High precision in text extraction
- Efficient 3B parameter size
- Fast streaming of extracted text
Limitations
- Very small context window (8K)
- Not designed for general chat or reasoning
- No function calling support
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model exclusively for OCR and document digitization tasks where the goal is to turn a visual document into a structured text format with high precision.
When to Choose a Different Model
For general document Q&A or chat, use Document Analysis - Medium. For agentic workflows, use Chat, Document Analysis & Agent tasks - Xtra Large.
Chat, Multi-lingual, Coding & function calling - Small
Description
A versatile, high-efficiency model that balances chat fluency, multilingual support, and technical capabilities. It is particularly strong in coding tasks and function calling, making it a reliable choice for developer-centric chat applications.
Specifications
- Context window: 128,000 tokens
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: Yes
- Streaming: Yes
- Availability: Chat UI & API
Ideal Use Cases
- Coding assistance and snippet generation
- Multilingual developer chatbots
- API-integrated chat applications
- Technical support automation
- Structured text generation
- Fast multilingual correspondence
Strengths
- Strong coding capabilities
- Full function calling support
- Large 128K context window
- Balanced performance across multiple domains
- Available in both Chat UI and API
Limitations
- No vision support
- No dedicated reasoning mode
- Not optimized for extremely deep logical deduction
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Choose this model for general-purpose technical chat, coding help, or any application that requires a mix of multilingual fluency and the ability to call external functions.
When to Choose a Different Model
For deep reasoning tasks, use Reasoning & Problem Solving - Small. For vision tasks, use Document Analysis - Medium.
Chat, Document Analysis, Coding & Reasoning - Xtra Large
Description
A multimodal powerhouse optimized for the intersection of chat, document analysis, coding, and reasoning. This model is designed for high-complexity technical workflows that require both visual understanding and deep logical processing.
Specifications
- Context window: 1,000,000 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: Chat UI & API
Ideal Use Cases
- Analysis of massive technical codebases
- Complex multimodal data analysis
- Long-form technical reasoning
- Large-scale document auditing with vision
- Advanced software architecture analysis
- Comprehensive data synthesis from mixed sources
Strengths
- 1M token context window for very large inputs
- Full multimodal capabilities (Vision + Text)
- Integrated reasoning and function calling
- Strong coding and data analysis performance
- Available in both Chat UI and API
Limitations
- Higher cost per token
- Higher latency when processing very large contexts
- May be overkill for simple chat tasks
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model when you need to process an enormous amount of information (up to 1M tokens) that includes a mix of code, text, and images, and requires deep reasoning to synthesize the results.
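Before sending a large batch of text or code, it can help to estimate whether it fits within the 1,000,000-token window. The sketch below is a minimal pre-flight check under stated assumptions: the 4-characters-per-token ratio is a rough heuristic (real tokenizers vary by language and content), the `reserved_for_output` default is an arbitrary illustrative value since the maximum output size is not specified, and images are not accounted for.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # from the specifications above
CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(
    documents: list[str],
    reserved_for_output: int = 8_000,  # illustrative value, not a documented limit
) -> tuple[bool, int]:
    """Return (fits, estimated_input_tokens) for a batch of text documents."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS, total
```

If the check fails, the batch can be split or summarized before submission. For exact counts, use the tokenizer that matches the deployed model rather than this estimate.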
When to Choose a Different Model
For faster, shorter interactions, use Chat, Vision, Document Analysis & Reasoning - Medium. For pure OCR, use Document Analysis & OCR - Small (DeepSeek OCR).
Chat, Vision, Document Analysis & Reasoning - Medium
Description
A multimodal model that balances vision, coding, and reasoning performance. It is designed to be the "go-to" versatile model for most professional multimodal applications.
Specifications
- Context window: 256,000 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: Chat UI & API
Ideal Use Cases
- Professional multimodal assistants
- Technical document analysis with reasoning
- Mid-to-large scale coding tasks
- Visual data analysis and reporting
- Complex multilingual chat with vision
- General purpose high-end business AI
Strengths
- Excellent balance of speed and capability
- Strong multimodal and reasoning integration
- Large 256K context window
- Very competitive pricing for its capability tier
- Available in both Chat UI and API
Limitations
- Smaller context window than Xtra Large (256K vs. 1M tokens), so less suited to massive datasets
- Higher cost than basic chat models
- Reasoning can increase latency
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Use this model as your primary multimodal engine for tasks that require a mix of vision, reasoning, and coding, where a 256K context window is sufficient.
When to Choose a Different Model
For the absolute maximum context (1M tokens), use Chat, Document Analysis, Coding & Reasoning - Xtra Large. For simple text chat, use Llama 3.3 Multi-lingual - Medium.
inference-miner-u25
Description
A specialized vision-language model optimized specifically for document analysis and parsing. It is designed to extract structure and meaning from visual documents with high efficiency.
Specifications
- Context window: Not specified
- Max output tokens: Not specified
- Vision support: No
- Reasoning mode: No
- Function calling: No
- Streaming: Yes
- Availability: API only
Ideal Use Cases
- High-volume document parsing
- Automated data extraction from forms
- Visual structure analysis
- Batch document processing
- Industrial document digitization
- Parsing of standardized business reports
Strengths
- Highly optimized for parsing workflows
- Efficient processing of visual layouts
- Fast streaming for pipeline integration
- Reliable for structured extraction
- Cost-effective for parsing-specific tasks
Limitations
- No general chat capabilities
- No reasoning or function calling
- No general-purpose vision support (although it is a vision-language model, it is optimized for document parsing rather than open-ended image analysis)
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Deploy this model in a backend pipeline that parses documents and extracts their data into a structured format.
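A backend parsing pipeline of this kind usually needs to validate what the model returns before passing it downstream. The sketch below shows one possible shape, assuming the model is asked to return JSON: `parse_fn` is a hypothetical stand-in for the actual API call, and the required field names are chosen by the caller. Nothing here reflects a documented Schatzi AI interface.

```python
import json
from typing import Any, Callable, Iterable


def run_parsing_pipeline(
    documents: Iterable[tuple[str, bytes]],
    parse_fn: Callable[[bytes], str],  # hypothetical wrapper around the model call
    required_fields: set[str],
) -> tuple[dict[str, dict[str, Any]], dict[str, str]]:
    """Parse each document and separate valid records from failures.

    Returns (parsed, failed), both keyed by document ID.
    """
    parsed: dict[str, dict[str, Any]] = {}
    failed: dict[str, str] = {}
    for doc_id, payload in documents:
        try:
            record = json.loads(parse_fn(payload))
        except json.JSONDecodeError as exc:
            failed[doc_id] = f"invalid JSON: {exc}"
            continue
        if not isinstance(record, dict):
            failed[doc_id] = "expected a JSON object"
            continue
        missing = required_fields - set(record)
        if missing:
            failed[doc_id] = f"missing fields: {sorted(missing)}"
        else:
            parsed[doc_id] = record
    return parsed, failed
```

Collecting failures per document instead of raising on the first error lets a batch job finish and leaves a clear list of documents to retry or review manually.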
When to Choose a Different Model
For any task requiring a conversational interface or reasoning, use Chat, Vision, Document Analysis & Reasoning - Medium. For high-precision OCR, use Document Analysis & OCR - Small (DeepSeek OCR).
Chat, Document Analysis & Agent tasks - Xtra Large
Description
Our most comprehensive model, designed for the most demanding enterprise applications. This very large-scale system combines vision, reasoning, and agentic capabilities with advanced multilingual support, enabling sophisticated automation across complex document workflows and integrated search operations.
Specifications
- Context window: 250,000 tokens
- Max output tokens: Not specified
- Vision support: Yes
- Reasoning mode: Yes
- Function calling: Yes
- Streaming: Yes
- Availability: Chat UI & API
Ideal Use Cases
- End-to-end enterprise automation pipelines
- Complex agent orchestration with visual inputs
- Vision-enabled document analysis and extraction
- Advanced reasoning workflows with tool use
- Multilingual agent deployment at scale
- Integrated search, analysis, and action systems
Strengths
- Comprehensive capability set (Vision, Reasoning, Function Calling)
- Frontier-level performance across all modalities
- Advanced multilingual support for global deployment
- Full agentic functionality for autonomous operations
- Enterprise-grade reliability
Limitations
- Highest cost per token in the portfolio
- Higher latency for complex reasoning tasks
- May be overkill for simple chat
Pricing
- Input: ... per million tokens
- Output: ... per million tokens
When to Use
Select this model for mission-critical applications requiring the full spectrum of AI capabilities (vision, reasoning, and tool use) in a single integrated solution. It is specifically engineered for enterprises building sophisticated automation systems.
When to Choose a Different Model
For cost-sensitive applications not requiring all capabilities, consider Chat & Document Analysis & Reasoning - Large for vision without agentic focus, or Reasoning & Agent tasks - Large for reasoning without vision.
Schatzi AI continuously updates its model library to provide the latest frontier-level performance. Specifications, context windows, and pricing may be adjusted to reflect infrastructure improvements. For the most current comparison, please visit /ai-models/model-comparison.
Pricing is subject to change at our discretion.