Skip to main content

Model Profiles

Schatzi AI hosts all models exclusively on Swiss infrastructure, ensuring your data never leaves Switzerland. This page provides detailed technical specifications, capabilities, and pricing for each available model to help you select the optimal solution for your business requirements.

For guidance on selecting the right model for your use case, see /ai-models/choosing-model. To understand how token pricing works, visit /subscription-billing/understanding-tokens.

Apertus Swiss LLM - Large

Description

Designed for organizations requiring maximum transparency and regulatory compliance, this 70B parameter model serves as a reliable foundation for multilingual services, government applications, and research initiatives. With fully documented training methodologies and strict adherence to AI Act requirements, it prioritizes data privacy and intellectual property protection while delivering frontier-level performance.

Specifications

  • Context window: 65,536 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: Chat UI & API

Ideal Use Cases

  • Government service automation and citizen support
  • Regulatory compliance documentation and reporting
  • Academic research and R&D documentation analysis
  • Multilingual public sector chatbots
  • Legal document review with transparency requirements
  • Cross-border administrative processes

Strengths

  • Complete training transparency and methodology documentation
  • Full AI Act compliance for regulated industries
  • Robust multilingual capabilities for European languages
  • 70B parameter scale delivering high performance
  • Guaranteed Swiss data sovereignty

Limitations

  • No vision or image analysis capabilities
  • No function calling support for tool integration

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model when regulatory compliance, training transparency, and data sovereignty are non-negotiable requirements. It is specifically engineered for government agencies, research institutions, and organizations handling sensitive information that must remain within Swiss jurisdiction while maintaining full auditability of AI decision-making processes.

When to Choose a Different Model

For applications requiring vision capabilities or image analysis, use Chat & Document Analysis & Reasoning - Large or Chat, Document Analysis & Agent tasks - Xtra Large. If your workflow requires function calling or autonomous agent capabilities, select Reasoning & Agent tasks - Large or Chat, Document Analysis & Agent tasks - Xtra Large instead.


Apertus Swiss LLM - Small

Description

A streamlined version of the Swiss LLM series, this model is specifically optimized for multilingual dialogue use cases. It provides a cost-effective way to implement conversational AI that remains compliant with Swiss data standards while maintaining high efficiency in dialogue management.

Specifications

  • Context window: 65,536 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Lightweight multilingual customer service bots
  • Basic conversational interfaces for Swiss SMEs
  • Fast-response multilingual chat applications
  • Preliminary text screening in multiple languages
  • Simple FAQ automation
  • Multilingual input classification

Strengths

  • Highly cost-efficient for high-volume dialogue
  • Maintains Swiss data sovereignty and compliance
  • Optimized for conversational flow
  • Fast inference speeds
  • Reliable multilingual performance

Limitations

  • Reduced reasoning depth compared to Large versions
  • No vision or multimodal support
  • No function calling for external tool integration

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model for high-volume, simple conversational tasks where cost efficiency is a priority but Swiss data residency and regulatory compliance remain mandatory. It is ideal for basic chat interfaces that do not require complex reasoning or external tool access.

When to Choose a Different Model

If your application requires deeper analytical capabilities or handles highly complex queries, upgrade to Apertus Swiss LLM - Large. For tasks requiring vision or document analysis, consider Document Analysis - Medium.


Document Analysis - Medium

Description

A compact and efficient vision-language model designed to bridge the gap between text and visual data. This model is optimized for analyzing documents that contain both text and imagery, providing a balanced approach to performance and resource consumption.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Automated invoice and receipt processing
  • Visual data extraction from business forms
  • Analysis of technical diagrams and manuals
  • Multimodal chat for document support
  • Image-to-text conversion for structured data
  • Visual quality assurance checks

Strengths

  • Integrated vision and language processing
  • Supports function calling for structured data output
  • Efficient processing of visual documents
  • Balanced performance for medium-complexity tasks
  • Fast streaming for real-time analysis

Limitations

  • Smaller context window than dedicated text models
  • Not optimized for long-form creative writing
  • Limited deep reasoning capabilities

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model when you need to extract structured information from images or documents and potentially trigger external actions via function calling. It is the ideal choice for automated document processing pipelines where visual understanding is required.

When to Choose a Different Model

For extremely high-volume, simple visual tasks, Document Analysis - Xtra Small may be more cost-effective. For complex reasoning combined with vision, select Chat, Document Analysis & Agent tasks - Xtra Large.


Document Analysis - Small

Description

Optimized for handling text and image input to generate precise text output, this model is designed for multilingual dialogue and document understanding. It provides a versatile solution for businesses needing to process visual information without the overhead of larger models.

Specifications

  • Context window: 32,000 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Multilingual visual chat interfaces
  • Basic OCR and document summarization
  • Image captioning for accessibility
  • Visual content tagging
  • Simple document-based Q&A
  • Multilingual image-to-text translation

Strengths

  • Strong multimodal input handling
  • Optimized for multilingual dialogue
  • Efficient token usage for visual tasks
  • Fast response times
  • Reliable text generation from visual cues

Limitations

  • No function calling support
  • Limited context window for very long documents
  • No advanced reasoning capabilities

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for straightforward multimodal tasks where you need to describe images or answer questions about documents in multiple languages, but do not need to integrate with external APIs via function calling.

When to Choose a Different Model

If you require function calling to send extracted data to another system, use Document Analysis - Medium. For high-precision OCR of complex tables, use Document Analysis & OCR - Small (DeepSeek OCR).


Document Analysis - Xtra Small

Description

The most lightweight vision-language model in the series, optimized for maximum efficiency. It is designed for compact applications that require basic visual understanding and text generation with minimal latency and cost.

Specifications

  • Context window: 16,384 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • High-speed image classification
  • Basic visual tagging for large datasets
  • Simple OCR for short snippets of text
  • Low-latency visual chat triggers
  • Mobile-optimized visual analysis
  • Basic document verification

Strengths

  • Lowest cost for vision-enabled tasks
  • Extremely fast inference and streaming
  • Low resource footprint
  • Efficient for simple, repetitive visual tasks
  • High throughput for batch processing

Limitations

  • Very limited context window
  • Lowest reasoning capability in the vision suite
  • Not suitable for complex document analysis

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model for high-volume, low-complexity visual tasks where speed and cost are the primary drivers and the input data is relatively small.

When to Choose a Different Model

For any task requiring complex analysis of a full page or multi-page document, move up to Document Analysis - Small or Document Analysis - Medium.


Fast Reasoning & Instruction Following - Small

Description

A specialized model optimized for high-speed reasoning and strict adherence to complex instructions. It is designed for developers who need a model that can follow precise formatting rules and logical constraints without the latency of larger reasoning models.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Structured data extraction (JSON/XML)
  • Strict template-based content generation
  • Fast logical validation of text
  • Instruction-heavy automation tasks
  • API response formatting
  • Rapid data analysis and categorization

Strengths

  • Exceptional instruction-following accuracy
  • Fast reasoning for simple to medium tasks
  • Full function calling support
  • Reliable structured output
  • High efficiency for developer workflows

Limitations

  • No vision capabilities
  • Not designed for long-form creative writing
  • Limited deep "thinking" for highly abstract problems

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model when your primary requirement is that the AI follows a specific set of rules or a strict format perfectly and quickly. It is ideal for the "glue" in an automation pipeline where reliability of format is critical.

When to Choose a Different Model

For tasks requiring deep, multi-step logical deduction, use Reasoning & Problem Solving - Medium. For multimodal tasks, use Document Analysis - Medium.


Reasoning & Problem Solving - Small

Description

An entry-level reasoning model optimized for "thinking" and logical problem solving. It utilizes a reasoning process to work through problems step-by-step, providing more reliable answers for logical queries than standard chat models.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Basic mathematical problem solving
  • Logical puzzle resolution
  • Simple code debugging
  • Step-by-step instructional generation
  • Basic analytical reasoning
  • Logical consistency checking

Strengths

  • Native reasoning capabilities
  • Cost-effective entry point for "thinking" models
  • Supports function calling for tool integration
  • Higher accuracy on logical tasks than standard LLMs
  • Efficient streaming of reasoning steps

Limitations

  • Limited capacity for extremely complex architectural problems
  • No vision support
  • Smaller context window than document models

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for tasks that require a basic level of logical deduction or step-by-step thinking where a standard chat model might hallucinate or skip steps, but where the highest level of reasoning is not required.

When to Choose a Different Model

For highly complex scientific or mathematical problems, upgrade to Reasoning & Problem Solving - Xtra Large. For agentic tasks, use Reasoning & Agent tasks - Large.


Llama 3.3 Multi-lingual - Medium

Description

A powerful, balanced model optimized for high-performance multilingual dialogue. It excels at maintaining conversational context across a wide array of languages, making it a versatile choice for global business communications.

Specifications

  • Context window: 131,072 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Global customer support automation
  • Multilingual content moderation
  • Cross-lingual translation and adaptation
  • Large-scale conversational AI for diverse markets
  • Multilingual knowledge base interaction
  • International business correspondence

Strengths

  • Massive 131K context window for long conversations
  • Strong performance across multiple languages
  • High reliability in dialogue management
  • Efficient processing of long text inputs
  • Stable and predictable output

Limitations

  • No vision or multimodal capabilities
  • No function calling for tool integration
  • No dedicated reasoning mode

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model when you need a reliable, multilingual conversationalist that can handle very long conversation histories or large documents without losing context.

When to Choose a Different Model

If you need the model to interact with external APIs, use Chat & Function Calling - Small (Granite 3.1). For vision tasks, use Document Analysis - Medium.


Llama 4 Maverick multi modal - Small

Description

A cutting-edge multimodal model optimized for seamless experiences across text and visual inputs. It is designed to understand the relationship between images and text, providing a fluid interface for multimodal applications.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Multimodal AI assistants
  • Visual content analysis for social media
  • Image-based product support
  • Interactive visual storytelling
  • Multimodal data entry automation
  • Visual Q&A for e-commerce

Strengths

  • Native multimodal integration
  • Supports function calling for action-oriented tasks
  • Fast and responsive streaming
  • Strong alignment between visual and textual understanding
  • Versatile for a variety of "small" multimodal tasks

Limitations

  • Limited context window compared to text-only models
  • Not optimized for deep logical reasoning
  • Higher output cost than some basic vision models

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for modern, interactive applications where the AI needs to "see" and "talk" simultaneously and potentially trigger actions in other software via function calling.

When to Choose a Different Model

For heavy-duty document analysis, use Document Analysis - Medium. For pure reasoning tasks, use Reasoning & Problem Solving - Small.


Reasoning & Agent tasks - Large

Description

A powerhouse for developers building autonomous systems. This model is optimized for powerful reasoning, agentic tasks, and versatile developer use cases, allowing it to plan, execute, and refine complex workflows independently.

Specifications

  • Context window: 65,536 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Autonomous AI agent development
  • Complex software engineering tasks
  • Multi-step business process automation
  • Advanced data analysis and synthesis
  • Tool-use orchestration
  • Complex logical planning and execution

Strengths

  • Advanced reasoning for agentic behavior
  • Robust function calling for external tool use
  • High reliability in multi-step task execution
  • Optimized for developer-centric workflows
  • Strong analytical capabilities

Limitations

  • No vision support
  • Higher cost than basic chat models
  • Not optimized for creative, long-form prose

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Select this model when building AI agents that must operate autonomously, use tools, and reason through complex problems to reach a goal. It is the premier choice for "Agentic" AI.

When to Choose a Different Model

If your agent needs to process images or documents, use Chat, Document Analysis & Agent tasks - Xtra Large. For simple chat, Llama 3.3 Multi-lingual - Medium is more efficient.


Reasoning & Problem Solving - Medium

Description

A mid-tier reasoning model that provides a significant boost in logical depth over the Small version. It is optimized for thinking and reasoning, making it suitable for professional-grade analytical tasks.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Professional financial analysis
  • Complex logical auditing
  • Mid-level software architecture planning
  • Detailed technical troubleshooting
  • Advanced mathematical reasoning
  • Strategic planning assistance

Strengths

  • Stronger logical deduction than Small reasoning models
  • Full function calling support
  • Reliable step-by-step thinking
  • Balanced speed and depth
  • High accuracy on complex logical queries

Limitations

  • No vision support
  • Limited context window (32K)
  • Higher cost than the Small reasoning model

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model for professional analytical tasks where accuracy and logical rigor are paramount, but the extreme scale of the Xtra Large model is not required.

When to Choose a Different Model

For the highest possible reasoning performance, use Reasoning & Problem Solving - Xtra Large. For agentic workflows, use Reasoning & Agent tasks - Large.


Reasoning & Problem Solving - Small (Reasoning Chat)

Description

Optimized specifically for reasoning-based chat completions. This model brings "thinking" capabilities to a smaller, faster footprint, allowing for logical interactions without the latency of larger models.

Specifications

  • Context window: 32,768 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Logic-based customer support
  • Interactive tutoring and educational tools
  • Simple code explanation and debugging
  • Logical consistency checks in chat
  • Fast analytical responses
  • Step-by-step guide generation

Strengths

  • Fast reasoning-enabled chat
  • Cost-effective logical processing
  • Supports function calling
  • Better at logic than standard small models
  • Efficient streaming

Limitations

  • Not suitable for highly complex architectural reasoning
  • No vision support
  • Limited context window

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for chat applications where users expect logically sound, step-by-step answers but the tasks are of moderate complexity.

When to Choose a Different Model

For deeper analysis, use Reasoning & Problem Solving - Medium. For multimodal reasoning, use Chat, Document Analysis & Agent tasks - Xtra Large.


Reasoning & Problem Solving - Xtra Large

Description

The pinnacle of logical deduction in the portfolio. This model is optimized for the most demanding reasoning chat completions, capable of handling abstract problems and complex logical chains with extreme precision.

Specifications

  • Context window: 65,536 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Advanced scientific research analysis
  • Complex legal reasoning and case analysis
  • High-level mathematical proofs
  • Deep architectural software design
  • Strategic corporate planning
  • Complex logic-based auditing

Strengths

  • Highest level of logical reasoning available
  • Capable of handling extremely abstract problems
  • High precision in complex deductions
  • Robust function calling for tool integration
  • Large context window for reasoning tasks

Limitations

  • Highest latency among reasoning models
  • Premium pricing
  • No vision support

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model for mission-critical analytical tasks where a failure in logic is unacceptable and the complexity of the problem requires the maximum available "thinking" capacity.

When to Choose a Different Model

If you need vision capabilities alongside reasoning, use Chat, Document Analysis & Agent tasks - Xtra Large. For faster, simpler logic, use Reasoning & Problem Solving - Small.


Chat & Function Calling - Small (Granite 3.1)

Description

Based on the IBM Granite 3.1 8B Instruct architecture, this is a long-context model optimized for instruction following, RAG, and function calling. It is highly efficient and supports 12 languages, including English, German, French, Italian, and Dutch.

Specifications

  • Context window: 131,072 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • RAG (Retrieval Augmented Generation) pipelines
  • Multilingual text extraction and summarization
  • API-driven automation
  • High-volume instruction following
  • Multilingual customer service bots
  • Structured data generation

Strengths

  • Excellent long-context handling (131K)
  • Strong function calling capabilities
  • Optimized for RAG workflows
  • Broad European language support
  • Very cost-effective

Limitations

  • No vision support
  • No dedicated reasoning mode
  • Not designed for complex creative writing

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model for RAG applications or any workflow that requires processing large amounts of text and then calling a function to act on that data, especially in a multilingual European context.

When to Choose a Different Model

For tasks requiring deep reasoning, use Reasoning & Agent tasks - Large. For vision tasks, use Document Analysis - Medium.


Reasoning & Tool Use - Large (GLM-4.5 Air)

Description

A Mixture-of-Experts (MoE) model featuring 106B total parameters. It offers hybrid reasoning with a configurable thinking mode, strong tool/function calling, and exceptional code generation capabilities, all within a generous context window.

Specifications

  • Context window: 131,072 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Advanced code generation and refactoring
  • Complex tool-use orchestration
  • Hybrid reasoning tasks (fast vs. deep)
  • Large-scale technical documentation analysis
  • Developer productivity tools
  • Complex API integration workflows

Strengths

  • Efficient MoE architecture
  • Configurable thinking mode for flexibility
  • Strong coding and technical capabilities
  • Large 128K context window
  • Robust function calling

Limitations

  • No vision support
  • Higher output cost than basic chat models
  • Complexity in configuring thinking modes

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for technical and developer-centric tasks, particularly those involving coding or complex tool orchestration where a large context window and flexible reasoning are required.

When to Choose a Different Model

For pure reasoning without the MoE complexity, use Reasoning & Problem Solving - Medium. For vision-based agent tasks, use Chat, Document Analysis & Agent tasks - Xtra Large.


Chat & Document Analysis - Xtra Xtra Large

Description

A very large-scale model optimized for multilingual dialogue and document analysis. Note that this model is slated for deprecation soon and should be transitioned to newer alternatives.

Specifications

  • Context window: Not specified
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Legacy multilingual chat systems
  • Large-scale text analysis (legacy)
  • Multilingual dialogue management
  • General purpose chat in multiple languages
  • Basic document summarization
  • High-capacity text processing

Strengths

  • Massive scale for general tasks
  • Strong multilingual fluency
  • Reliable for standard chat interactions
  • High throughput for text

Limitations

  • Deprecated soon
  • No vision support
  • No function calling or reasoning mode

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Only use this model for maintaining legacy systems that have not yet been migrated. For all new projects, please select a current model.

When to Choose a Different Model

For all new implementations, use Llama 3.3 Multi-lingual - Medium for chat or Chat & Document Analysis & Reasoning - Large for advanced analysis.


Search, Chat & Analysis - Small

Description

A multimodal model optimized for web search and conversational AI. It is particularly suited for creative professionals, artists, and content creators who need a blend of search capabilities, visual understanding, and storytelling fluency.

Specifications

  • Context window: Not specified
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Creative storytelling and narrative design
  • Web-based research and synthesis
  • Visual content inspiration and brainstorming
  • Artistic project planning
  • Content creation for marketing
  • Image-based research queries

Strengths

  • Integrated web search capabilities
  • Vision support for visual research
  • High creativity and fluency in prose
  • Fast streaming for interactive sessions
  • Versatile for non-technical creative work

Limitations

  • No function calling support
  • No dedicated reasoning mode
  • Context window not specified

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for creative workflows, storytelling, or research tasks that require a combination of web access and visual understanding.

When to Choose a Different Model

For structured business automation, use Chat & Function Calling - Small (Granite 3.1). For deep logical analysis, use Reasoning & Problem Solving - Small.


Chat & Document Analysis & Reasoning - Large

Description

A large-scale model delivering frontier-level performance across a broad range of complex tasks. It combines advanced multilingual capabilities with a reasoning mode that can be enabled to dynamically tailor responses based on the complexity of the query.

Specifications

  • Context window: Not specified
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Complex enterprise document analysis
  • High-stakes multilingual business communication
  • Advanced reasoning for business strategy
  • Visual document auditing
  • Complex content validation
  • Multimodal executive reporting

Strengths

  • Frontier-level performance on complex tasks
  • Dynamic reasoning mode
  • Integrated vision and function calling
  • Exceptional multilingual capabilities
  • Versatile across modalities

Limitations

  • Higher cost than medium-tier models
  • Context window not specified
  • Higher latency when reasoning mode is active

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model for high-complexity business tasks that require a blend of vision, reasoning, and multilingual fluency, where the highest possible quality of output is required.

When to Choose a Different Model

For autonomous agent workflows, use Chat, Document Analysis & Agent tasks - Xtra Large. For simple, fast chat, use Llama 3.3 Multi-lingual - Medium.


Document Analysis & OCR - Small (DeepSeek OCR)

Description

A specialized 3B parameter vision-language model engineered specifically for optical character recognition (OCR) and document understanding. It excels at converting complex documents into structured text or markdown, including table extraction and mathematical notation.

Specifications

  • Context window: 8,192 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • Converting PDFs/Images to Markdown
  • Complex table extraction from documents
  • Mathematical formula recognition
  • Digitizing handwritten notes
  • Structured data extraction from forms
  • High-precision OCR for archives

Strengths

  • Specialized in OCR and document structure
  • Exceptional table and math recognition
  • High precision in text extraction
  • Efficient 3B parameter size
  • Fast streaming of extracted text

Limitations

  • Very small context window (8K)
  • Not designed for general chat or reasoning
  • No function calling support

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model exclusively for OCR and document digitization tasks where the goal is to turn a visual document into a structured text format with high precision.

When to Choose a Different Model

For general document Q&A or chat, use Document Analysis - Medium. For agentic workflows, use Chat, Document Analysis & Agent tasks - Xtra Large.


Chat, Multi-lingual, Coding & function calling - Small

Description

A versatile, high-efficiency model that balances chat fluency, multilingual support, and technical capabilities. It is particularly strong in coding tasks and function calling, making it a reliable choice for developer-centric chat applications.

Specifications

  • Context window: 128,000 tokens
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: Yes
  • Streaming: Yes
  • Availability: Chat UI & API

Ideal Use Cases

  • Coding assistance and snippet generation
  • Multilingual developer chatbots
  • API-integrated chat applications
  • Technical support automation
  • Structured text generation
  • Fast multilingual correspondence

Strengths

  • Strong coding capabilities
  • Full function calling support
  • Large 128K context window
  • Balanced performance across multiple domains
  • Available in both Chat UI and API

Limitations

  • No vision support
  • No dedicated reasoning mode
  • Not optimized for extremely deep logical deduction

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Choose this model for general-purpose technical chat, coding help, or any application that requires a mix of multilingual fluency and the ability to call external functions.

When to Choose a Different Model

For deep reasoning tasks, use Reasoning & Problem Solving - Small. For vision tasks, use Document Analysis - Medium.


Chat, Document Analysis, Coding & Reasoning - Xtra Large

Description

A multimodal powerhouse optimized for the intersection of chat, document analysis, coding, and reasoning. This model is designed for high-complexity technical workflows that require both visual understanding and deep logical processing.

Specifications

  • Context window: 1,000,000 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: Chat UI & API

Ideal Use Cases

  • Analysis of massive technical codebases
  • Complex multimodal data analysis
  • Long-form technical reasoning
  • Large-scale document auditing with vision
  • Advanced software architecture analysis
  • Comprehensive data synthesis from mixed sources

Strengths

  • Unprecedented 1M token context window
  • Full multimodal capabilities (Vision + Text)
  • Integrated reasoning and function calling
  • Strong coding and data analysis performance
  • Available in both Chat UI and API

Limitations

  • Higher cost per token
  • Higher latency for very large context windows
  • May be overkill for simple chat tasks

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model when you need to process an enormous amount of information (up to 1M tokens) that includes a mix of code, text, and images, and requires deep reasoning to synthesize the results.

When to Choose a Different Model

For faster, shorter interactions, use Chat, Vision, Document Analysis & Reasoning - Medium. For pure OCR, use Document Analysis & OCR - Small (DeepSeek OCR).


Chat, Vision, Document Analysis & Reasoning - Medium

Description

A best-in-class multimodal model that provides a high-performance balance of vision, coding, and reasoning. It is designed to be the "go-to" versatile model for most professional multimodal applications.

Specifications

  • Context window: 256,000 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: Chat UI & API

Ideal Use Cases

  • Professional multimodal assistants
  • Technical document analysis with reasoning
  • Mid-to-large scale coding tasks
  • Visual data analysis and reporting
  • Complex multilingual chat with vision
  • General purpose high-end business AI

Strengths

  • Excellent balance of speed and capability
  • Strong multimodal and reasoning integration
  • Large 256K context window
  • Very competitive pricing for its capability tier
  • Available in both Chat UI and API

Limitations

  • Not as deep as Xtra Large for massive datasets
  • Higher cost than basic chat models
  • Reasoning can increase latency

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Use this model as your primary multimodal engine for tasks that require a mix of vision, reasoning, and coding, where a 256K context window is sufficient.

When to Choose a Different Model

For the absolute maximum context (1M tokens), use Chat, Document Analysis, Coding & Reasoning - Xtra Large. For simple text chat, use Llama 3.3 Multi-lingual - Medium.


inference-miner-u25

Description

A specialized vision-language model strictly optimized for the technical tasks of document analysis and parsing. It is designed to extract structure and meaning from visual documents with high efficiency.

Specifications

  • Context window: Not specified
  • Max output tokens: Not specified
  • Vision support: No
  • Reasoning mode: No
  • Function calling: No
  • Streaming: Yes
  • Availability: API only

Ideal Use Cases

  • High-volume document parsing
  • Automated data extraction from forms
  • Visual structure analysis
  • Batch document processing
  • Industrial document digitization
  • Parsing of standardized business reports

Strengths

  • Highly optimized for parsing workflows
  • Efficient processing of visual layouts
  • Fast streaming for pipeline integration
  • Reliable for structured extraction
  • Cost-effective for parsing-specific tasks

Limitations

  • No general chat capabilities
  • No reasoning or function calling
  • No vision support (despite being a vision-language model, it is optimized for parsing)

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Deploy this model within a backend pipeline specifically for the purpose of parsing documents and extracting data into a structured format.

When to Choose a Different Model

For any task requiring a conversational interface or reasoning, use Chat, Vision, Document Analysis & Reasoning - Medium. For high-precision OCR, use Document Analysis & OCR - Small (DeepSeek OCR).


Chat, Document Analysis & Agent tasks - Xtra Large

Description

Our most comprehensive model, designed for the most demanding enterprise applications. This very large-scale system combines vision, reasoning, and agentic capabilities with advanced multilingual support, enabling sophisticated automation across complex document workflows and integrated search operations.

Specifications

  • Context window: 250,000 tokens
  • Max output tokens: Not specified
  • Vision support: Yes
  • Reasoning mode: Yes
  • Function calling: Yes
  • Streaming: Yes
  • Availability: Chat UI & API

Ideal Use Cases

  • End-to-end enterprise automation pipelines
  • Complex agent orchestration with visual inputs
  • Vision-enabled document analysis and extraction
  • Advanced reasoning workflows with tool use
  • Multilingual agent deployment at scale
  • Integrated search, analysis, and action systems

Strengths

  • Comprehensive capability set (Vision, Reasoning, Function Calling)
  • Frontier-level performance across all modalities
  • Advanced multilingual support for global deployment
  • Full agentic functionality for autonomous operations
  • Enterprise-grade reliability

Limitations

  • Highest cost per token in the portfolio
  • Higher latency for complex reasoning tasks
  • May be overkill for simple chat

Pricing

  • Input: ... per million tokens
  • Output: ... per million tokens

When to Use

Select this model for mission-critical applications requiring the full spectrum of AI capabilities—vision, reasoning, and tool use—in a single integrated solution. It is specifically engineered for enterprises building sophisticated automation systems.

When to Choose a Different Model

For cost-sensitive applications not requiring all capabilities, consider Chat & Document Analysis & Reasoning - Large for vision without agentic focus, or Reasoning & Agent tasks - Large for reasoning without vision.

Model Updates

Schatzi AI continuously updates our model library to provide the latest frontier-level performance. Specifications, context windows, and pricing may be adjusted to reflect infrastructure improvements. For the most current comparison, please visit /ai-models/model-comparison.

Pricing Notice

Pricing is subject to change at our discretion.