GPT-OSS 120B (Infercom)
A high-velocity reasoning engine that bridges the gap between frontier intelligence and open-weight accessibility, optimized for the next generation of autonomous agentic workflows.

About the Model
GPT-OSS 120B is built on a massive Mixture-of-Experts (MoE) architecture containing 117 billion total parameters. To ensure lightning-fast performance, it uses a sparse activation strategy where only 5.1 billion parameters are active for any given token. The "Infercom" variant is specifically tuned for inference engines like vLLM and NVIDIA NIM, utilizing MXFP4 quantization to maintain high intelligence while fitting on a single 80GB GPU (like the H100 or A100).
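The following is a minimal sketch of querying a self-hosted deployment. It assumes the weights are served by vLLM under the Hugging Face model id openai/gpt-oss-120b and exposed as an OpenAI-compatible endpoint on localhost:8000; the base URL, API key placeholder, and prompt are assumptions to adapt to your own stack.

```python
# Minimal sketch: querying a locally served GPT-OSS 120B instance through an
# OpenAI-compatible endpoint (e.g. a running vLLM server). Endpoint details
# below are assumptions, not part of the model itself.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM server (assumption)
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of sparse MoE activation."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```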
Key Capabilities
Adjustable Reasoning Effort:
Native support for the reasoning_effort parameter, allowing users to toggle between Low (fast/cheap), Medium (balanced), and High (deep analytical thinking); a request-level sketch follows this list.
Full Chain-of-Thought (CoT):
Unlike closed-source models, GPT-OSS provides full transparency into its internal reasoning steps, which is critical for debugging complex agentic workflows.
Structured Outputs:
Optimized for JSON mode and function calling, achieving near-perfect reliability for API-driven agents; a tool-calling sketch also follows this list.
High-Speed Throughput:
Capable of exceeding 500 tokens/sec on optimized inference stacks, making it one of the fastest models in its weight class.
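As referenced above, reasoning depth is a per-request knob and the chain of thought can be inspected alongside the final answer. The sketch below reuses the local endpoint from the previous example; passing reasoning_effort via extra_body and reading the trace from a reasoning_content field are conventions of some serving stacks (notably vLLM's reasoning parser), so treat both as assumptions to verify against your deployment.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "How many primes are smaller than 100?"}],
    # Toggle depth per request: "low" (fast/cheap), "medium" (balanced), "high" (deep).
    extra_body={"reasoning_effort": "high"},
)

message = response.choices[0].message
# Some serving stacks expose the raw chain of thought on a separate field.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("--- chain of thought ---")
    print(reasoning)
print("--- final answer ---")
print(message.content)
```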
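For the structured-output and function-calling capability, here is a hedged sketch of a single tool call against the same endpoint; the get_weather tool is hypothetical and stands in for whatever functions your agent actually exposes.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool schema, used only for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Do I need an umbrella in Oslo today?"}],
    tools=tools,
    tool_choice="auto",
)

# The model answers with a structured tool call; the agent loop is responsible
# for executing it and returning the result as a "tool" message on the next turn.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```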
Applications & Use Cases
Agentic Workflows:
Ideally suited as the "brain" for autonomous agents that require real-time web browsing, Python code execution, and multi-step tool use.
STEM & Technical Research:
Exceptional performance in mathematics (AIME 2025: 97.9% with tools) and graduate-level science reasoning (GPQA Diamond: 80.9%).
Privacy-Sensitive Production:
A favorite for legal, financial, and healthcare sectors that require frontier-level reasoning on-premises to ensure data sovereignty.
Developer Tooling:
Perfect for repository-scale code analysis and high-volume synthetic data generation.
Model Specifications
| Category | Specification | Value |
|---|---|---|
| General | Model Provider | OpenAI |
| General | Main Use Cases | Agentic workflows, STEM & technical research, privacy-sensitive production, developer tooling |
| Intelligence | Reasoning Effort | Adaptive (Low, Medium, High) |
| Intelligence | GPQA Diamond | 80.9% |
| Memory | Max Context | 131K tokens |
| Speed | Latency (TTFT) | 0.37 s |
| Speed | Throughput | 313–544 tokens/sec |



