GPT-OSS 120B (Infercom)

A high-velocity reasoning engine that bridges the gap between frontier intelligence and open-weight accessibility, optimized for the next generation of autonomous agentic workflows.

About the Model

GPT-OSS 120B is built on a massive Mixture-of-Experts (MoE) architecture containing 117 billion total parameters. To ensure lightning-fast performance, it uses a sparse activation strategy where only 5.1 billion parameters are active for any given token. The "Infercom" variant is specifically tuned for inference engines like vLLM and NVIDIA NIM, utilizing MXFP4 quantization to maintain high intelligence while fitting on a single 80GB GPU (like the H100 or A100).

Model Key Capabilities

  • Adjustable Reasoning Effort:

    Native support for the reasoning_effort parameter, allowing users to toggle between Low (fast/cheap), Medium (balanced), and High (deep analytical thinking).


  • Full Chain-of-Thought (CoT):

    Unlike closed-source models, GPT-OSS provides full transparency into its internal reasoning steps, which is critical for debugging complex agentic workflows.


  • Structured Outputs:

    Optimized for JSON mode and function calling, achieving near-perfect reliability for API-driven agents.


  • High-Speed Throughput:

    Capable of exceeding 500 tokens/sec on optimized inference stacks, making it one of the fastest models in its weight class.

Applications & Use Cases

  • Agentic Workflows:

    Ideally suited as the "brain" for autonomous agents that require real-time web browsing, Python code execution, and multi-step tool use.


  • STEM & Technical Research:

    Exceptional performance in mathematics (AIME 2025: 97.9% with tools) and graduate-level science reasoning (GPQA Diamond: 80.9%).


  • Privacy-Sensitive Production:

    A favorite for legal, financial, and healthcare sectors that require frontier-level reasoning on-premises to ensure data sovereignty.


  • Developer Tooling:

    Perfect for repository-scale code analysis and high-volume synthetic data generation.

Recomended Models based on your needs

Qwen (DeepMask)

Versatile model with reasoning and tool use. Strong at document and image analysis & multilingual chat.

Qwen (DeepMask)

Versatile model with reasoning and tool use. Strong at document and image analysis & multilingual chat.

Qwen3 (StackIT)

Versatile model with reasoning and tool use. Strong at document and image analysis and multilingual chat.

Qwen3 (StackIT)

Versatile model with reasoning and tool use. Strong at document and image analysis and multilingual chat.

Kimi K2 (DeepMask)

Best for deep reasoning and tool use. Ideal for long, multi-step tasks and document analysis.

Kimi K2 (DeepMask)

Best for deep reasoning and tool use. Ideal for long, multi-step tasks and document analysis.

Model Specifications

General


Model Provider

OpenAI

Main Use Cases

High-Speed Agents API Orchestration Coding

Intelligence


Reasoning Effort

Adaptive (Low, Medium, High)

GPQA Diamond

80.9%
Memory


Max Context

131K Tokens
Speed


Latency (TTFT)

0.37s

Throughput

313 - 544 Tokens/sec

Find the Smarter Way to Work With AI

One workspace for all leading AI models. Think faster. Create smarter.

Haiku 4.5

New Chat

Chats

Projects

Recents

Show

Jonas has joined!

How can I help you today?

AI can make mistakes. Please double-check responses.

Models

Qwen (DeepMask)

Kimi K2 (DeepMask)

GPT-OSS 120B (Stack IT)

Haiku 4.5

Gemma 3 27B (Stack IT)

Gemini 2.2 Flash

Gemini 2.5 Flash

GPT-4o

GPT-4.1

Mistral large 2.1

DeepSeek V3

GPT-5.3

Opus 4.5

Sonnet 4.5

GPT-o3 Mini

Grok 3 Mini

Grok 4 Fast

Haiku 4.5

New Chat

Chats

Projects

AI Automation Product

Summer Campaign Research

PR Project Agents

Blog Post Daily Content

Ads Banners on Main Lander

Recents

Show

Jonas Müller

Paid plan

Models

Qwen (DeepMask)

Kimi K2 (DeepMask)

Qwen3 (Stack IT)

GPT 5.2

GPT-OSS 120B (Stack IT)

Haiku 4.5

Gemma 3 27B (Stack IT)

Gemini 2.0 Flash

Gemini 2.5 Flash

GPT-4o

GPT-4.1

Mistral large 2.1

DeepSeek V3

GPT-5.3

Opus 4.5

Sonnet 4.5

GPT-o3 Mini

Grok 3 Mini

Grok 4 Fast

Jonas has joined!

How can I help you today?

AI can make mistakes. Please double-check responses.

Find the Smarter Way to Work With AI

One workspace for all leading AI models. Think faster. Create smarter.

Haiku 4.5

New Chat

Chats

Projects

Recents

Show

Jonas has joined!

How can I help you today?

AI can make mistakes. Please double-check responses.

Models

Qwen (DeepMask)

Kimi K2 (DeepMask)

GPT-OSS 120B (Stack IT)

Haiku 4.5

Gemma 3 27B (Stack IT)

Gemini 2.2 Flash

Gemini 2.5 Flash

GPT-4o

GPT-4.1

Mistral large 2.1

DeepSeek V3

GPT-5.3

Opus 4.5

Sonnet 4.5

GPT-o3 Mini

Grok 3 Mini

Grok 4 Fast

Haiku 4.5

New Chat

Chats

Projects

AI Automation Product

Summer Campaign Research

PR Project Agents

Blog Post Daily Content

Ads Banners on Main Lander

Recents

Show

Jonas Müller

Paid plan

Models

Qwen (DeepMask)

Kimi K2 (DeepMask)

Qwen3 (Stack IT)

GPT 5.2

GPT-OSS 120B (Stack IT)

Haiku 4.5

Gemma 3 27B (Stack IT)

Gemini 2.0 Flash

Gemini 2.5 Flash

GPT-4o

GPT-4.1

Mistral large 2.1

DeepSeek V3

GPT-5.3

Opus 4.5

Sonnet 4.5

GPT-o3 Mini

Grok 3 Mini

Grok 4 Fast

Jonas has joined!

How can I help you today?

AI can make mistakes. Please double-check responses.