Meta: Llama 4 Scout

Provided by OpenRouter

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...

Specifications

Context Length

10,000,000 tokens

Input Price

$0.080/M

Output Price

$0.300/M

Vision Support

Yes

Capabilities

TextVision

About Meta: Llama 4 Scout

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...

Strengths

•Multimodal understanding - can process text and images
•Large context window (10000k tokens) for long conversations

Use Cases

•Image and document understanding
•Content creation and writing assistance
•General conversations and Q&A

Limitations

Performance may vary based on query complexity, context length, and task type. Consider using higher-tier models for production-critical applications.

Sample Prompts

Try these prompts to explore Meta: Llama 4 Scout's capabilities:

Analyze this image and describe what you see in detail

Extract the key information from this screenshot

Compare the two images and explain the differences

Tip: Customize these prompts to fit your specific needs and use cases.

Credits required

Meta: Llama 4 Scout uses tiered credit pricing. Subscribe for a monthly credit allowance, connect your own provider API key (BYOK), or browse lower-cost models on the catalog.

Credit cost per message is shown in the model picker. Economy models typically cost 1 credit; frontier models cost more.

Related Models

Similar models you might be interested in

Qwen: Qwen3.7 Plus

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

MiniMax: MiniMax M3

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...

StepFun: Step 3.7 Flash

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...

xAI: Grok Build 0.1

Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...