{"object":"list","data":[{"id":"openai/o3","name":"o3","description":"o3 is a versatile, high-performing model across many domains. It raises the bar in math, science, coding, and visual reasoning, and it’s excellent at technical writing and following instructions. Use it to tackle multi-step problems that combine text, code, and images.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000002,"output_cost_per_token":0.000008},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4o","name":"GPT 4o","description":"GPT-4o is OpenAI’s multimodal model, handling both text and image inputs with text outputs. It matches the intelligence of GPT-4 Turbo but runs twice as fast at half the cost. The model also brings stronger non-English language support and improved visual understanding.","context_length":128000,"max_tokens":0,"pricing":{"input_cost_per_token":0.0000025,"output_cost_per_token":0.00001,"base_image_tokens":85},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4","name":"GPT 4","description":"The default GPT-4 model with an 8,192-token context window.","context_length":8192,"max_tokens":0,"pricing":{"input_cost_per_token":0.00003,"output_cost_per_token":0.00006},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"google/gemini-2.5-flash-lite-preview-06-17","name":"Gemini 2.5 Flash Lite","description":"Gemini 2.5 Flash-Lite is a model developed by Google DeepMind, designed to handle various tasks including reasoning, science, mathematics, code generation, and more. It features advanced capabilities in multilingual performance and long context understanding. 
It is optimized for low latency use cases, supporting multimodal input with a 1 million-token context length.\n\n","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":1e-7,"output_cost_per_token":4e-7},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/o3-mini","name":"o3 mini","description":"OpenAI o3-mini is a lightweight, cost-efficient model built for STEM reasoning tasks like math, science, and coding. It lets you adjust its reasoning effort to balance speed and depth. The model delivers strong accuracy, matching the larger o1 on tough benchmarks while running faster and cheaper.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.0000011,"output_cost_per_token":0.0000044},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"anthropic/claude-haiku-4-5-20251001","name":"Claude Haiku 4.5","description":"Claude Haiku 4.5 delivers near-frontier performance with exceptional speed and cost-efficiency. It excels at real-time tasks like chat assistants, coding, and multi-agent workflows. Use it alone or alongside Sonnet 4.5 for fast, scalable execution.","context_length":200000,"max_tokens":64000,"pricing":{"input_cost_per_token":0.000001,"output_cost_per_token":0.000005,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-sonnet-4-5-20250929","name":"Claude Sonnet 4.5","description":"Claude Sonnet 4.5 is a frontier AI model that excels at coding, creating complex agents, and using computers for real-world tasks. 
It also features significant improvements in reasoning, math, and alignment, making it the most powerful model in its series.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000003,"output_cost_per_token":0.000015,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-opus-4-5-20251101","name":"Claude Opus 4.5","description":"Anthropic's Premium model combining maximum intelligence with practical performance","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000005,"output_cost_per_token":0.000025},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-sonnet-4-20250514","name":"Claude Sonnet 4","description":"Claude Sonnet 4 improves on Sonnet 3.7 with stronger coding and reasoning skills, better precision, and greater reliability. It scores 72.7% on SWE-bench while staying efficient, making it ideal for both everyday coding and complex software projects.\n","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000003,"output_cost_per_token":0.000015,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-5","name":"GPT 5","description":"GPT-5 is an OpenAI model built for complex, step-by-step tasks that demand precise instruction following and high-stakes accuracy. 
It supports test-time routing plus prompts like “think hard about this.” It also reduces hallucinations and sycophancy while improving performance in coding, writing, and health-related tasks.\n","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":0.00000125,"output_cost_per_token":0.00001},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"lightning-ai/gpt-oss-120b","name":"gpt-oss-120b","description":"gpt-oss-120B is a 117 billion parameter language model, using a mixture-of-experts approach but activating only 5.1 billion per token for efficiency. It supports long contexts of up to 128k tokens, enabling it to handle extended conversations or documents smoothly. The model performs nearly at the level of o4-mini on reasoning tasks and surpasses many other open models in quality.","context_length":128000,"max_tokens":131072,"pricing":{"input_cost_per_token":2.5e-8,"output_cost_per_token":1e-7},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"openai/gpt-5-mini","name":"GPT 5 mini","description":"GPT-5 Mini is a smaller, more efficient variant of GPT-5 built for lighter reasoning tasks. It maintains the strong instruction-following and safety features of GPT-5 while offering faster responses and lower costs.","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":2.5e-7,"output_cost_per_token":0.000002},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"openai/gpt-5-nano","name":"GPT 5 nano","description":"GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, built as a unified system that can decide when to respond quickly or dig deeper depending on what you ask it. It gets much better across a wide range of skills — writing, coding, math, health, and visual tasks — and is more reliable and accurate in real-world scenarios. 
Compared to earlier models, it hallucinates less, follows instructions more faithfully, and understands the context better.","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":5e-8,"output_cost_per_token":4e-7},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"anthropic/claude-opus-4-1-20250805","name":"Claude Opus 4.1","description":"Claude Opus 4.1 is an enhanced version of Anthropic’s flagship model, with stronger coding, reasoning, and agentic capabilities. It scores 74.5% on SWE-bench Verified and improves at multi-file refactoring, debugging, and detailed reasoning. With support for extended thinking up to 64K tokens, it’s well-suited for research, data analysis, and tool-assisted problem solving.","context_length":200000,"max_tokens":32000,"pricing":{"input_cost_per_token":0.000015,"output_cost_per_token":0.000075,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-2.5-pro","name":"Gemini 2.5 Pro","description":"Gemini 2.5 Pro is Google’s AI model built for advanced reasoning, coding, math, and scientific work. With integrated “thinking” capabilities, it delivers more accurate answers and handles context with greater nuance. It ranks at the top of multiple benchmarks, including first place on the LMArena leaderboard, showcasing strong human-preference alignment and exceptional problem-solving skills.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":0.00000125,"output_cost_per_token":0.00001},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-2.5-flash","name":"Gemini 2.5 Flash","description":"Gemini 2.5 Flash is Google’s powerful model built for complex reasoning, coding, math, and scientific challenges. 
With integrated “thinking” capabilities, it delivers more accurate answers and handles context with greater nuance.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":3e-7,"output_cost_per_token":0.0000025},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-5.2-2025-12-11","name":"GPT 5.2","description":"GPT-5.2 is OpenAI's flagship model for coding and agentic tasks across industries. ","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":0.00000175,"output_cost_per_token":0.000014},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4.1","name":"GPT 4.1","description":"OpenAI’s fast and capable model for reasoning, coding, and chat. It responds quickly, supports long context (128k tokens), and runs efficiently at scale—ideal for advanced API applications.","context_length":1047576,"max_tokens":0,"pricing":{"input_cost_per_token":0.000002,"output_cost_per_token":0.000008},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4-turbo-preview","name":"Lightning SDK expert","description":"","context_length":128000,"max_tokens":0,"pricing":{"input_cost_per_token":0.00001,"output_cost_per_token":0.00003},"provider":{},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"openai/gpt-4-turbo-preview","name":"LitLogger helper","description":"","context_length":128000,"max_tokens":0,"pricing":{"input_cost_per_token":0.00001,"output_cost_per_token":0.00003},"provider":{},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"lightning-ai/minimax-m2.5","name":"minimax-m2.5","description":"Minimax-M2.5 is a multimodal foundation model from MiniMax designed to handle text, images, and video understanding with strong reasoning and long-context 
capabilities. It is optimized for large-scale inference efficiency and is commonly used for chat, multimodal analysis, and agent-style workflows in production AI systems.","context_length":196000,"max_tokens":0,"pricing":{"input_cost_per_token":2.5e-7,"output_cost_per_token":0.0000012},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-opus-4-6","name":"Claude Opus 4.6","description":"Claude Opus 4.6 is Anthropic’s most intelligent model for building agents and coding","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000005,"output_cost_per_token":0.000025,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"lightning-ai/glm-5","name":"glm-5","description":"GLM-5 is a large language model developed by Zhipu AI in the GLM (General Language Model) family, designed for advanced reasoning, coding, and multilingual conversation. 
It supports long-context understanding and tool-use capabilities, making it suitable for agent workflows, enterprise assistants, and complex problem-solving tasks.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":9e-7,"output_cost_per_token":0.0000032},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-3-flash-preview","name":"Gemini 3 Flash","description":"Gemini 3 Flash Preview is designed to deliver strong agentic capabilities (near-Pro level) at substantial speed and value.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":5e-7,"output_cost_per_token":0.000003},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-sonnet-4-6","name":"Claude Sonnet 4.6","description":"Claude Sonnet 4.6 is Anthropic's most capable Sonnet model yet, featuring major upgrades in coding, computer use, long-context reasoning, and agentic tasks — approaching Opus-level performance at Sonnet pricing.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000003,"output_cost_per_token":0.000015,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4-turbo","name":"GPT 4 turbo","description":"GPT-4 Turbo is an upgraded version of GPT-4, designed to deliver the same high intelligence while being faster and more cost-effective.","context_length":128000,"max_tokens":0,"pricing":{"input_cost_per_token":0.00001,"output_cost_per_token":0.00003},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"anthropic/claude-opus-4-20250514","name":"Claude Opus 4","description":"Claude Opus 4 is Anthropic’s coding model, built for complex, long-running tasks and agent workflows. 
It leads in software engineering benchmarks with 72.5% on SWE-bench and 43.2% on Terminal-bench. The model is designed for extended use, sustaining thousands of task steps over hours without performance drop.","context_length":200000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000015,"output_cost_per_token":0.000075,"base_image_tokens":85},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-3.5-turbo","name":"GPT 3.5 turbo","description":"GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.","context_length":16385,"max_tokens":0,"pricing":{"input_cost_per_token":5e-7,"output_cost_per_token":0.0000015},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"lightning-ai/nemotron-3-super-120b-a12b","name":"nemotron-3-super-120b-a12b","description":"NVIDIA Nemotron 3 Super 120B is a high-accuracy, large-scale language model within the Nemotron 3 family designed for advanced multi-agent applications.","context_length":256000,"max_tokens":0,"pricing":{"input_cost_per_token":3.5e-7,"output_cost_per_token":7.5e-7},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-3.1-pro-preview","name":"Gemini 3.1 Pro","description":"Gemini 3.1 Pro is Google's most advanced reasoning model, delivering a major leap in core intelligence and multimodal understanding for complex agentic, coding, and long-context tasks.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":0.000002,"output_cost_per_token":0.000012},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-4.1","name":"Capex and GPU Financing News","description":"OpenAI’s fast and capable model for 
reasoning, coding, and chat. It responds quickly, supports long context (128k tokens), and runs efficiently at scale—ideal for advanced API applications.","context_length":1047576,"max_tokens":0,"pricing":{"input_cost_per_token":0.000002,"output_cost_per_token":0.000008},"provider":{},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"lightning-ai/gemma-4-31B-it","name":"gemma-4-31B-it","description":"Gemma 4 is an open-weight, multimodal model family built by Google DeepMind, supporting text, image, and audio inputs with strong reasoning and coding capabilities.\nIt spans efficient small models to large dense and mixture-of-experts architectures, offering long context windows and scalable deployment from edge devices to high-performance systems.","context_length":131072,"max_tokens":0,"pricing":{"input_cost_per_token":1.4e-7,"output_cost_per_token":4e-7},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"openai/gpt-5.4-2026-03-05","name":"GPT 5.4","description":"OpenAI frontier model for complex professional work","context_length":1050000,"max_tokens":0,"pricing":{"input_cost_per_token":0.0000025,"output_cost_per_token":0.000015},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"lightning-ai/kimi-k2.5","name":"kimi-k2.5","description":"Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. 
It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.","context_length":256000,"max_tokens":0,"pricing":{"input_cost_per_token":0.0000011,"output_cost_per_token":0.0000025},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"anthropic/claude-opus-4-7","name":"Claude Opus 4.7","description":"Anthropic's most capable generally available model to date. It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks. This page summarizes everything new at launch.","context_length":1000000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000005,"output_cost_per_token":0.000025},"provider":{"name":"Anthropic"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-3.1-flash-lite-preview","name":"gemini-3.1-flash-lite-preview","description":"Gemini 3.1 Flash-Lite Preview is Google's most cost-efficient model, optimized for high-volume agentic tasks, translation, and simple data processing","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":2.5e-7,"output_cost_per_token":0.0000015},"provider":{"name":"Google"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"lightning-ai/deepseek-v4-pro","name":"deepseek-v4-pro","description":"DeepSeek-V4 is a next-generation Mixture-of-Experts large language model engineered for exceptional reasoning precision and scalable efficiency, combining 1-million-token context handling, advanced tool-use orchestration, and a deeply integrated chain-of-thought framework to deliver enterprise-grade AI 
performance.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":8.9e-7,"output_cost_per_token":0.000003},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"lightning-ai/nemotron-3-nano-omni-30b-a3b-reasoning","name":"nemotron-3-nano-omni-30b-a3b-reasoning","description":"NVIDIA Nemotron 3 Nano Omni is an open multimodal model with\nhighest efficiency that powers sub-agents to complete tasks faster across vision, audio, and\nlanguage.","context_length":256000,"max_tokens":0,"pricing":{"input_cost_per_token":2.5e-7,"output_cost_per_token":5e-7},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"lightning-ai/gpt-oss-20b","name":"gpt-oss-20b","description":"gpt-oss-20B is a 21-billion parameter language model built with a mixture-of-experts design that activates only about 3.6 billion parameters per token. This efficiency allows it to run on devices with as little as 16 GB of memory, making it well-suited for local setups or edge devices.","context_length":128000,"max_tokens":0,"pricing":{"input_cost_per_token":1.25e-8,"output_cost_per_token":5e-8},"provider":{"name":"lightning-ai"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}},{"id":"openai/gpt-5.4-nano-2026-03-17","name":"GPT 5.4 nano","description":"GPT-5.4 nano is designed for tasks where speed and cost matter most like classification, data extraction, ranking, and sub-agents","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":2.5e-7,"output_cost_per_token":0.00000125},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-5.5-2026-04-23","name":"GPT 5.5","description":"OpenAI newest frontier model for the most complex professional 
work","context_length":1050000,"max_tokens":0,"pricing":{"input_cost_per_token":0.000005,"output_cost_per_token":0.00003},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"openai/gpt-5.4-mini-2026-03-17","name":"GPT 5.4 mini","description":"GPT-5.4 mini brings the strengths of GPT-5.4 to a faster, more efficient model designed for high-volume workloads","context_length":400000,"max_tokens":0,"pricing":{"input_cost_per_token":7.5e-7,"output_cost_per_token":0.0000045},"provider":{"name":"OpenAI"},"architecture":{"input_modalities":["text","image"],"output_modalities":["text"]}},{"id":"google/gemini-3.5-flash","name":"Gemini 3.5 Flash","description":"Google's most intelligent Flash model, Gemini 3.5 Flash delivers sustained frontier performance in agentic execution, coding, and long-horizon tasks at scale.","context_length":1048576,"max_tokens":0,"pricing":{"input_cost_per_token":0.0000015,"output_cost_per_token":0.000009},"provider":{"name":"Google"},"architecture":{"input_modalities":["text"],"output_modalities":["text"]}}]}