Agents
Browse and compare AI agent platforms, tools, and frameworks.
AutoGPT
Open-source autonomous AI agent framework that chains LLM calls to accomplish complex tasks. Pioneered the pattern of looping, self-directed LLM agents that break a goal into subtasks and execute them without step-by-step human prompting.
Devin
Autonomous AI software engineer that can plan, write, debug, and deploy code independently. Operates in its own sandboxed environment with full access to developer tools.
Claude Opus 4.6
Claude Opus 4.6 is Anthropic's flagship frontier language model, featuring a 1 million token context window now generally available at flat pricing. It is designed for complex reasoning, long-document analysis, and advanced agentic tasks, scoring 78.3% on the MRCR v2 long-context retrieval benchmark. It is accessible via Anthropic's API at $5 per million input tokens and $25 per million output tokens.
Claude Code
Claude Code is Anthropic's agentic AI coding tool that operates directly in the terminal and integrates with IDEs, enabling developers to delegate complex coding tasks to an AI agent. It can autonomously write, edit, and refactor code across multiple files, run tests, execute shell commands, manage git operations, and navigate entire codebases. With support for up to 1M token context on Max, Team, and Enterprise plans (on Opus 4.6), it is purpose-built for large-scale software engineering workflows.
Claude Sonnet 4.6
Claude Sonnet 4.6 is Anthropic's mid-tier large language model in the Claude 4 generation, offering a 1 million token context window at generally available flat pricing of $3 per million input tokens and $15 per million output tokens. It is designed to balance high capability with cost-efficiency, supporting a broad range of tasks including coding, analysis, writing, summarization, and long-context document processing. The model sits between Haiku (lightweight) and Opus (highest capability) in Anthropic's model family lineup.
GPT-5.4
GPT-5.4 is OpenAI's frontier large language model featuring advanced reasoning and extended long-context capabilities. It offers tiered pricing at $2.50/$15 per million input/output tokens for standard usage, with rates rising to $5/$22.50 per million tokens above the 272K token threshold. Designed for enterprise and developer use cases requiring large context windows and high-throughput inference.
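The tiered pricing above can be made concrete with a small cost calculator. The per-token figures come from the entry; whether the higher rate applies to the whole request or only to the tokens past the threshold is not stated, so this sketch assumes the whole request is billed at the long-context rate once the input exceeds 272K tokens.

```python
# Worked example of the tiered pricing described above. Rates are taken from
# the entry; whole-request tier billing is an assumption.

THRESHOLD = 272_000  # input-token threshold stated above

def gpt54_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate cost in USD for one request."""
    if input_tokens <= THRESHOLD:
        in_rate, out_rate = 2.50, 15.00   # $ per million tokens, standard tier
    else:
        in_rate, out_rate = 5.00, 22.50   # $ per million tokens, long-context tier
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 100K input / 10K output stays in the standard tier:
print(gpt54_cost(100_000, 10_000))  # 0.25 + 0.15 = $0.40
```

The same shape of function works for any provider with threshold-based pricing; only the rates and threshold change.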
Gemini 2.5 Pro
Gemini 2.5 Pro is Google DeepMind's frontier large language model, designed for advanced reasoning, coding, multimodal understanding, and long-context tasks. It features a 1 million token context window with tiered pricing — $1.25 input / $10 output per million tokens up to 200K context, scaling to $2.50 / $15 above that threshold. Accessible via the Gemini API and Google AI Studio, it competes directly with OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet at the top of the frontier model tier.
CanIRun.ai
CanIRun.ai is a browser-based tool that uses the WebGPU API to automatically detect a user's hardware specifications and determine which open-weight AI models they can run locally. It grades compatibility across popular models like Llama, Mistral, Phi, and others, helping users make informed decisions about local AI inference without manual benchmarking.
llama.cpp
llama.cpp is an open-source LLM inference engine written in pure C/C++ that enables running large language models locally on consumer hardware with minimal dependencies. It pioneered the GGUF quantization format, allowing models to be compressed to 2–8 bits and run efficiently on CPU and GPU. Originally built to run Meta's LLaMA models, it now supports a broad range of architectures including Mistral, Falcon, Phi, Gemma, and many others.
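The quantization range mentioned above is what makes consumer hardware viable, and a back-of-envelope memory estimate shows why. This sketch counts weight memory only, ignoring KV cache and runtime overhead, for a hypothetical 7B-parameter model:

```python
# Approximate weight-only memory footprint at different precisions
# (KV cache and runtime overhead ignored; 7B parameter count is an example).

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

for bits, label in [(16, "fp16"), (8, "Q8"), (4, "Q4"), (2, "Q2")]:
    print(f"{label:>4}: {weight_gib(7e9, bits):.1f} GiB")
```

At 16-bit precision a 7B model needs roughly 13 GiB for weights alone, while a 4-bit quantization fits in about 3.3 GiB — the difference between requiring a workstation GPU and running on a laptop.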
Ollama
Ollama is an open-source tool that lets developers and users run large language models locally on their own hardware with a single command. It provides a simple CLI, a REST API compatible with the OpenAI API format, and a curated model library covering hundreds of models including Llama, Mistral, Gemma, Qwen, and more. Ollama supports GPU acceleration via NVIDIA CUDA, AMD ROCm, and Apple Metal, making local inference fast and accessible across macOS, Linux, and Windows.
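Because Ollama exposes an OpenAI-compatible endpoint on its default local port (11434), existing OpenAI client code can usually be pointed at it with only a URL change. A minimal sketch, using only the standard library — the model name "llama3" is an example and can be any model you have pulled:

```python
import json
import urllib.request

# Sketch of calling Ollama's OpenAI-compatible chat endpoint on the default
# local address. Requires a running `ollama serve` and a pulled model to
# actually complete a request.

def build_chat_request(model: str, prompt: str):
    url = "http://localhost:11434/v1/chat/completions"
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return url, payload

def chat(model: str, prompt: str) -> str:
    url, payload = build_chat_request(model, prompt)
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the OpenAI chat-completions shape:
    return body["choices"][0]["message"]["content"]

url, payload = build_chat_request("llama3", "Why is the sky blue?")
print(url)
```

The request and response shapes mirror the OpenAI API, which is why tools built against OpenAI's SDK can generally talk to a local Ollama instance unchanged.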
LM Studio
LM Studio is a cross-platform desktop application that enables users to discover, download, and run large language models entirely on their local machine — no internet connection or cloud dependency required. It provides a polished GUI for chatting with models, a built-in model browser sourcing from Hugging Face, and a local OpenAI-compatible API server so existing tools and apps can connect to locally-hosted LLMs.
Diggnation
Diggnation is the official Digg podcast, originally launched in 2005 by Kevin Rose and Alex Albrecht as a weekly video podcast discussing top stories from Digg.com. The show ran until December 2011 and is being revived as a monthly podcast during Digg's platform reimagining phase, keeping the community informed about the platform's AI-era evolution.
Devin Review
Devin Review is an AI-powered code review feature from Cognition AI that deploys the Devin AI software engineer to autonomously review pull requests, identify bugs, flag security issues, and provide actionable inline suggestions. It integrates natively with GitHub workflows, allowing engineering teams to receive thorough, context-aware code reviews without blocking on human reviewers. It is part of the broader Devin platform, which positions itself as a fully autonomous AI software engineer.
OpenClaw
OpenClaw is an open-source AI agent framework featuring a dual-loop architecture, recursive tool calling, context compaction, and 14+ multi-channel chat integrations. Launched in early 2026, it gained significant traction particularly in China as an autonomous agent capable of taking over a device to complete tasks without user intervention. It supports multiple LLM providers, ships with the ClawHub skill marketplace, and has spawned a broad ecosystem including managed wrappers (Klaus), embedded ports (PycoClaw), and first-class integrations with third-party platforms.
Model Context Protocol (MCP)
Model Context Protocol (MCP) is an open standard for connecting AI applications and agents to external tools, data sources, and workflows via a client-server architecture. Using JSON-RPC 2.0 over stdio or Streamable HTTP transports, it defines primitives for Tools, Resources, and Prompts that any MCP-compatible host (Claude, ChatGPT, Cursor, VS Code Copilot, and more) can discover and invoke. Originally created by Anthropic and launched in November 2024, it was donated to the Linux Foundation's Agentic AI Foundation (AAIF) in December 2025 and has grown to 500+ public servers and 97M+ monthly SDK downloads.
ChatGPT
ChatGPT is OpenAI's flagship conversational AI assistant, layered on top of their GPT and o-series models to deliver capabilities far beyond raw model access — including web browsing, code execution (Advanced Data Analysis), image generation via DALL-E, file and document analysis, memory, persistent Projects, and a GPT Store for specialised agents. Launched in November 2022, it triggered a mainstream AI adoption wave and remains the most widely used AI assistant globally. It supports agentic workflows through tool use, Operator-style task execution, and integrations, making it a true agent platform rather than just a model interface.
Cursor
Cursor is an AI-first code editor built as a fork of VS Code by Anysphere, designed to supercharge developer productivity with deep AI integration across the entire coding workflow. It features multi-file agentic editing, intelligent autocomplete, codebase-aware chat, and as of Cursor 2.0, supports up to 8 parallel agents running in isolated git worktrees. With over 1 million users and 360,000+ paying customers as of 2026, it has become one of the dominant AI coding environments and a major competitive threat to traditional editors.
CrewAI
CrewAI is an open-source Python framework for building and orchestrating multi-agent AI systems, modeling agents as role-playing "crew members" each with defined roles, goals, backstories, and tools. It supports sequential and hierarchical task flows, an internal event bus, and async execution, enabling complex collaborative agent pipelines. In December 2024, CrewAI removed its LangChain dependency entirely, repositioning itself as a lean, from-scratch multi-agent framework with a commercial enterprise platform layered on top.
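The sequential role-playing pattern described above can be sketched without the framework itself. In this dependency-free illustration (not the CrewAI API), each agent carries a role and goal, tasks run in order, and every task's output becomes context for the next; the `call_llm` stub stands in for a real model call:

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM call.
    return f"[LLM output for: {prompt[:40]}...]"

@dataclass
class Agent:
    role: str
    goal: str

@dataclass
class Task:
    description: str
    agent: Agent

def run_sequential(tasks: list[Task]) -> str:
    context = ""
    for task in tasks:
        prompt = (f"You are a {task.agent.role}. Goal: {task.agent.goal}\n"
                  f"Context: {context}\nTask: {task.description}")
        context = call_llm(prompt)  # each output feeds the next task
    return context

researcher = Agent("research analyst", "find relevant facts")
writer = Agent("technical writer", "turn research into prose")
result = run_sequential([
    Task("Research agent frameworks", researcher),
    Task("Write a summary", writer),
])
print(result)
```

Hierarchical flows extend this by letting a manager agent decide task order and delegation instead of following a fixed list.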
LangChain
LangChain is a popular open-source framework for building LLM-powered applications and agents, providing abstractions for chaining model calls, managing memory, integrating tools, and orchestrating multi-step workflows. It supports both Python and JavaScript and offers a large ecosystem of integrations with models, vector stores, and external APIs. While once dominant in the agent framework space, it has faced criticism for heavy abstraction overhead, cost inflation, and design choices optimized for weaker 2022-era models that limit flexibility with more capable modern LLMs.
GitHub Copilot
GitHub Copilot is Microsoft/GitHub's AI-powered coding assistant that integrates directly into IDEs to provide inline code completions, a conversational chat interface, CLI assistance, and autonomous background agent capabilities. Originally launched as a tab-complete tool built on OpenAI Codex, it has evolved into a multi-model agentic platform supporting automated code edits, pull request reviews, and per-session git worktree isolation for background tasks. It is widely regarded as the category-defining AI coding assistant and a primary driver of mainstream developer adoption of LLM-assisted programming.
AutoGen
AutoGen is Microsoft Research's open-source multi-agent conversation framework that enables building complex agentic workflows through coordinated multi-agent conversations, tool-calling, and code execution. It supports orchestration of multiple LLM-powered agents (AssistantAgent, UserProxyAgent, GroupChat) that can collaborate autonomously to complete tasks. In late 2024 the project split, with its original creators forking it as AG2 while Microsoft continued AutoGen under a redesigned architecture, a divergence that caused fragmentation concerns and community criticism around production readiness.
Codex
OpenAI Codex is a cloud-based software engineering agent capable of autonomously completing coding tasks such as writing features, fixing bugs, running tests, and navigating codebases — operating in isolated sandboxes in parallel. It is available as both a cloud agent integrated into ChatGPT and an open-source CLI tool (Codex CLI), competing directly with Claude Code, Devin, and similar agentic coding products. The agent is powered by codex-1, a model fine-tuned for agentic software engineering tasks.
Codex CLI
Codex CLI is OpenAI's open-source command-line coding agent that runs directly in the terminal, capable of reading and writing files, executing shell commands, and performing multi-step coding tasks autonomously. It supports multiple approval modes (suggest, auto-edit, full-auto) and acts as an MCP client, integrating with a growing ecosystem of tools and servers. Originally launched in April 2025, it received a major model upgrade to GPT-5.2-Codex in December 2025 and expanded to a desktop app in February 2026.
Windsurf
Windsurf is an agentic AI-first IDE originally built by Codeium (formerly Exafunction), featuring the Cascade agentic AI system that can autonomously plan, edit, and execute multi-file coding tasks with deep codebase awareness. It ships a proprietary coding model (SWE-1.5) optimized for software engineering tasks and competes directly with Cursor as one of the leading AI code editors. Codeium/Windsurf was acquired by Cognition AI (makers of Devin) for approximately $250M in December 2025.
Aider
Aider is an open-source, terminal-native AI pair programming tool created by Paul Gauthier that lets developers edit code across multiple files using natural language prompts. It integrates tightly with git, automatically committing changes with sensible messages, and supports over 100 LLMs including GPT-4, Claude, Gemini, and local models via OpenAI-compatible APIs. It composes naturally with editors like Vim and Emacs and is widely regarded as one of the most capable local coding agents available.
LangSmith
LangSmith is LangChain's full-stack LLMOps observability and evaluation platform, purpose-built for debugging, monitoring, and testing LLM applications and agents. It provides end-to-end tracing of every LLM call, tool invocation, and chain step alongside automated and human-in-the-loop evaluation workflows. While deeply integrated with LangChain and LangGraph, it supports any LLM provider via SDKs for Python, TypeScript, Go, and Java, plus OpenTelemetry compatibility.
Gemini
Gemini is Google's flagship AI assistant and chatbot platform, available at gemini.google.com and integrated across Google Workspace, Android, and other products. Built on top of Google's Gemini model family, the app provides agentic capabilities including web search, code execution, image generation, document analysis, and deep integration with Gmail, Docs, Maps, and other Google services. It competes directly with ChatGPT and Claude as a general-purpose AI assistant for consumers and enterprise users.
LangGraph
LangGraph is an open-source framework by LangChain for building stateful, multi-actor LLM applications modeled as directed graphs, where nodes represent actions or LLM calls and edges represent transitions. Unlike DAG-based pipelines, it supports cyclic execution enabling loops, retries, and conditional branching — the core primitives needed for reliable agentic systems. It ships with built-in state persistence, human-in-the-loop checkpointing, multi-agent coordination primitives, and deep LangSmith observability integration, and is backed by a managed deployment platform (LangGraph Platform).
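The cyclic execution model described above is the key difference from DAG pipelines, and it can be illustrated from scratch (this is a conceptual sketch, not the LangGraph API): nodes are functions over a shared state, and a conditional edge can route control back to an earlier node, giving the loops and retries a DAG cannot express.

```python
# Minimal cyclic-graph runner: nodes transform a state dict; edges are either
# a fixed next-node name or a router function choosing one from the state.

END = "__end__"

def run_graph(nodes, edges, state, entry):
    current = entry
    while current != END:
        state = nodes[current](state)                      # run the node
        edge = edges[current]
        current = edge(state) if callable(edge) else edge  # conditional edge
    return state

def attempt(state):
    state["tries"] += 1
    state["ok"] = state["tries"] >= 3  # "succeed" on the third try
    return state

def route(state):
    return END if state["ok"] else "attempt"  # loop back on failure

final = run_graph({"attempt": attempt}, {"attempt": route},
                  {"tries": 0, "ok": False}, "attempt")
print(final)  # {'tries': 3, 'ok': True}
```

LangGraph layers persistence, checkpointing, and multi-agent coordination on top of this core loop, but the retry cycle above is the primitive that motivates the graph (rather than pipeline) abstraction.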
Gemini CLI
Gemini CLI is Google's open-source, terminal-based agentic coding assistant powered by Gemini 2.5 Pro. It runs directly in the command line and can autonomously read, write, and modify files, execute shell commands, search the web, and reason over entire codebases. Positioned as a direct competitor to Claude Code and Codex CLI, it offers a generous free tier via personal Google accounts.
Microsoft Copilot
Microsoft Copilot is an AI-powered assistant deeply integrated across Microsoft's product suite — including Windows, Microsoft 365 (Word, Excel, PowerPoint, Outlook, Teams), Edge, and Bing. It combines large language model capabilities with real-time web search, document understanding via Microsoft Graph, and image generation, enabling users to draft content, summarise emails, query data, and automate workflows within familiar Microsoft tools. Enterprise editions extend this to business data, SharePoint, and health record integrations, while Copilot Studio allows organisations to build custom agents on top of the platform.
Cursor Cloud Agents
Cursor Cloud Agents (also known as Background Agents) are fully autonomous coding agents from Anysphere that spin up isolated Ubuntu VMs to execute software engineering tasks asynchronously. They can be triggered directly from Slack, GitHub, Linear, or the Cursor IDE, allowing developers to delegate tasks without being at their machine. The agent handles the full dev loop — writing code, running tests, and opening pull requests — then hands back control when complete.
Opencode
Opencode is an open-source, terminal/CLI-based AI coding agent built by the SST team (creators of the Serverless Stack framework). It provides a fully agentic coding loop — reading, editing, and creating files, executing shell commands, and iterating autonomously — with support for MCP (Model Context Protocol) to extend capabilities via custom tool servers. It is model-agnostic, allowing developers to bring their own API keys and choose from multiple LLM providers.
Playwright MCP
Playwright MCP is an open-source Model Context Protocol (MCP) server built by Microsoft that exposes Playwright browser automation capabilities to LLM agents. Rather than passing raw HTML or screenshots, it surfaces structured accessibility-tree snapshots of web pages, giving agents a token-efficient but still semantically rich view of the browser state. It is widely used with Claude Code, Cursor, and other MCP-compatible clients to enable agentic web navigation, form filling, testing, and acceptance-criteria verification workflows.
nono
nono is a credential isolation layer for sandboxed AI agents that injects phantom, session-scoped, time-limited dummy tokens in place of real API keys and credentials. This ensures that sandboxed agents never have access to actual secrets, reducing the blast radius of a compromised or misbehaving agent. It acts as a transparent credential proxy sitting between the agent runtime and real upstream services.
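The phantom-token pattern described above can be sketched conceptually (this is not nono's actual implementation): the sandboxed agent only ever receives a random, expiring dummy token, and the proxy substitutes the real secret at the upstream boundary, outside the sandbox.

```python
import secrets
import time

# Conceptual sketch of a credential-substitution proxy: phantom tokens are
# session-scoped, time-limited, and carry no real secret material.

class CredentialProxy:
    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._vault = {}  # phantom token -> (real secret, expiry time)

    def issue_phantom(self, real_secret: str) -> str:
        phantom = "phantom-" + secrets.token_hex(8)
        self._vault[phantom] = (real_secret, time.monotonic() + self.ttl)
        return phantom  # this is all the sandboxed agent ever sees

    def resolve(self, phantom: str) -> str:
        real, expiry = self._vault[phantom]
        if time.monotonic() > expiry:
            raise PermissionError("phantom token expired")
        return real  # substituted only on the proxy side, outside the sandbox

proxy = CredentialProxy()
phantom = proxy.issue_phantom("sk-real-api-key")
print(phantom.startswith("phantom-"), proxy.resolve(phantom) == "sk-real-api-key")
```

Even if the agent leaks the phantom token, it is useless outside the proxy's session window, which is what bounds the blast radius of a compromised agent.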
Braintrust
Braintrust is a SaaS-first LLM evaluation and observability platform that enables engineering teams to systematically evaluate, trace, and improve AI applications across the full development lifecycle. It combines dataset-driven evaluation, distributed tracing via its proprietary Brainstore database, human review workflows, and CI/CD-native GitHub Actions integration to enable regression detection and quality improvement for production LLM systems. The platform is TypeScript-first but offers broad multi-language SDK coverage, with enterprise self-hosting available.
Langfuse
Langfuse is an open-source (MIT) LLM engineering and observability platform that provides tracing, prompt management, evaluation, and analytics for LLM applications. It is framework-agnostic and built on OpenTelemetry, deployable via self-hosted Docker/Postgres or as a cloud SaaS. Acquired by ClickHouse in January 2026, it has grown to 19,000+ GitHub stars, 2,000+ paying customers, and 26M+ SDK installs per month.
Claude Desktop
Claude Desktop is Anthropic's official native desktop application for macOS and Windows that brings Claude AI beyond the browser with deeper OS-level integrations. It acts as an MCP (Model Context Protocol) client, allowing users to connect Claude to local tools, files, APIs, and third-party services via configurable MCP servers. It also supports computer use (in beta), enabling Claude to autonomously interact with the desktop environment on the user's behalf.
Grok
Grok is xAI's AI assistant product and chatbot, built on top of its own Grok family of large language models. Available via grok.com and integrated into the X (formerly Twitter) platform, it provides real-time web and X-data search, image generation, voice interaction, and long-context reasoning. It has been deployed in enterprise and government contexts, including the U.S. military's GenAI.mil platform.
Helicone
Helicone is an open-source LLM observability platform and AI gateway/reverse proxy that intercepts API requests to LLM providers — capturing logs, costs, and latency with under 1ms overhead in self-hosted mode. It integrates via a simple base URL swap (zero code changes), supports 12+ providers including OpenAI and Anthropic, and adds capabilities such as prompt caching, rate limiting, user tracking, and fine-tuning dataset creation. The gateway layer is written in Rust for minimal footprint (~64MB memory) and is available as a cloud-hosted SaaS or fully self-hostable.
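The "base URL swap" integration above amounts to pointing an existing SDK at Helicone's proxy and adding one auth header. The proxy hostname and header name below follow Helicone's documented OpenAI integration but should be treated as assumptions and checked against current docs:

```python
# Sketch of configuring an OpenAI-SDK-style client to route through Helicone.
# No request logic changes; Helicone logs the call and forwards it upstream.

def helicone_client_config(helicone_key: str) -> dict:
    return {
        # Swap api.openai.com for Helicone's proxy endpoint:
        "base_url": "https://oai.helicone.ai/v1",
        # Identifies your Helicone account so logs land in your dashboard:
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_key}"},
    }

cfg = helicone_client_config("sk-helicone-example")
print(cfg["base_url"])
# With the OpenAI SDK this would be used as: OpenAI(api_key=..., **cfg)
```

This zero-code-change model is what distinguishes gateway-style observability from SDK-instrumentation approaches like Langfuse or LangSmith.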
Perplexity
Perplexity is an AI-powered answer engine that combines real-time web search with large language models to deliver cited, conversational responses to queries. It goes beyond raw model capabilities by autonomously fetching, synthesising, and attributing live web sources, supporting multi-turn follow-up questions, and offering a Deep Research mode that autonomously plans and executes multi-step research tasks. Available on web, iOS, Android, and as a desktop app, it targets both casual users and professionals who need fast, trustworthy research.
Continue.dev
Continue.dev is an open-source, model-agnostic AI coding assistant extension for VS Code and JetBrains IDEs that lets developers plug in their own model API keys (OpenAI, Anthropic, Gemini, or local models via Ollama). It provides AI-powered autocomplete, chat, and codebase-aware code generation through configurable context providers that can index local codebases, documentation, and repositories. As a self-hostable, bring-your-own-model tool, it competes with GitHub Copilot while offering significantly more flexibility and no vendor lock-in.
AlphaEvolve
AlphaEvolve is a Gemini-powered evolutionary coding agent developed by Google DeepMind that uses LLMs to mutate, recombine, and evolve algorithm code, with each candidate evaluated against automated verifiers in a closed loop. It functions as a meta-algorithm — autonomously generating and refining search algorithms for combinatorial optimization problems at scales beyond human intuition. Since its unveiling in May 2025, it has produced state-of-the-art breakthroughs in matrix multiplication, kissing number bounds, Ramsey theory, data center scheduling, and TPU circuit design.
VS Code
Visual Studio Code is Microsoft's free, open-source code editor that has evolved into a first-class AI agent platform through GitHub Copilot integration and its Agent Mode, which enables autonomous multi-step coding tasks with tool use, file edits, and terminal execution. As the world's most popular IDE with a massive extension marketplace, it serves as the primary delivery surface for AI coding agents from virtually every major AI provider. Its open extension API has made it the default target platform for AI coding assistants and LLM-powered developer tools.
Warden
Warden is Sentry's AI-powered code review agent that reviews code locally or automatically on every pull request. It implements the agentskills.io specification, making it one of the most publicly documented adopters of the Agent Skills standard outside of Anthropic. Built on top of LLM capabilities, it surfaces issues, suggests improvements, and enforces best practices as part of the development workflow.