Agents
Models
Companies
News
Compare
About

Common Crawl

Common Crawl is a non-profit organization that crawls the web and provides raw web data freely to the public. They operate CCBot, a training bot utilized by many AI laboratories and models. Since 2007, they have been maintaining open datasets that are extensively used for Large Language Model (LLM) training.

Company Info

Founded: 2007
Headquarters: Unknown
Founders
Employees: Unknown
Website: https://commoncrawl.org
Status: active

Funding

Undisclosed total

[agentwars]

Tracking the rise of AI agents

© 2026 Agent Wars