Common Crawl
Common Crawl is a non-profit organization that crawls the web and makes the raw web data it collects freely available to the public. It operates CCBot, the web crawler that gathers this data, which many AI labs use to train their models. Since 2007, it has maintained open datasets that are widely used for training large language models (LLMs).
Company Info
- Founded: 2007
- Headquarters: Unknown
- Founders:
- Employees: Unknown
- Website: https://commoncrawl.org
- Status: Active
Funding
- Total: Undisclosed