Common Crawl


infurl

Common Crawl
« on: March 02, 2020, 03:52:09 am »
https://commoncrawl.org/

Quote
The Common Crawl corpus contains petabytes of data collected over the last 7 years. It contains raw web page data, extracted metadata and text extractions.

Need a lot of data off the internet to feed your artificial intelligence project? You can find it at Common Crawl, a completely free and open project to retrieve and archive vast numbers of web sites. Save yourself a huge amount of time and hassle by using their pre-packaged data. This is the closest thing to having the entire internet in a box that you're ever going to find.
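
For anyone wondering what "using their pre-packaged data" might look like in practice, here is a minimal sketch, not an official client. It assumes the public URL index at index.commoncrawl.org returns one JSON object per line with `filename`, `offset` and `length` fields, and that archive files are served from data.commoncrawl.org. The crawl ID and the helper names are illustrative only; check the Common Crawl site for the current crawl list.

```python
import json
from urllib.parse import urlencode

# Assumptions: these hosts and the crawl ID used in the example are
# illustrative; consult commoncrawl.org for the current endpoints and crawls.
INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def build_index_query(crawl_id, url_pattern):
    """Build an index query URL that asks for JSON-lines output."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_index_lines(body):
    """Parse a JSON-lines index response into one dict per capture."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def record_url(record):
    """URL of the archive file that holds this capture."""
    return f"{DATA_HOST}/{record['filename']}"


def byte_range_header(record):
    """HTTP Range header selecting just this capture inside the archive file."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Only prints the query URL; no network request is made here.
    print(build_index_query("CC-MAIN-2020-10", "example.com/*"))
```

To actually pull a page you would send a GET to `record_url(record)` with the header from `byte_range_header(record)` and gunzip the response, which should give you a single archived record. That part is left out so the sketch stays runnable offline.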



Dee

Re: Common Crawl
« Reply #1 on: March 03, 2020, 02:22:54 am »
I sometimes crawl the net myself. Google's Puppeteer is superb for that!

https://developers.google.com/web/tools/puppeteer

 

