Common Crawl


infurl

Common Crawl
« on: March 02, 2020, 03:52:09 am »
https://commoncrawl.org/

Quote
The Common Crawl corpus contains petabytes of data collected over the last 7 years. It contains raw web page data, extracted metadata and text extractions.

Need a lot of data off the internet to feed your artificial intelligence project? You can find it at Common Crawl, a completely free and open project to retrieve and archive vast numbers of web sites. Save yourself a huge amount of time and hassle by using their pre-packaged data. This is the closest thing to having the entire internet in a box that you're ever going to find.
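For anyone wondering what "using their pre-packaged data" might look like in practice, here is a minimal sketch in Python. It is not from the Common Crawl documentation. It assumes the public URL index at index.commoncrawl.org, the data host data.commoncrawl.org, and a crawl ID (CC-MAIN-2020-10) picked only for illustration. The function names are hypothetical. The code only builds the requests and parses the index output, so it runs without touching the network.

```python
import json
from urllib.parse import urlencode

# Assumptions for illustration: the crawl ID and both host names below.
CRAWL_ID = "CC-MAIN-2020-10"
INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(page_url, crawl_id=CRAWL_ID):
    """Build a URL-index query that asks for JSON-lines output."""
    query = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def parse_index_lines(body):
    """Parse the index response: one JSON record per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def record_request(record):
    """Return (url, headers) for fetching a single capture by byte range.

    Assumes the record carries 'filename', 'offset' and 'length' fields.
    """
    offset = int(record["offset"])
    length = int(record["length"])
    # HTTP byte ranges are inclusive, hence the -1 on the end position.
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    return f"{DATA_HOST}/{record['filename']}", headers
```

To use it, fetch index_query_url("commoncrawl.org") with any HTTP client, pass the response text to parse_index_lines, then request the URL and headers from record_request for one of the records. As far as I understand it, the bytes that come back are a single gzip-compressed WARC record, so one page can be pulled out without downloading a whole archive file.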



Dat D

Re: Common Crawl
« Reply #1 on: March 03, 2020, 02:22:54 am »
I sometimes crawl the net myself. Google's Puppeteer is superb!

https://developers.google.com/web/tools/puppeteer

 

