hadoop-user mailing list archives

From Sujit Dhamale <sujitdhamal...@gmail.com>
Subject Re: I need some raw big data
Date Sat, 08 Dec 2012 05:08:43 GMT
Hi,
You can use the National Climatic Data Center (NCDC) data, which is a good
candidate for Hadoop.
Below are the steps to download the data.


1. Create a folder on your local drive.
  I created "/home/sujit/Desktop/Data/".

2. Create the script below and run it:

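#!/bin/bash
# Download each year's NCDC directory (1901-2012) into the data folder.
# Change into the target folder once; stop if it does not exist.
cd /home/sujit/Desktop/Data/ || exit 1
for i in {1901..2012}
do
  # -r recurses into the year directory; --no-parent stays below it.
  wget -r --no-parent --reject "index.html*" "http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/"
done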
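
Before starting a download of this size, it may help to check which URLs the loop will request. The following is only a rough sketch of the same loop with a dry-run switch. The names ncdc_url, fetch_years, DEST and DRY_RUN are made up for illustration, and the default target directory is an assumption. The base URL is the one from the script above; whether the server still serves it is not checked here.

```shell
#!/usr/bin/env bash
# Sketch only: the NCDC download loop with a dry-run mode.
# ncdc_url, fetch_years, DEST and DRY_RUN are hypothetical names.

BASE_URL="http://ftp3.ncdc.noaa.gov/pub/data/noaa"  # taken from the script above
DEST="${DEST:-$HOME/ncdc-data}"                      # assumed target directory
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print URLs only, 0 = download

# Print the directory URL for one year.
ncdc_url() {
  printf '%s/%s/\n' "$BASE_URL" "$1"
}

# Walk the years from $1 to $2 (inclusive) and print or fetch each one.
fetch_years() {
  local first=$1 last=$2 year
  for ((year = first; year <= last; year++)); do
    if [ "$DRY_RUN" = "1" ]; then
      ncdc_url "$year"
    else
      # -P sets the directory wget saves into.
      mkdir -p "$DEST" &&
        wget -r --no-parent --reject "index.html*" -P "$DEST" "$(ncdc_url "$year")"
    fi
  done
}

# With the default DRY_RUN=1 this only prints the URLs for 1901 and 1902.
fetch_years 1901 1902
```

To download for real, set DRY_RUN=0 and call fetch_years 1901 2012. Starting with one or two years is a reasonable way to see how much disk space a year takes.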

Kind Regards
Sujit Dhamale
(+91 9970086652)

On Sat, Dec 8, 2012 at 4:05 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello Yin,
>
>        You may find this interesting :
> https://github.com/unitedstates
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Sat, Dec 8, 2012 at 3:25 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>
>> Another suggestion is Google Books Ngrams:
>>
>> http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
>>
>>
>> On Fri, Dec 7, 2012 at 7:57 AM, Phillip Rhodes <motley.crue.fan@gmail.com> wrote:
>>
>>> On Fri, Dec 7, 2012 at 10:48 AM, Harsh J <harsh@cloudera.com> wrote:
>>> >
>>> > On Fri, Dec 7, 2012 at 8:31 PM, Yin Steve <steveyin92@gmail.com> wrote:
>>> >>  Hello, I'm Steve who need some raw big data for studying mapreduce
>>> >> programming. Where can i find them? especially those about weblog, traffic
>>> >> info etc. My English is not so well, if you can give me a URL which directly
>>> >> help me download the big file, That'll be great.
>>> >> Waiting for your reply......
>>>
>>> Try some of the links off of this Quora thread:
>>>
>>>
>>> http://www.quora.com/Data/Where-can-I-find-large-datasets-for-modeling-confidence-during-the-financial-crisis-which-is-open-to-the-public
>>>
>>> You might also try googling "Enron corpus", or check out
>>> CommonCrawl.org.
>>>
>>>
>>> Phil
>>>
>>
>>
>
