hadoop-mapreduce-user mailing list archives

From Ling Kun <lkun.e...@gmail.com>
Subject Million docs and word count scenario
Date Fri, 29 Mar 2013 13:52:47 GMT
A Hadoop archive (HAR) may be an option:
http://hadoop.apache.org/docs/r1.1.2/hadoop_archives.html


Ling kun

On Friday, March 29, 2013, Ted Dunning wrote:

> Putting each document into a separate file is not likely to be a great
> thing to do.
>
> On the other hand, putting them all into one file may not be what you want
> either.
>
> It is probably best to find a middle ground and create files each with
> many documents and each a few gigabytes in size.
>
>
> On Fri, Mar 29, 2013 at 1:15 PM, <pathurun@yahoo.com> wrote:
>
>> If there are 1 million docs in an enterprise and we need to perform a word
>> count computation on all of them, what is the first step? Should we
>> extract the text of all the docs into a single file and then put that
>> into HDFS, or put each doc into HDFS separately?
>> Thanks
>>
>
>
>
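
The middle ground Ted describes above (many documents per file, each file
up to some target size) could be sketched roughly as below. This is only an
illustration, not something from the thread: the function name
pack_documents, the batch-NNNNN.txt naming, and the one-document-per-line,
tab-separated layout are all assumptions, and it writes to a local directory
that would then be copied into HDFS. Whitespace inside each document is
flattened, which should be harmless for a word count. Document ids are
assumed to contain no tabs or newlines.

```python
import os
from typing import Iterable, List, Tuple


def pack_documents(
    docs: Iterable[Tuple[str, str]],
    out_dir: str,
    target_bytes: int,
) -> List[str]:
    """Pack (doc_id, text) pairs into batch files, one document per line.

    A new batch file is started when adding the next document would push
    the current file past target_bytes. A single document larger than
    target_bytes still gets written, alone, to its own file.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    out = None
    written = 0
    try:
        for doc_id, text in docs:
            # Collapse all whitespace (including newlines) to single spaces
            # so that each document occupies exactly one line.
            flat = " ".join(text.split())
            line = (doc_id + "\t" + flat + "\n").encode("utf-8")

            # Roll over to a new file if this line would exceed the target.
            if out is None or (written > 0 and written + len(line) > target_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "batch-%05d.txt" % len(paths))
                paths.append(path)
                out = open(path, "wb")
                written = 0

            out.write(line)
            written += len(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

For files "a few gigabytes in size", target_bytes would be something like
2 * 1024**3. The resulting batch files can be read line by line by a plain
text input format, so a word count mapper would only need to drop the
leading id field before splitting the rest into words.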

-- 
http://www.lingcc.com
