hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Million docs and word count scenario
Date Fri, 29 Mar 2013 13:05:43 GMT
Putting each document into a separate file is not likely to be a great
thing to do.

On the other hand, putting them all into one file may not be what you want either.

It is probably best to find a middle ground and create files each with many
documents and each a few gigabytes in size.
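
As a rough illustration of that middle ground, here is a minimal Python sketch that packs many documents into a smaller number of bundle files, one document per line. The function name `pack_documents`, the `bundle-NNNNN.txt` naming, and the tab-separated `doc_id<TAB>text` layout are assumptions made for this example, not anything Hadoop prescribes. It also assumes document ids contain no tabs or newlines.

```python
import os
import tempfile


def pack_documents(docs, out_dir, target_bytes):
    """Pack (doc_id, text) pairs into bundle files of roughly target_bytes each.

    Each output line is "<doc_id>\t<text>". Whitespace runs inside the text
    (including newlines and tabs) are collapsed to single spaces so that one
    line always holds exactly one document.
    Returns the list of bundle file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    for doc_id, text in docs:
        # Open a new bundle if none is open yet or the current one is full.
        if out is None or written >= target_bytes:
            if out is not None:
                out.close()
            path = os.path.join(out_dir, "bundle-%05d.txt" % len(paths))
            out = open(path, "w", encoding="utf-8", newline="\n")
            paths.append(path)
            written = 0
        flat = " ".join(text.split())
        line = "%s\t%s\n" % (doc_id, flat)
        out.write(line)
        written += len(line.encode("utf-8"))
    if out is not None:
        out.close()
    return paths


if __name__ == "__main__":
    # Tiny demo with a very small target size; real bundles would be far larger.
    demo_docs = [("doc1", "hello world"), ("doc2", "foo\nbar"), ("doc3", "x y z")]
    for p in pack_documents(demo_docs, tempfile.mkdtemp(), target_bytes=20):
        print(p)
```

For a real corpus, `target_bytes` would be set to something on the order of a few gigabytes (for example `2 * 1024**3`), and the resulting bundles could then be copied into HDFS with `hdfs dfs -put`. A bundle can overshoot the target by at most one document, since the size check happens before each write.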

On Fri, Mar 29, 2013 at 1:15 PM, <pathurun@yahoo.com> wrote:

> If there are 1 million docs in an enterprise and we need to perform a word
> count computation on all of them, what is the first step? Is it to extract
> the text of all the docs into a single file and then put that into HDFS,
> or to put each doc into HDFS separately?
> Thanks
