hadoop-hdfs-user mailing list archives

From Lior Schachter <li...@infolinks.com>
Subject Re: hdfs block size
Date Thu, 17 Mar 2011 09:21:45 GMT
We have 15G of data altogether to process every day (multiple M/R jobs running
on the same set of data).
Currently we split this data into 60 files (but we can also split it into 120
files).

We have 15 machines with quad-core CPUs.

Thanks,
Lior

On Thu, Mar 17, 2011 at 11:01 AM, Harsh J <qwertymaniac@gmail.com> wrote:

> 15 G single Gzip files? Consider block sizes of 0.5 GB or more. But it also
> depends on the processing slot capacity you have. More blocks would
> lead to higher usage of processing capacity, though also to a higher load
> on the NameNode in maintaining lots of blocks (and their replicas) per
> file.
>
> On Thu, Mar 17, 2011 at 2:27 PM, Lior Schachter <liors@infolinks.com>
> wrote:
> > Hi,
> > We plan a 100T cluster with M/R jobs running on 15G gzip files.
> > Should we configure the HDFS block size to be 128M or 256M?
> >
> > Thanks,
> > Lior
> >
>
>
>
> --
> Harsh J
> http://harshj.com
>
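
For reference, a minimal sketch of writing a file into HDFS with an explicit
block size in the range discussed above. The property name dfs.block.size
(the pre-2.x name), the 256M figure, and the input path are assumptions for
illustration, not taken from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default block size (normally set in hdfs-site.xml):
        // 256 MB here, so a ~256 MB gzip part file fits in a single block.
        conf.setLong("dfs.block.size", 256L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // Or override the block size for a single file at create time,
        // leaving the cluster-wide default untouched.
        // FileSystem.create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
                new Path("/data/input/part-0000.gz"),      // hypothetical path
                true,                                       // overwrite
                conf.getInt("io.file.buffer.size", 4096),   // write buffer size
                (short) 3,                                  // replication factor
                256L * 1024 * 1024);                        // 256 MB block size
        out.close();
        fs.close();
    }
}

Passing the block size at create time is a per-file override, so it can be
applied to the large gzip inputs without changing defaults for the rest of
the cluster.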
