hadoop-common-user mailing list archives

From ke yuan <ke.yuan....@gmail.com>
Subject Re: reducing mappers for a job
Date Thu, 17 Nov 2011 07:29:19 GMT
Yes, you're right, but:
1) Waste of disk space: this is not right. A larger block size will not waste
disk space on the datanodes; if you don't believe it, you can check the code.
2) Difficulty balancing HDFS: this may be true.
3) Low map-stage data locality: why?

2011/11/17 He Chen <airbots@gmail.com>

> Hi Jay Vyas
>
> Ke yuan's method may decrease the number of mappers because, by default,
>
> the number of mappers for a job = the number of blocks in the job's input
> file.
>
> Make sure you change the block size only for your specific job's input
> file, not in the Hadoop cluster's configuration.
>
> If you change the block size in your Hadoop cluster configuration (in the
> hdfs-site.xml file), this method may have some side effects:
>
> 1) waste of disk space;
> 2) difficulty balancing HDFS;
> 3) low Map stage data locality;
>
> Bests!
>
> Chen
>
> On Wed, Nov 16, 2011 at 9:42 PM, ke yuan <ke.yuan.whu@gmail.com> wrote:
>
> > Just set the block size to 128M or 256M; that may reduce the number of
> > mappers per job.
> >
> > 2011/11/17 Jay Vyas <jayunit100@gmail.com>
> >
> > > Hi guys: in a shared cluster environment, what's the best way to reduce
> > > the number of mappers per job? Should you do it with inputSplits? Or
> > > simply adjust the values in the JobConf (i.e., increase the number of
> > > bytes in an input split)?
> > >
> > > --
> > > Jay Vyas
> > > MMSB/UCHC
> > >
> >
>
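
To make the arithmetic in this thread concrete, here is a rough sketch of how block size and minimum split size translate into a mapper count. It assumes the split-size rule used by `FileInputFormat.computeSplitSize` in the newer `mapreduce` API, `max(minSize, min(maxSize, blockSize))`; the older `mapred` API uses a slightly different formula. The class name `SplitEstimate`, the helper `estimateMappers`, and the 1 GB input file are hypothetical, and the estimate ignores the small slop factor Hadoop applies to the last split and assumes a single splittable input file.

```java
public class SplitEstimate {

    // Split-size rule assumed from FileInputFormat.computeSplitSize
    // (new mapreduce API): max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: approximate mapper count as ceil(fileSize / splitSize).
    // Ignores the last-split slop factor and assumes one splittable file.
    static long estimateMappers(long fileSize, long splitSize) {
        if (fileSize == 0) {
            return 0;
        }
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileSize = 1024 * MB; // hypothetical 1 GB input file

        // Option 1: write the job's input file with a larger block size.
        long[] blockSizes = {64 * MB, 128 * MB, 256 * MB};
        for (long blockSize : blockSizes) {
            long split = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
            System.out.println(blockSize / MB + " MB blocks -> "
                    + estimateMappers(fileSize, split) + " mappers");
        }

        // Option 2: keep 64 MB blocks but raise the minimum split size to 256 MB.
        long split = computeSplitSize(64 * MB, 256 * MB, Long.MAX_VALUE);
        System.out.println("64 MB blocks, 256 MB min split -> "
                + estimateMappers(fileSize, split) + " mappers");
    }
}
```

Under these assumptions, both options bring the 1 GB file down from 16 mappers to 4. The difference is that with option 2 each split spans several blocks, which may sit on different datanodes, so a mapper is less likely to read all of its input locally.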
