hadoop-common-user mailing list archives

From Chen He <airb...@gmail.com>
Subject Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?
Date Thu, 14 Jun 2012 17:09:18 GMT
Let me know when you get the correct answer.

Chen

On Thu, Jun 14, 2012 at 11:42 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:

> Hi Chen,
>
> Thank you for your reply.
>
> But in the README there is no value larger than 100%, which would mean
> that the intermediate results can never be larger than the input.
>
> That cannot be the case: the input data is compressed, so the generated
> data will expand to be very large.
>
> This is just my guess; can anyone correct me?
>
> Best,
>
> Nan
>
>
> On Thu, Jun 14, 2012 at 11:50 PM, Chen He <airbots@gmail.com> wrote:
>
> > Hi Nan,
> >
> > Probably the map stage will output 10% of the total input, and the reduce
> > stage will output 40% of the intermediate results (which are 10% of the
> > total input).
> >
> > For example, with 500GB of input, the data will be 50GB after the map
> > stage and 20GB after the reduce stage.
> >
> > It may be similar to loadgen in the Hadoop test examples.
> >
> > Does anyone have a suggestion?
> >
> > Chen
> > System Architect Intern @ ZData
> > PhD student@CSE Dept.
> >
> >
> > On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'm using gridmix2 to test my cluster, and its README file contains
> > > statements like the following:
> > >
> > > +1) Three stage map/reduce job
> > > +     Input:      500GB compressed (2TB uncompressed) SequenceFile
> > > +                 (k,v) = (5 words, 100 words)
> > > +                 hadoop-env: FIXCOMPSEQ
> > > +     Compute1:   keep 10% map, 40% reduce
> > > +     Compute2:   keep 100% map, 77% reduce
> > > +                 Input from Compute1
> > > +     Compute3:   keep 116% map, 91% reduce
> > > +                 Input from Compute2
> > > +     Motivation: Many user workloads are implemented as pipelined
> > > +                 map/reduce jobs, including Pig workloads
> > >
> > >
> > > Can anyone tell me what "keep 10% map, 40% reduce" means here?
> > >
> > > Best,
> > >
> > > --
> > > Nan Zhu
> > > School of Electronic, Information and Electrical Engineering,229
> > > Shanghai Jiao Tong University
> > > 800,Dongchuan Road,Shanghai,China
> > > E-Mail: zhunansjtu@gmail.com
> > >
> >
>
>
>
> --
> Nan Zhu
> School of Electronic, Information and Electrical Engineering,229
> Shanghai Jiao Tong University
> 800,Dongchuan Road,Shanghai,China
> E-Mail: zhunansjtu@gmail.com
>
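
The reading Chen proposes in the quoted thread (each "keep" percentage is the
fraction of a phase's input bytes that the phase emits) can be sketched as
plain arithmetic. This is only that guess written out, not confirmed gridmix2
behavior: the function name stage_sizes and the STAGES list are hypothetical,
the percentages are copied from the README excerpt, and the 500GB starting
size follows Chen's example. Whether the percentages apply to compressed or
uncompressed bytes is not settled in the thread.

```python
def stage_sizes(input_gb, stages):
    """Return a list of (map_output_gb, reduce_output_gb), one per stage.

    Assumption (Chen's guess from the thread, not documented behavior):
    "keep X% map" means the map phase emits X% of its input, and
    "keep Y% reduce" means the reduce phase emits Y% of the map output.
    Each stage reads the reduce output of the previous stage.
    """
    results = []
    current = input_gb
    for keep_map, keep_reduce in stages:
        map_out = current * keep_map
        reduce_out = map_out * keep_reduce
        results.append((map_out, reduce_out))
        current = reduce_out  # next stage takes this as its input
    return results


# (keep map, keep reduce) for Compute1..Compute3, from the README excerpt.
STAGES = [(0.10, 0.40), (1.00, 0.77), (1.16, 0.91)]

if __name__ == "__main__":
    for i, (m, r) in enumerate(stage_sizes(500.0, STAGES), start=1):
        print(f"Compute{i}: map output {m:.2f} GB, reduce output {r:.2f} GB")
```

Under this reading, Compute1 reproduces the 50GB and 20GB figures from Chen's
example, and the 116% map figure for Compute3 makes that map phase emit more
than it reads.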
