Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of gemini5201314@gmail.com
 designates 209.85.214.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAGparvVpmtJxataf1YT6cUCk=Jipcr0e==D4__UMiaS4o3Z+mg@mail.gmail.com>
References: 
 <CAJPAVwVxV-4GLu3dEt1hGM5CGqTVJ5e079-GreaY51w=Z9qSwQ@mail.gmail.com>
	<CAGparvWKeM+B6nkAeUOUXg3DEVYwKcxhyTYka5nooFP_vZ4MtQ@mail.gmail.com>
	<CAJPAVwWwFJFqmrDry_iCrvWUM5kxzB4_VRLyaYRrWNDvBvzp3A@mail.gmail.com>
	<CAGparvVpmtJxataf1YT6cUCk=Jipcr0e==D4__UMiaS4o3Z+mg@mail.gmail.com>
Date: Mon, 25 Jun 2012 14:50:25 +0800
Message-ID: 
 <CADo2yOCsJvK3ZvzKWi3q426Eewwg=o7PCpzCBM4z0vuE2NscEQ@mail.gmail.com>
Subject: Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?
From: gemini alex <gemini5201314@gmail.com>
To: common-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=e89a8fb20752888bcf04c34665a1

--e89a8fb20752888bcf04c34665a1
Content-Type: text/plain; charset=UTF-8

Did you configure map output compression?


2012/6/15 Chen He <airbots@gmail.com>

> Let me know when you get the correct answer.
>
> Chen
>
> On Thu, Jun 14, 2012 at 11:42 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:
>
> > Hi, Chen,
> >
> > Thank you for your reply.
> >
> > But in its README there is no value larger than 100%, which would mean
> > that the size of the intermediate results can never exceed the input size.
> >
> > That cannot be the case: because the input data is compressed, the
> > generated data will expand to be much larger.
> >
> > This is just my guess; can anyone correct me?
> >
> > Best,
> >
> > Nan
> >
> >
> > On Thu, Jun 14, 2012 at 11:50 PM, Chen He <airbots@gmail.com> wrote:
> >
> > > Hi Nan
> > >
> > > Probably the map stage will output 10% of the total input, and the
> > > reduce stage will output 40% of the intermediate results (the
> > > intermediate results being 10% of the total input).
> > >
> > > For example, with 500GB of input, the data will be 50GB after the map
> > > stage and 20GB after the reduce stage.
> > >
> > > It may be similar to loadgen in the Hadoop test examples.
> > >
> > > Does anyone have a suggestion?
> > >
> > > Chen
> > > System Architect Intern @ ZData
> > > PhD student@CSE Dept.
> > >
> > >
> > > On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:
> > >
> > > > Hi, all
> > > >
> > > > I'm using gridmix2 to test my cluster. Its README file contains
> > > > statements like the following:
> > > >
> > > > +1) Three stage map/reduce job
> > > > +     Input:      500GB compressed (2TB uncompressed) SequenceFile
> > > > +                 (k,v) = (5 words, 100 words)
> > > > +                 hadoop-env: FIXCOMPSEQ
> > > > +     Compute1:   keep 10% map, 40% reduce
> > > > +     Compute2:   keep 100% map, 77% reduce
> > > > +                 Input from Compute1
> > > > +     Compute3:   keep 116% map, 91% reduce
> > > > +                 Input from Compute2
> > > > +     Motivation: Many user workloads are implemented as pipelined map/reduce
> > > > +                 jobs, including Pig workloads
> > > >
> > > > Can anyone tell me what "keep 10% map, 40% reduce" means here?
> > > >
> > > > Best,
> > > >
> > > > --
> > > > Nan Zhu
> > > > School of Electronic, Information and Electrical Engineering,229
> > > > Shanghai Jiao Tong University
> > > > 800,Dongchuan Road,Shanghai,China
> > > > E-Mail: zhunansjtu@gmail.com
> > > >
> > >
> >
> >
> >
> > --
> > Nan Zhu
> > School of Electronic, Information and Electrical Engineering,229
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: zhunansjtu@gmail.com
> >
>
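
For what it's worth, Chen's reading quoted above (each "keep" percentage is the ratio of a phase's output to that phase's input) can be written out as plain arithmetic. This is only a sketch of that interpretation, not something taken from the gridmix2 source. The function name keep_pipeline is made up for illustration, and starting from the 500GB compressed figure rather than the 2TB uncompressed one is an assumption carried over from Chen's example.

```python
def keep_pipeline(input_gb, stages):
    """Apply (map_keep, reduce_keep) ratios stage by stage.

    Assumed interpretation: map output = stage input * map_keep,
    reduce output = map output * reduce_keep, and each stage reads
    the reduce output of the previous one.
    Returns a list of (map_output_gb, reduce_output_gb) tuples.
    """
    sizes = []
    current = input_gb
    for map_keep, reduce_keep in stages:
        map_out = current * map_keep
        reduce_out = map_out * reduce_keep
        sizes.append((map_out, reduce_out))
        current = reduce_out  # the next stage consumes this stage's output
    return sizes


# Ratios as listed in the README excerpt for Compute1..Compute3.
README_STAGES = [(0.10, 0.40), (1.00, 0.77), (1.16, 0.91)]

if __name__ == "__main__":
    results = keep_pipeline(500.0, README_STAGES)
    for i, (map_gb, reduce_gb) in enumerate(results, start=1):
        print(f"Compute{i}: map output ~{map_gb:.1f} GB, "
              f"reduce output ~{reduce_gb:.1f} GB")
```

Under this reading the first stage gives the 50GB and 20GB from Chen's example. The 116% map ratio listed for Compute3 would mean that stage's map output is larger than its input.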

--e89a8fb20752888bcf04c34665a1--