Date: Mon, 25 Jun 2012 14:50:25 +0800
Subject: Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?
From: gemini alex
To: common-user@hadoop.apache.org

did you configure map output compression?

2012/6/15 Chen He

> Let me know when you get the correct answer.
>
> Chen
>
> On Thu, Jun 14, 2012 at 11:42 AM, Nan Zhu wrote:
>
> > Hi, Chen,
> >
> > Thank you for your reply,
> >
> > but in its README, there is no value which is larger than 100%, it means
> > that the size of intermediate results will never be larger than input size,
> >
> > it will not be the case, because the input data is compressed, the size of
> > the generated data will expand to be very large....
> >
> > it's just my guessing, can anyone correct me?
> >
> > Best,
> >
> > Nan
> >
> >
> > On Thu, Jun 14, 2012 at 11:50 PM, Chen He wrote:
> >
> > > Hi Nan
> > >
> > > probably the map stage will output 10% of the total input, and the reduce
> > > stage will output 40% of intermediate results (10% of total input).
> > >
> > > For example, 500GB input, after the map stage, it will be 50GB and it will
> > > become 20GB after the reduce stage.
> > >
> > > It may be similar to the loadgen in hadoop test example.
> > >
> > > Anyone has suggestion?
> > >
> > > Chen
> > > System Architect Intern @ ZData
> > > PhD student@CSE Dept.
> > >
> > >
> > > On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu wrote:
> > >
> > > > Hi, all
> > > >
> > > > I'm using gridmix2 to test my cluster, while in its README file, there are
> > > > statements like the following:
> > > >
> > > > +1) Three stage map/reduce job
> > > > + Input: 500GB compressed (2TB uncompressed) SequenceFile
> > > > + (k,v) = (5 words, 100 words)
> > > > + hadoop-env: FIXCOMPSEQ
> > > > + *Compute1: keep 10% map, 40% reduce
> > > > + Compute2: keep 100% map, 77% reduce
> > > > + Input from Compute1
> > > > + Compute3: keep 116% map, 91% reduce
> > > > + Input from Compute2
> > > > + *Motivation: Many user workloads are implemented as pipelined map/reduce
> > > > + jobs, including Pig workloads
> > > >
> > > >
> > > > Can anyone tell me what does "keep 10% map, 40% reduce" mean here?
> > > >
> > > > Best,
> > > >
> > > > --
> > > > Nan Zhu
> > > > School of Electronic, Information and Electrical Engineering,229
> > > > Shanghai Jiao Tong University
> > > > 800,Dongchuan Road,Shanghai,China
> > > > E-Mail: zhunansjtu@gmail.com
> > > >
> > > >
> > >
> >
> > --
> > Nan Zhu
> > School of Electronic, Information and Electrical Engineering,229
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: zhunansjtu@gmail.com
> >
>
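
For reference, the interpretation Chen He proposes in the quoted thread (each "keep" percentage is the fraction of a phase's input bytes that the phase emits) can be written out as arithmetic. This is only a sketch of that reading, which nobody in the thread confirms; the function name `pipeline_sizes` is hypothetical, and it is an assumption that the percentages apply to the 500GB compressed figure rather than the 2TB uncompressed one.

```python
def pipeline_sizes(input_gb, stages):
    """Apply (keep_map, keep_reduce) ratios stage by stage.

    Assumes Chen He's reading of the README: map output is keep_map
    times the stage input, reduce output is keep_reduce times the map
    output, and each reduce output feeds the next stage.
    """
    sizes = []
    current = input_gb
    for keep_map, keep_reduce in stages:
        map_out = current * keep_map          # bytes kept by the map phase
        reduce_out = map_out * keep_reduce    # bytes kept by the reduce phase
        sizes.append((map_out, reduce_out))
        current = reduce_out                  # "Input from ComputeN"
    return sizes


# Ratios quoted from the README: Compute1, Compute2, Compute3.
stages = [(0.10, 0.40), (1.00, 0.77), (1.16, 0.91)]

for i, (map_out, reduce_out) in enumerate(pipeline_sizes(500, stages), start=1):
    print(f"Compute{i}: map output {map_out:.2f} GB, reduce output {reduce_out:.2f} GB")
```

Under these assumptions Compute1 reproduces the 50GB and 20GB figures from Chen He's example. Note that Compute3's map ratio of 116% means that stage's map output is larger than its input.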