Date: Fri, 13 Aug 2010 16:21:03 -0400
Subject: Re: large parameter file, too many intermediate output
From: Boyu Zhang
To: common-user@hadoop.apache.org

Hi Harsh,

Thank you for the reply. I will try that, although now the map tasks are
taking too much time, almost 20 min to finish all the map tasks (~90). I
don't know if compression will slow me down, but I will make a test and see.

Thank you very much!

Boyu

On Thu, Aug 12, 2010 at 11:07 PM, Harsh J wrote:

> Apart from the combiner suggestion, I'd also suggest using
> intermediate map-output compression always (With LZO, if possible).
> Saves you some IO.
>
> On Fri, Aug 13, 2010 at 3:24 AM, Boyu Zhang wrote:
> > Hi Steve,
> >
> > Thanks for the reply!
> >
> > On Thu, Aug 12, 2010 at 5:47 PM, Steve Lewis wrote:
> >
> >> I don't think of half a billion key value pairs as that large a number -
> >> nor 20,000 per task - these are
> >> not atypical for hadoop tasks and many users will see these as small
> >> numbers
> >> while you might use cleverness such as a combiner to reduce the output I
> >> wonder if this is needed
> >> What is your cluster size and how fast does the job perform???
> >>
> >
> > I am using combiner to compact the output a little bit before they got
> > written to the disk.
> > My cluster is 48 cores (6 nodes * 8cores/node), my
> > chunk size is 12MB, there are 90 or so map tasks, and it takes about 30 min
> > to process. It is very slow I think. Thanks for the attention and interest!
> >
> > Boyu
> >
>
> --
> Harsh J
> www.harshj.com
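
For anyone following the thread, here is a toy sketch of the two suggestions
above (a combiner, then compressing the intermediate map output). It is plain
Python, not Hadoop code: the word-count mapper, the sample split, and the
tab-separated encoding are all made up for illustration, and zlib only stands
in for LZO, which is not in the Python standard library. In a real job of this
era, map-output compression was switched on through job configuration (as far
as I know, the `mapred.compress.map.output` property), not by hand like this.

```python
import zlib
from collections import defaultdict

def map_words(lines):
    """Toy mapper: emit one (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def combine(pairs):
    """Toy combiner: sum the values per key on the map side."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def serialize(pairs):
    """Encode pairs as tab-separated lines (a stand-in for a spill file)."""
    return "".join(f"{k}\t{v}\n" for k, v in pairs).encode("utf-8")

# Hypothetical input split; the real data set is not shown in the thread.
split = ["alpha beta alpha gamma beta alpha"] * 1000

raw_pairs = list(map_words(split))
combined_pairs = combine(raw_pairs)

raw_bytes = serialize(raw_pairs)
combined_bytes = serialize(combined_pairs)
# zlib stands in for LZO here; it is applied to the uncombined output
# to show the effect of compression on its own.
compressed_raw = zlib.compress(raw_bytes)

print("records without combiner:", len(raw_pairs))
print("records with combiner:   ", len(combined_pairs))
print("bytes without combiner:  ", len(raw_bytes))
print("bytes with combiner:     ", len(combined_bytes))
print("bytes, compressed only:  ", len(compressed_raw))
```

The combiner cuts the number of records written to disk, while compression
cuts the bytes per record at the cost of some CPU, so whether it speeds up or
slows down a particular job is something to measure, as Boyu plans to do.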