Date: Tue, 5 Oct 2010 20:30:00 -0700
Subject: Re: HBase map reduce job timing
From: Jean-Daniel Cryans
To: user@hbase.apache.org

Ah ok, then using the write buffer should get you the speed you need
(provided that you have the hardware capacity and that you use HTable in
an efficient way).
In setup() set this to false on all 3 HTables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

In cleanup() call this on all HTables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#flushCommits()

Also, to make your maps faster, you could set this to 10 or more when you
create your input format:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)

J-D

On Tue, Oct 5, 2010 at 8:23 PM, Venkatesh wrote:
>
> Sure. Both input and output are HBase tables:
> Input (mapper phase): scanning an HBase table for all records within a
> time range (using HBase timestamps).
> Output (reduce phase): doing a Put to 3 different HBase tables.
>
> -----Original Message-----
> From: Jean-Daniel Cryans
> To: user@hbase.apache.org
> Sent: Tue, Oct 5, 2010 11:14 pm
> Subject: Re: HBase map reduce job timing
>
> It'd be more useful if we knew where that data is coming from, and
> where it's going. Are you scanning HBase and/or writing to it?
>
> J-D
>
> On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh wrote:
>>
>> Sorry, yeah, I have to do some digging to provide some data.
>> What sort of data would be helpful? Would the stats reported by
>> jobtracker.jsp suffice? I've pasted them in this email.
>> I can gather more JVM stats. Thanks.
>>
>> Status: Succeeded
>> Started at:  Tue Oct 05 21:39:58 EDT 2010
>> Finished at: Tue Oct 05 22:36:43 EDT 2010
>> Finished in: 56mins, 45sec
>> Job Cleanup: Successful
>>
>> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>> map      100.00%      565         0         0         565        0        0 / 11
>> reduce   100.00%      20          0         0         20         0        0 / 2
>>
>> Counter                       Map            Reduce         Total
>> Job Counters
>>   Launched reduce tasks       0              0              22
>>   Rack-local map tasks        0              0              66
>>   Launched map tasks          0              0              576
>>   Data-local map tasks        0              0              510
>> com.JobRecords
>>   REDUCE_PHASE_RECORDS        0              597,712        597,712
>>   MAP_PHASE_RECORDS           2,534,807      0              2,534,807
>> FileSystemCounters
>>   FILE_BYTES_READ             335,845,726    861,146,518    1,196,992,244
>>   FILE_BYTES_WRITTEN          1,197,031,156  861,146,518    2,058,177,674
>> Map-Reduce Framework
>>   Reduce input groups         0              597,712        597,712
>>   Combine output records      0              0              0
>>   Map input records           2,534,807      0              2,534,807
>>   Reduce shuffle bytes        0              789,145,342    789,145,342
>>   Reduce output records       0              0              0
>>   Spilled Records             3,522,428      2,534,807      6,057,235
>>   Map output bytes            851,007,170    0              851,007,170
>>   Map output records          2,534,807      0              2,534,807
>>   Combine input records       0              0              0
>>   Reduce input records        0              2,534,807      2,534,807
>>
>> -----Original Message-----
>> From: Jean-Daniel Cryans
>> To: user@hbase.apache.org
>> Sent: Tue, Oct 5, 2010 10:53 pm
>> Subject: Re: HBase map reduce job timing
>>
>> I'd love to give you tips, but you didn't provide any data about the
>> input and output of your job, the kind of hardware you're using, etc.
>> At this point any suggestion would be a stab in the dark; the best I
>> can do is point you to the existing documentation:
>> http://wiki.apache.org/hadoop/PerformanceTuning
>>
>> J-D
>>
>> On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh wrote:
>>>
>>> I have a MapReduce job that is taking too long (over an hour). Trying to
>>> see what I can tune to bring it down. One thing I noticed: the job is
>>> kicking off 500+ map tasks; 490 of them do not process any records,
>>> whereas 10 of them process all the records (200K each). Any idea why
>>> that would be?
>>>
>>> The map phase takes about a couple of minutes;
>>> the reduce phase takes the rest.
>>>
>>> I'll try increasing the # of reduce tasks. Open to other suggestions
>>> for tunables.
>>>
>>> thanks for your input
>>> venkatesh
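[Editor's note] The batching arithmetic behind J-D's advice can be sketched
with a toy model. This is not the HBase client itself; the class, methods,
and counters below are invented purely to illustrate why a client-side write
buffer (autoflush off + flushCommits) and scanner caching cut the number of
round trips to the region servers:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of HTable's client-side write buffer. With autoflush on, each
// Put costs one RPC; with autoflush off, Puts accumulate locally and a
// flushCommits() ships the whole buffer in one batched round trip.
class BufferedPutSink {
    private final List<byte[]> writeBuffer = new ArrayList<byte[]>();
    private final long flushThreshold; // bytes, akin to hbase.client.write.buffer
    private long bufferedBytes = 0;
    private int rpcCount = 0;          // simulated round trips to the server

    BufferedPutSink(long flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // With the buffer enabled, a put stays local until the threshold is hit.
    void put(byte[] value) {
        writeBuffer.add(value);
        bufferedBytes += value.length;
        if (bufferedBytes >= flushThreshold) {
            flushCommits();
        }
    }

    // Analogue of HTable.flushCommits(): drain the buffer in one batch.
    void flushCommits() {
        if (writeBuffer.isEmpty()) {
            return;
        }
        rpcCount++; // one batched round trip instead of one per Put
        writeBuffer.clear();
        bufferedBytes = 0;
    }

    int rpcCount() {
        return rpcCount;
    }

    // Read-side analogue of Scan.setCaching(n): each scanner RPC fetches
    // n rows, so a full scan costs ceil(totalRows / n) round trips.
    static int scanRpcs(int totalRows, int caching) {
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // 10,000 Puts of 100 bytes under a 1 MB buffer: everything batches
        // into the final flushCommits() -- 1 RPC instead of 10,000.
        BufferedPutSink sink = new BufferedPutSink(1024 * 1024);
        for (int i = 0; i < 10000; i++) {
            sink.put(new byte[100]);
        }
        sink.flushCommits(); // the cleanup() step from the advice above
        System.out.println("buffered write RPCs: " + sink.rpcCount());
        System.out.println("scan RPCs at caching=1:  " + scanRpcs(2000, 1));
        System.out.println("scan RPCs at caching=10: " + scanRpcs(2000, 10));
    }
}
```

The same shape explains the setCaching suggestion: the default of 1 makes a
2,500,000-row map-side scan pay one RPC per row, while caching=10 divides
that by ten at the cost of a little client memory.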