From: Pavan Sudheendra
Date: Mon, 26 Aug 2013 12:03:11 +0530
Subject: Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
To: user@hadoop.apache.org

Jens, can I set a smaller value in my application?
Is this valid?
conf.setInt("mapred.max.split.size", 50);
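
For what it is worth, mapred.max.split.size is a byte count, so 50 would ask for 50-byte splits. Below is a minimal sketch of the arithmetic, assuming the max(minSize, min(maxSize, blockSize)) rule that the file-based input formats apply. The 64 MB block size and 1 GiB input size are made-up figures, and the class and method names are hypothetical. It models file input only; for an HBase table scan the split count follows the region count (as noted further down the thread), so this setting may have no effect there.

```java
class SplitSizeSketch {
    // Split size rule assumed here: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split count: ceiling of fileBytes / splitSize.
    // (The real input format may merge a small trailing split.)
    static long splitCount(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;    // assumed HDFS block size
        long fileBytes = 1024L * 1024 * 1024;  // assumed 1 GiB of input

        long tiny = computeSplitSize(blockSize, 1L, 50L);               // value from the question
        long moderate = computeSplitSize(blockSize, 1L, 32L * 1024 * 1024);

        System.out.println("max=50 bytes: split size " + tiny
                + ", about " + splitCount(fileBytes, tiny) + " splits");
        System.out.println("max=32 MB:    split size " + moderate
                + ", about " + splitCount(fileBytes, moderate) + " splits");
    }
}
```

Under these assumptions a 50-byte cap would produce millions of map tasks, so a value in the tens of megabytes is probably closer to what was intended.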

This is our mapred-site.xml:

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapreduce.job.counters.max</name>
    <value>120</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>zlib.compress.level</name>
    <value>DEFAULT_COMPRESSION</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.submit.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>24</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>112</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx471075479</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.8</value>
  </property>
</configuration>

Please suggest ways to override the default value.
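
On overriding defaults: as far as I understand it, a value set on the job's Configuration before submission takes precedence over mapred-site.xml, which in turn takes precedence over the built-in defaults, unless the site file marks the property as final. Here is a toy model of that lookup order in plain Java. It uses no Hadoop classes, the class and method names are hypothetical, the default of 1 and the job value of 12 are assumptions for illustration, and final properties are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigOverrideSketch {
    // Lookup order assumed here: job-level setting, then site file, then built-in default.
    static String resolve(String key,
                          Map<String, String> defaults,
                          Map<String, String> site,
                          Map<String, String> job) {
        if (job.containsKey(key)) {
            return job.get(key);
        }
        if (site.containsKey(key)) {
            return site.get(key);
        }
        return defaults.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("mapred.reduce.tasks", "1");   // assumed built-in default

        Map<String, String> site = new HashMap<>();
        site.put("mapred.reduce.tasks", "6");       // value from the posted mapred-site.xml

        Map<String, String> job = new HashMap<>();  // nothing set by the application yet
        System.out.println("site only: " + resolve("mapred.reduce.tasks", defaults, site, job));

        job.put("mapred.reduce.tasks", "12");       // hypothetical per-job override
        System.out.println("with job override: " + resolve("mapred.reduce.tasks", defaults, site, job));
    }
}
```

In a real driver the equivalent would be a call such as conf.setInt("mapred.reduce.tasks", 12) made before the Job object is created, or a -D option on the command line if the driver runs through ToolRunner.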


On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <anilgupta84@gmail.com> wrote:
> Hi Pavan,
>
> Standalone cluster? How many RS are you running? What are you trying to achieve in MR? Have you tried increasing scanner caching?
> It is hard to say what "slow" means unless we know more details of your setup.
>
> ~Anil
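
To put a rough number on the scanner caching suggestion: the caching value is the number of rows the client fetches per trip to the region server, so the trip count is roughly rows divided by caching. A minimal sketch, assuming a made-up table of 10 million rows; the class and method names are hypothetical.

```java
class ScannerCachingSketch {
    // Approximate number of client-to-region-server fetches for a scan,
    // assuming each fetch returns up to `caching` rows.
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be at least 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // assumed row count, for illustration only
        for (int caching : new int[] {1, 100, 500}) {
            System.out.println("caching=" + caching + " -> about "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

In an actual job the value would be set on the Scan object, for example scan.setCaching(500), before the scan is handed to TableMapReduceUtil.initTableMapperJob. Larger values trade fewer round trips for more memory per mapper.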



> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <lhztop@hotmail.com> wrote:
>> You need to post your map code here so we can analyze the problem. Generally, a MapReduce job over HBase uses a scanner with filter(s), so the mapper count equals the number of regions in your HBase table.
>> As for why your reduce is so slow, my guess is that you have a disastrous join across the three tables, which produces too many rows.
>>
>> On 2013/8/26 4:36, Pavan Sudheendra wrote:
>>
>>> Another question: why does it indicate the number of mappers as 1? Can I change it so that multiple mappers perform the computation?
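
The point quoted above, that a table scan gets one mapper per region, would also explain a mapper count of 1 if the table has a single region. A rough sketch of that counting, assuming each region covers a half-open key range and that a split is created for every region overlapping the scan range. Plain String comparison stands in for HBase's byte-wise row key comparison, and all names and keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class RegionSplitSketch {
    // A region covers [startKey, endKey); an empty string means "unbounded".
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Counts the regions that overlap the scan range [scanStart, scanStop).
    // An empty scanStart or scanStop means the scan is open on that side.
    static int countSplits(List<Region> regions, String scanStart, String scanStop) {
        int count = 0;
        for (Region r : regions) {
            boolean startsBeforeScanEnds =
                    scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
            boolean endsAfterScanStarts =
                    r.endKey.isEmpty() || scanStart.compareTo(r.endKey) < 0;
            if (startsBeforeScanEnds && endsAfterScanStarts) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Region> unsplitTable = Arrays.asList(new Region("", ""));
        List<Region> presplitTable = Arrays.asList(
                new Region("", "g"), new Region("g", "n"),
                new Region("n", "t"), new Region("t", ""));

        System.out.println("single region, full scan: " + countSplits(unsplitTable, "", ""));
        System.out.println("four regions, full scan:  " + countSplits(presplitTable, "", ""));
        System.out.println("four regions, scan h..p:  " + countSplits(presplitTable, "h", "p"));
    }
}
```

If this model holds, getting more mappers means getting more regions, for instance by splitting or pre-splitting the table, rather than changing a split size setting.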




> --
> Thanks & Regards,
> Anil Gupta



--
Regards-
Pavan