Message-ID: <1351138230.41525.YahooMailNeo@web121703.mail.ne1.yahoo.com>
Date: Wed, 24 Oct 2012 21:10:30 -0700 (PDT)
From: lars hofhansl
Reply-To: lars hofhansl
Subject: Re: Hbase import Tsv performance (slow import)
To: "user@hbase.apache.org"

This is good advice, Kevin; we should add this to the HBase Reference Guide.


________________________________
 From: Kevin O'dell
To: user@hbase.apache.org
Sent: Tuesday, October 23, 2012 10:47 AM
Subject: Re: Hbase import Tsv performance (slow import)

You will want to make sure your table is pre-split. Also, Import does
puts, so you will want to make sure you are not flushing and blocking
by raising your memstore, Hlog, and blocking count. This can greatly
improve your write speeds. I usually do a 256MB memstore (you can
lower it later if it is not a write-heavy table), 512MB Hlog (same
thing, you can lower it back to the default), and then raise the storefile
blocking count to about 100.

On Tue, Oct 23, 2012 at 1:32 PM, Nicolas Liochon wrote:
> Thanks, checking the schema itself is still interesting (cf. the link sent).
> As well, with 3 machines and a replication factor of 3, all the machines
> are used during a write. As HBase writes all entries into a write-ahead log
> for safety, the number of writes is also doubled. So maybe your machine is
> just dying under the load. Anyway, here your cluster is going at the speed
> of the least powerful machine, and this machine has a workload multiplied
> by 6 compared to a single-machine config (i.e. just writing a file locally).
>
> On Tue, Oct 23, 2012 at 7:13 PM, Nick maillard <nicolas.maillard@fifty-five.com> wrote:
>
>> Thanks for the help!
>>
>> My conf files are: Hadoop:
>> hdfs-site
>>
>> <configuration>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>3</value>
>>     <description>Default block replication.
>>     The actual number of replications can be specified when the file is created.
>>     The default is used if replication is not specified in create time.
>>     </description>
>>   </property>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/home/runner/app/hadoop/dfs/data</value>
>>     <description>Default block replication.
>>     The actual number of replications can be specified when the file is created.
>>     The default is used if replication is not specified in create time.
>>     </description>
>>   </property>
>>   <property>
>>     <name>dfs.datanode.max.xcievers</name>
>>     <value>4096</value>
>>   </property>
>> </configuration>
>>
>>
>> Mapred-site.xml
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>master:54311</value>
>>     <description>The host and port that the MapReduce job tracker runs
>>     at. If "local", then jobs are run in-process as a single map
>>     and reduce task.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>14</value>
>>     <description>The maximum number of map tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>14</value>
>>     <description>The maximum number of reduce tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx400m</value>
>>     <description>Java opts for the task tracker child processes.
>>     The following symbol, if present, will be interpolated: @taskid@ is replaced
>>     by current TaskID. Any other occurrences of '@' will go unchanged.
>>     For example, to enable verbose gc logging to a file named for the taskid in
>>     /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
>>           -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
>>
>>     The configuration variable mapred.child.ulimit can be used to control the
>>     maximum virtual memory of the child processes.
>>     </description>
>>   </property>
>> </configuration>
>>
>>
>> core-site.xml
>>
>> <configuration>
>>   <property>
>>     <name>hadoop.tmp.dir</name>
>>     <value>/home/runner/app/hadoop/tmp</value>
>>     <description>A base for other temporary directories.</description>
>>   </property>
>>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://master:54310</value>
>>     <description>The name of the default file system. A URI whose
>>     scheme and authority determine the FileSystem implementation. The
>>     uri's scheme determines the config property (fs.SCHEME.impl) naming
>>     the FileSystem implementation class. The uri's authority is used to
>>     determine the host, port, etc. for a filesystem.</description>
>>   </property>
>>
>>
>> For Hbase:
>> hbase-site:
>> <configuration>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://master:54310/hbase</value>
>>   </property>
>>   <property>
>>     <name>hbase.cluster.distributed</name>
>>     <value>true</value>
>>     <description>The mode the cluster will be in. Possible values are
>>       false: standalone and pseudo-distributed setups with managed Zookeeper
>>       true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.property.clientPort</name>
>>     <value>2222</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.quorum</name>
>>     <value>ks25937.kimsufi.com</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.property.dataDir</name>
>>     <value>/home/runner/hbase/hbase-0.94.2/tmp</value>
>>   </property>
>> </configuration>
>>
>>
>> I am currently running the import and looking at the logs to try and understand.
>> This definitely seems fishy:
>>
>> 2012-10-23 18:39:49,107 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000041_0 0.21332978%
>> 2012-10-23 18:39:50,363 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000028_0 0.20936884%
>> 2012-10-23 18:49:38,098 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000030_0: Task attempt_201210231145_0010_m_000030_0 failed to report status for 602 seconds. Killing!
>> 2012-10-23 18:49:38,116 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
>> 90 active threads
>> Thread 742 (process reaper):
>>   State: RUNNABLE
>>   Blocked count: 0
>>   Waited count: 0
>>   Stack:
>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>     java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
>>     java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
>>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>     java.lang.Thread.run(Thread.java:722)
>> Thread 740 (process reaper):
>>   State: RUNNABLE
>>   Blocked count: 0
>>   Waited count: 0
>>   Stack:
>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>     java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
>>     java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
>>
>>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
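
For reference, the write-tuning values Kevin mentions (256MB memstore, 512MB Hlog, storefile blocking count of about 100) might look roughly like this in hbase-site.xml. This is only a sketch: the thread does not name the properties, so the mapping is an assumption. "memstore" is taken to mean hbase.hregion.memstore.flush.size and "storefile blocking count" to mean hbase.hstore.blockingStoreFiles; treating "512MB Hlog" as hbase.regionserver.hlog.blocksize is a looser guess, since he may instead have meant the total log volume, which also depends on hbase.regionserver.maxlogs. Sizes are in bytes.

```xml
<!-- Sketch only: property names are assumed, values come from Kevin's message. -->
<configuration>
  <!-- 256 MB memstore flush size (256 * 1024 * 1024 bytes) -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>268435456</value>
  </property>
  <!-- 512 MB HLog size (512 * 1024 * 1024 bytes); assumed mapping -->
  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>536870912</value>
  </property>
  <!-- Block updates only once a store has about 100 store files -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>100</value>
  </property>
</configuration>
```

As Kevin notes, these are bulk-load settings; the memstore and Hlog values can be lowered again afterwards if the table is not write-heavy.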