Message-ID: <1351138230.41525.YahooMailNeo@web121703.mail.ne1.yahoo.com>
Date: Wed, 24 Oct 2012 21:10:30 -0700 (PDT)
From: lars hofhansl
Reply-To: lars hofhansl
Subject: Re: Hbase import Tsv performance (slow import)
To: "user@hbase.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="1001534069-418957583-1351138230=:41525"
--1001534069-418957583-1351138230=:41525
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

This is good advice, Kevin. We should add this to the HBase Reference Guide.

________________________________
From: Kevin O'dell
To: user@hbase.apache.org
Sent: Tuesday, October 23, 2012 10:47 AM
Subject: Re: Hbase import Tsv performance (slow import)

You will want to make sure your table is pre-split.  Also, Import does
puts, so you will want to make sure you are not flushing and blocking,
by raising your memstore, HLog, and blocking count.  This can greatly
improve your write speeds.  I usually do a 256MB memstore (you can
lower it later if it is not a write-heavy table), a 512MB HLog (same
thing, you can lower it back to the default), and then raise the
storefile blocking count to about 100.

On Tue, Oct 23, 2012 at 1:32 PM, Nicolas Liochon wrote:
> Thanks, checking the schema itself is still interesting (cf. the link sent).
> As well, with 3 machines and a replication factor of 3, all the machines
> are used during a write. As HBase writes all entries into a write-ahead log
> for safety, the number of writes is also doubled. So maybe your machine is
> just dying under the load. Anyway, here your cluster is going at the speed
> of the least powerful machine, and this machine has a workload multiplied
> by 6 compared to a single-machine config (i.e. just writing a file locally).
>
> On Tue, Oct 23, 2012 at 7:13 PM, Nick maillard <
> nicolas.maillard@fifty-five.com> wrote:
>
>> Thanks for the help!
>>
>> My conf files are: Hadoop:
>> hdfs-site
>>
>> <configuration>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>3</value>
>>     <description>Default block replication.
>>     The actual number of replications can be specified when the file is created.
>>     The default is used if replication is not specified in create time.
>>     </description>
>>   </property>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/home/runner/app/hadoop/dfs/data</value>
>>     <description>Default block replication.
>>     The actual number of replications can be specified when the file is created.
>>     The default is used if replication is not specified in create time.
>>     </description>
>>   </property>
>>   <property>
>>     <name>dfs.datanode.max.xcievers</name>
>>     <value>4096</value>
>>   </property>
>> </configuration>
>>
>>
>> Mapred-site.xml
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>master:54311</value>
>>     <description>The host and port that the MapReduce job tracker runs
>>     at.  If "local", then jobs are run in-process as a single map
>>     and reduce task.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>14</value>
>>     <description>The maximum number of map tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>14</value>
>>     <description>The maximum number of reduce tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx400m</value>
>>     <description>Java opts for the task tracker child processes.
>>     The following symbol, if present, will be interpolated: @taskid@ is replaced
>>     by current TaskID. Any other occurrences of '@' will go unchanged.
>>     For example, to enable verbose gc logging to a file named for the taskid in
>>     /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
>>         -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
>>
>>     The configuration variable mapred.child.ulimit can be used to control the
>>     maximum virtual memory of the child processes.
>>     </description>
>>   </property>
>> </configuration>
>>
>>
>> core-site.xml
>>
>> <configuration>
>>   <property>
>>     <name>hadoop.tmp.dir</name>
>>     <value>/home/runner/app/hadoop/tmp</value>
>>     <description>A base for other temporary directories.</description>
>>   </property>
>>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://master:54310</value>
>>     <description>The name of the default file system.  A URI whose
>>     scheme and authority determine the FileSystem implementation.  The
>>     uri's scheme determines the config property (fs.SCHEME.impl) naming
>>     the FileSystem implementation class.  The uri's authority is used to
>>     determine the host, port, etc. for a filesystem.</description>
>>   </property>
>> </configuration>
>>
>>
>> For Hbase:
>> hbase-site:
>> <configuration>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://master:54310/hbase</value>
>>   </property>
>>   <property>
>>     <name>hbase.cluster.distributed</name>
>>     <value>true</value>
>>     <description>The mode the cluster will be in. Possible values are
>>       false: standalone and pseudo-distributed setups with managed Zookeeper
>>       true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.property.clientPort</name>
>>     <value>2222</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.quorum</name>
>>     <value>ks25937.kimsufi.com</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.property.dataDir</name>
>>     <value>/home/runner/hbase/hbase-0.94.2/tmp</value>
>>   </property>
>> </configuration>
>>
>>
>> I am currently running the import and looking at the logs to try to understand.
>> This definitely seems fishy:
>>
>> 2012-10-23 18:39:49,107 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000041_0 0.21332978%
>> 2012-10-23 18:39:50,363 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000028_0 0.20936884%
>> 2012-10-23 18:49:38,098 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000030_0: Task attempt_201210231145_0010_m_000030_0 failed to report status for 602 seconds. Killing!
>> 2012-10-23 18:49:38,116 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
>> 90 active threads
>> Thread 742 (process reaper):
>>   State: RUNNABLE
>>   Blocked count: 0
>>   Waited count: 0
>>   Stack:
>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>     java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
>>     java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
>>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>     java.lang.Thread.run(Thread.java:722)
>> Thread 740 (process reaper):
>>   State: RUNNABLE
>>   Blocked count: 0
>>   Waited count: 0
>>   Stack:
>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>     java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
>>     java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
>>
>>

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
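
For anyone who wants to try Kevin's two suggestions (pre-split the table; raise the memstore, HLog, and storefile blocking limits), here is a rough sketch. It is an illustration only and not something posted in the thread. The table name `import_target`, the column family `d`, and the idea that row keys are evenly distributed 8-character hex strings are all hypothetical. Mapping "512MB Hlog" to `hbase.regionserver.hlog.blocksize` is also an assumption about what was meant. The script only prints an HBase shell `create` command and an hbase-site.xml fragment; it does not talk to a cluster.

```python
MB = 1024 * 1024

# Values taken from Kevin's message. The property names are an assumed
# mapping of his wording onto HBase settings (the HLog one in particular).
TUNING = [
    ("hbase.hregion.memstore.flush.size", 256 * MB),
    ("hbase.regionserver.hlog.blocksize", 512 * MB),
    ("hbase.hstore.blockingStoreFiles", 100),
]


def hex_split_points(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced, zero-padded hex split keys.

    Assumes row keys are hex strings spread uniformly over the key space.
    """
    if num_regions < 2:
        return []
    keyspace = 16 ** key_width
    return [format(i * keyspace // num_regions, "x").zfill(key_width)
            for i in range(1, num_regions)]


def create_command(table, family, split_points):
    """Build an HBase shell command that creates a pre-split table."""
    keys = ", ".join("'" + p + "'" for p in split_points)
    return ("create '" + table + "', '" + family
            + "', {SPLITS => [" + keys + "]}")


def site_fragment(settings):
    """Render (name, value) pairs as hbase-site.xml property elements."""
    lines = []
    for name, value in settings:
        lines += ["<property>",
                  "  <name>" + name + "</name>",
                  "  <value>" + str(value) + "</value>",
                  "</property>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table and family names, four regions.
    print(create_command("import_target", "d", hex_split_points(4)))
    print(site_fragment(TUNING))
```

The split points only help if they match the real row key distribution, so they would need to be derived from the actual data. As Kevin says, the raised limits can be lowered again once the import is done.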
--1001534069-418957583-1351138230=:41525--