Subject: Re: Tune MapReduce over HBase to insert data
From: Farrokh Shahriari
To: user@hbase.apache.org
Date: Mon, 14 Jan 2013 09:28:26 +0330

Bing Jiang,

What do you mean by "add compaction thread number"? In hbase-site.xml we
have compactionqueuesize and compactionthreshold, but not the parameter that
you mentioned.
Thank you in advance for any guidance.

On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu wrote:

> Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce package.
>
> Cheers
>
> On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang wrote:
>
> > Hi, Anoop.
> > Why doesn't the HBase mapreduce package contain tools like this?
> >
> > Anoop John wrote:
> >
> > >Hi
> > > Could you consider using HFileOutputFormat? You are using
> > >TableOutputFormat now, so there will be put calls to HTable. With
> > >HFileOutputFormat the MR job writes the HFiles directly [no flushes,
> > >no compactions]. Afterwards you need to load the HFiles into the
> > >regions using LoadIncrementalHFiles. It may help you.
> > >
> > >-Anoop-
> > >
> > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari
> > ><mohandes.zebeleh.67@gmail.com> wrote:
> > >
> > >> Thank you guys, let me change these configurations & test mapreduce again.
> > >>
> > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika wrote:
> > >>
> > >> > Start by testing HDFS throughput by doing a simple copyFromLocal using the
> > >> > Hadoop command line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> > >> > /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the computers,
> > >> > you should get around 75 MB/sec.
> > >> >
> > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > >> >
> > >> > > In our experience, MapReduce inserts can be sped up by:
> > >> > > 1. increasing the regionserver flush thread number
> > >> > > 2. increasing memstore/jvm_heap
> > >> > > 3. pre-splitting the table's regions before the MapReduce job
> > >> > > 4. increasing the large and small compaction thread numbers.
> > >> > >
> > >> > > Please correct me if I'm wrong, or if there are better ideas.
> > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" wrote:
> > >> > >
> > >> > > > What type of disks and how many?
> > >> > > > With the default replication factor your 2 (or 6) GB are actually
> > >> > > > replicated 3 times.
> > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > >> > > > reasonable machine should be able to absorb.
> > >> > > > The fact that deferred log flush does not help you seems to indicate that
> > >> > > > you're over IO bound.
> > >> > > >
> > >> > > >
> > >> > > > What's your memstore flush size? Potentially the data is written many
> > >> > > > times during compactions.
> > >> > > >
> > >> > > >
> > >> > > > In your case you could dial down the HDFS replication, since you only have
> > >> > > > two physical machines anyway.
> > >> > > > (Set it to 2. If you do not specify any failure zones, you might as well
> > >> > > > set it to 1... You will lose data if one of your server machines dies
> > >> > > > anyway).
> > >> > > >
> > >> > > > It does not really make that much sense to deploy HBase and HDFS on
> > >> > > > virtual nodes like this.
> > >> > > > -- Lars
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > ________________________________
> > >> > > > From: Farrokh Shahriari
> > >> > > > To: user@hbase.apache.org
> > >> > > > Sent: Monday, January 7, 2013 9:38 PM
> > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
> > >> > > >
> > >> > > > Hi again,
> > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
> > >> > > > I have two server machines with 48 GB RAM, 12 physical cores & 24 logical
> > >> > > > cores, which host 12 nodes (6 nodes on each server). Each node has 8 GB RAM &
> > >> > > > 2 VCPU.
> > >> > > > I've set some parameters that give better results, such as setting WAL=off on
> > >> > > > puts, but other parameters like heap size and deferred log flush don't help me.
> > >> > > > Besides that, I have another question: why do I get a different run time each
> > >> > > > time I run the MapReduce job, while the config & hardware are the same & have
> > >> > > > not changed?
> > >> > > >
> > >> > > > Thanks, guys
> > >> > > >
> > >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu wrote:
> > >> > > >
> > >> > > > > Have you read through http://hbase.apache.org/book.html#performance?
> > >> > > > >
> > >> > > > > What version of HBase are you using ?
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari
> > >> > > > > <mohandes.zebeleh.67@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > Hi there
> > >> > > > > > I have a cluster with 12 nodes, each of which has 2 CPU cores. Now, I
> > >> > > > > > want to insert a large amount of data, about 2 GB in 80 sec (or 6 GB in
> > >> > > > > > 240 sec). I've used Map-Reduce over HBase, but I can't achieve a proper result.
> > >> > > > > > I'd be glad if you could tell me what I can do to get better results, or
> > >> > > > > > which parameters I should configure or tune to improve Map-Reduce/HBase
> > >> > > > > > performance.
> > >> > > > > >
> > >> > > > > > Thanks
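
The back-of-the-envelope arithmetic in Lars's reply (2 GB of user data, replicated 3 times, written in 80 s, doubled if the WAL stays on) can be made concrete with a small sketch. The function name and parameters below are hypothetical, chosen only for illustration. It assumes 1 GB = 1000 MB, as the 6GB/80s = 75MB/s figure implies, and it ignores the extra rewrites caused by flushes and compactions, so it is a rough lower bound rather than a measurement.

```python
def required_write_rate_mb_s(user_data_gb, seconds, replication=3, wal_enabled=False):
    """Rough aggregate write rate (MB/s) the cluster must absorb.

    Hypothetical helper: models only HDFS replication and the WAL,
    not flushes or compactions.
    """
    # HDFS stores every byte `replication` times.
    replicated_mb = user_data_gb * 1000 * replication
    rate = replicated_mb / seconds
    # With the WAL enabled, each edit is also written to the log,
    # which roughly doubles the write volume.
    return rate * 2 if wal_enabled else rate


if __name__ == "__main__":
    print(required_write_rate_mb_s(2, 80))                    # default replication, WAL off
    print(required_write_rate_mb_s(2, 80, wal_enabled=True))  # default replication, WAL on
    print(required_write_rate_mb_s(2, 80, replication=1))     # replication dialed down to 1
```

Comparing these numbers with the throughput measured by the copyFromLocal test Asaf suggests gives a quick indication of whether the target is IO bound before any HBase parameter is tuned.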