Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of mohandes.zebeleh.67@gmail.com
 designates 209.85.215.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALte62zq8p75boFx_HGE-a22vy3roX5HA3ciyvteDoaAxJk_PQ@mail.gmail.com>
References: <udhb2os86oet6awhxnyasahc.1358069509536@email.android.com>
	<CALte62zq8p75boFx_HGE-a22vy3roX5HA3ciyvteDoaAxJk_PQ@mail.gmail.com>
Date: Mon, 14 Jan 2013 09:28:26 +0330
Message-ID: 
 <CAJdcUW0mcu-XFPnw2A6g_NA0_Dzo54z5pkc35HgjQXLdzy3QFQ@mail.gmail.com>
Subject: Re: Tune MapReduce over HBase to insert data
From: Farrokh Shahriari <mohandes.zebeleh.67@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=f46d040168dd73b09104d33955f6

--f46d040168dd73b09104d33955f6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Bing Jiang, what do you mean by "add compaction thread number"? In
hbase-site.xml we have compactionqueuesize and compactionthreshold, but not
the parameter you mentioned.

Thank you in advance for any guidance.

On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce package.
>
> Cheers
>
> On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <jiangbinglover@gmail.com> wrote:
>
> > Hi Anoop,
> > Why doesn't the HBase mapreduce package contain tools like this?
> >
> > Anoop John <anoop.hbase@gmail.com> wrote:
> >
> > >Hi,
> > >Have you considered using HFileOutputFormat? You are using
> > >TableOutputFormat now, which issues put calls to HTable. With
> > >HFileOutputFormat, the MR job writes the HFiles directly (no flushes, no
> > >compactions). Afterwards you need to load the HFiles into the regions
> > >using LoadIncrementalHFiles. This may help you.
> > >
> > >-Anoop-
> > >
> > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> > >mohandes.zebeleh.67@gmail.com> wrote:
> > >
> > >> Thank you, guys. Let me change these configurations and test MapReduce
> > >> again.
> > >>
> > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <asaf.mesika@gmail.com>
> > >> wrote:
> > >>
> > >> > Start by testing HDFS throughput with a simple copyFromLocal using the
> > >> > Hadoop command-line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> > >> > /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the
> > >> > computers, you should get around 75 MB/sec.
> > >> >
> > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > >> >
> > >> > > In our experience, MapReduce insert performance can be improved by:
> > >> > > 1. increasing the number of regionserver flush threads
> > >> > > 2. increasing the memstore/jvm_heap ratio
> > >> > > 3. pre-splitting the table into regions before running MapReduce
> > >> > > 4. increasing the number of large and small compaction threads.
> > >> > >
> > >> > > Please correct me if I am wrong, or suggest any better ideas.
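
For item 3 in the list above, here is a minimal sketch of how split points
might be computed. It assumes row keys start with a uniformly distributed
hexadecimal prefix (for example, a hash of the natural key). That key layout,
the function name hex_split_keys, and the choice of region counts are
assumptions for illustration only, not something stated in this thread. The
resulting keys would then be supplied as split keys when the table is created.

```python
def hex_split_keys(num_regions, width=4):
    """Return num_regions - 1 evenly spaced split keys as hex strings.

    Assumption: row keys begin with a uniformly distributed hex prefix of
    `width` characters, so equal slices of the key space get equal load.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    fmt = "0{}x".format(width)  # zero-padded lowercase hex
    return [format(i * key_space // num_regions, fmt)
            for i in range(1, num_regions)]


# Four regions over a two-character hex prefix.
print(hex_split_keys(4, width=2))  # ['40', '80', 'c0']

# Twelve regions (hypothetically one per node) need eleven split points.
print(len(hex_split_keys(12)))     # 11
```

If the real row keys are not evenly distributed, split points taken from a
sample of the actual keys would be a better choice than an even division.
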
> > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <larsh@apache.org> wrote:
> > >> > >
> > >> > > > What type of disks and how many?
> > >> > > > With the default replication factor your 2 (or 6) GB are actually
> > >> > > > replicated 3 times.
> > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > >> > > > reasonable machine should be able to absorb.
> > >> > > > The fact that deferred log flush does not help you seems to indicate
> > >> > > > that you're IO bound.
> > >> > > >
> > >> > > >
> > >> > > > What's your memstore flush size? Potentially the data is written many
> > >> > > > times during compactions.
> > >> > > >
> > >> > > >
> > >> > > > In your case you could dial down the HDFS replication, since you only
> > >> > > > have two physical machines anyway.
> > >> > > > (Set it to 2. If you do not specify any failure zones, you might as
> > >> > > > well set it to 1... You will lose data if one of your server machines
> > >> > > > dies anyway).
> > >> > > >
> > >> > > > It does not really make that much sense to deploy HBase and HDFS on
> > >> > > > virtual nodes like this.
> > >> > > > -- Lars
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > ________________________________
> > >> > > > From: Farrokh Shahriari <mohandes.zebeleh.67@gmail.com>
> > >> > > > To: user@hbase.apache.org
> > >> > > > Sent: Monday, January 7, 2013 9:38 PM
> > >> > > > Subject: Re: Tune MapReduce over HBase to insert data
> > >> > > >
> > >> > > > Hi again,
> > >> > > > I'm using HBase 0.92.1-cdh4.0.0.
> > >> > > > I have two server machines, each with 48 GB RAM, 12 physical cores and
> > >> > > > 24 logical cores, that host 12 nodes (6 nodes on each server). Each
> > >> > > > node has 8 GB RAM and 2 vCPUs.
> > >> > > > Some settings gave better results, such as turning the WAL off on
> > >> > > > puts, but others, such as heap size and deferred log flush, did not
> > >> > > > help.
> > >> > > > I also have another question: why do I get a different run time each
> > >> > > > time I run MapReduce, even though the config and hardware stay the
> > >> > > > same?
> > >> > > >
> > >> > > > Thanks, guys.
> > >> > > >
> > >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >
> > >> > > > > Have you read through
> > >> > > > > http://hbase.apache.org/book.html#performance?
> > >> > > > >
> > >> > > > > What version of HBase are you using ?
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > >> > > > > mohandes.zebeleh.67@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > Hi there,
> > >> > > > > > I have a cluster of 12 nodes, each with 2 CPU cores. I want to
> > >> > > > > > insert a large amount of data, about 2 GB in 80 sec (or 6 GB in
> > >> > > > > > 240 sec). I've used MapReduce over HBase, but I can't achieve
> > >> > > > > > that result.
> > >> > > > > > I'd be glad if you could tell me what I can do to get better
> > >> > > > > > results, or which parameters I should configure or tune to
> > >> > > > > > improve MapReduce/HBase performance.
> > >> > > > > >
> > >> > > > > > Thanks
> > >> > > > > >
> > >> > > > >
> > >> > >
> > >> >
> > >>
> >
>

--f46d040168dd73b09104d33955f6--