hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Tune MapReduce over HBase to insert data
Date Sun, 13 Jan 2013 15:30:45 GMT
Both HFileOutputFormat and LoadIncrementalHFiles are in mapreduce package.

Cheers

On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <jiangbinglover@gmail.com> wrote:

> Hi, Anoop.
> Why doesn't the HBase mapreduce package contain tools like this?
>
> Anoop John <anoop.hbase@gmail.com> wrote:
>
> >Hi
> >Have you considered using HFileOutputFormat? You are using
> >TableOutputFormat now, which issues put calls to HTable. With
> >HFileOutputFormat, the MR job writes the HFiles directly [no flushes, no
> >compactions]. Afterwards you need to use LoadIncrementalHFiles to load the
> >HFiles into the regions. It may help you.
> >
> >-Anoop-
> >
> >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> >mohandes.zebeleh.67@gmail.com> wrote:
> >
> >> Thank you guys, let me change these configurations & test MapReduce again.
> >>
> >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <asaf.mesika@gmail.com>
> >> wrote:
> >>
> >> > Start by testing HDFS throughput by doing a simple copyFromLocal using
> >> > the Hadoop command line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> >> > /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the
> >> > computers, you should get around 75 MB/sec.
> >> >
> >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> >> >
> >> > > In our experience, MapReduce insert throughput can be improved by:
> >> > > 1. increasing the number of regionserver flush threads
> >> > > 2. increasing the memstore share of the JVM heap
> >> > > 3. pre-splitting the table's regions before running MapReduce
> >> > > 4. increasing the number of large and small compaction threads.
> >> > >
> >> > > Please correct me if I'm wrong, or if there are any better ideas.
> >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <larsh@apache.org> wrote:
> >> > >
> >> > > > What type of disks and how many?
> >> > > > With the default replication factor your 2 (or 6) GB are actually
> >> > > > replicated 3 times.
> >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> >> > > > reasonable machine should be able to absorb.
> >> > > > The fact that deferred log flush does not help you seems to indicate
> >> > > > that you're IO bound.
> >> > > >
> >> > > >
> >> > > > What's your memstore flush size? Potentially the data is written
> >> > > > many times during compactions.
> >> > > >
> >> > > >
> >> > > > In your case you could dial down the HDFS replication, since you only
> >> > > > have two physical machines anyway.
> >> > > > (Set it to 2. If you do not specify any failure zones, you might as
> >> > > > well set it to 1... You will lose data if one of your server machines
> >> > > > dies anyway).
> >> > > >
> >> > > > It does not really make that much sense to deploy HBase and HDFS on
> >> > > > virtual nodes like this.
> >> > > > -- Lars
> >> > > >
> >> > > >
> >> > > >
> >> > > > ________________________________
> >> > > > From: Farrokh Shahriari <mohandes.zebeleh.67@gmail.com>
> >> > > > To: user@hbase.apache.org
> >> > > > Sent: Monday, January 7, 2013 9:38 PM
> >> > > > Subject: Re: Tune MapReduce over HBase to insert data
> >> > > >
> >> > > > Hi again,
> >> > > > I'm using HBase 0.92.1-cdh4.0.0.
> >> > > > I have two server machines with 48 GB RAM, 12 physical cores & 24
> >> > > > logical cores that host 12 nodes (6 nodes on each server). Each node
> >> > > > has 8 GB RAM & 2 vCPUs.
> >> > > > I've set some parameters that give better results, like setting
> >> > > > WAL=off on put, but other parameters like heap size and deferred log
> >> > > > flush don't help me.
> >> > > > Besides that, I have another question: why do I get a different run
> >> > > > time each time I run MapReduce, when the config & hardware are the
> >> > > > same and have not changed?
> >> > > >
> >> > > > Tnx you guys
> >> > > >
> >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > >
> >> > > > > Have you read through http://hbase.apache.org/book.html#performance ?
> >> > > > >
> >> > > > > What version of HBase are you using ?
> >> > > > >
> >> > > > > Cheers
> >> > > > >
> >> > > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> >> > > > > mohandes.zebeleh.67@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi there
> >> > > > > > I have a cluster with 12 nodes, each of which has 2 CPU cores.
> >> > > > > > Now I want to insert a large amount of data, about 2 GB in 80 sec
> >> > > > > > (or 6 GB in 240 sec). I've used MapReduce over HBase, but I can't
> >> > > > > > achieve the desired result.
> >> > > > > > I'd be glad if you could tell me what I can do to get a better
> >> > > > > > result, or which parameters I should configure or tune to improve
> >> > > > > > MapReduce/HBase performance.
> >> > > > > >
> >> > > > > > Tnx
> >> > > > > >
> >> > > > >
> >> > >
> >> >
> >>
>
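
The throughput figures quoted in the thread above (2 GB in 80 sec, replicated 3 times, doubled with the WAL on) can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch, not a measurement: it assumes decimal units (1 GB = 1000 MB), that every byte is written once per HDFS replica, and that an enabled WAL roughly doubles the volume, as estimated in the thread. Compaction rewrites are ignored, and the function name is made up for illustration.

```python
def required_write_rate_mb_s(payload_gb, seconds, replication=3, wal_enabled=False):
    """Estimate the aggregate write rate (MB/s) needed to absorb an insert.

    Assumptions (hypothetical model, not measured):
      - 1 GB = 1000 MB
      - each byte is written once per HDFS replica
      - an enabled WAL doubles the written volume
      - rewrites caused by compactions are ignored
    """
    total_mb = payload_gb * 1000 * replication
    if wal_enabled:
        total_mb *= 2
    return total_mb / seconds


if __name__ == "__main__":
    # The case from the thread: 2 GB of payload in 80 seconds.
    for replication in (3, 2, 1):
        for wal in (False, True):
            rate = required_write_rate_mb_s(2, 80, replication, wal)
            print(f"replication={replication} WAL={'on' if wal else 'off'}: "
                  f"{rate:.0f} MB/s")
```

With replication 3 and the WAL off this gives the 75 MB/s mentioned in the thread. Lowering the replication factor reduces the required rate proportionally under this model.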
