hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Tune MapReduce over HBase to insert data
Date Tue, 08 Jan 2013 08:02:02 GMT
What type of disks and how many?
With the default replication factor, your 2 GB (or 6 GB) is actually written three times.
That is 6 GB / 80 s = 75 MB/s, and twice that if you do not disable the WAL, which a reasonable
machine should be able to absorb.
The fact that deferred log flush does not help you seems to indicate that you are IO bound.
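For reference, a rough sketch of both knobs against the 0.92-era client API (the table name,
column family, row key and value below are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalTuningSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Option 1: skip the WAL for each Put. Fastest, but data still in the
    // memstore is lost if a region server dies before it is flushed.
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setWriteToWAL(false);   // 0.92 API; later releases use setDurability()
    table.put(put);
    table.close();

    // Option 2: keep the WAL but flush it asynchronously (deferred log flush),
    // set per table on the descriptor.
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    desc.setDeferredLogFlush(true);
    admin.disableTable("mytable");
    admin.modifyTable(Bytes.toBytes("mytable"), desc);
    admin.enableTable("mytable");
  }
}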


What's your memstore flush size? If it is small, the same data is potentially rewritten many times during compactions.
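You can check what your region servers are actually running with, e.g. (a sketch; the -1
fallback only means the property is not set in the config on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MemstoreFlushSizeCheck {
  public static void main(String[] args) {
    // Reads hbase-site.xml / hbase-default.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", -1L);
    System.out.println("hbase.hregion.memstore.flush.size = " + flushSize);
    // Raising it (in hbase-site.xml on the region servers, or per table via
    // HTableDescriptor.setMemStoreFlushSize if your release supports it) means
    // fewer, larger flushes and fewer compaction rewrites of the same data.
  }
}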


In your case you could dial down the HDFS replication, since you only have two physical machines
anyway.
(Set it to 2. If you do not specify any failure zones, you might as well set it to 1, since you
will lose data if one of your server machines dies anyway.)
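If you want to verify what replication your HBase files actually ended up with, something like
this works (a sketch; dfs.replication itself is lowered in hdfs-site.xml, and in hbase-site.xml
so the region servers' HDFS client picks it up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    System.out.println("dfs.replication = " + conf.getInt("dfs.replication", 3));

    // Spot-check the replication of what is already under the HBase root dir
    // (directories report 0, files report their actual replication factor).
    Path root = new Path(conf.get("hbase.rootdir", "/hbase"));  // fallback is a placeholder
    FileSystem fs = root.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(root)) {
      System.out.println(status.getPath() + " -> replication " + status.getReplication());
    }
    fs.close();
  }
}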

It does not really make that much sense to deploy HBase and HDFS on virtual nodes like this.
-- Lars



________________________________
 From: Farrokh Shahriari <mohandes.zebeleh.67@gmail.com>
To: user@hbase.apache.org 
Sent: Monday, January 7, 2013 9:38 PM
Subject: Re: Tune MapReduce over HBase to insert data
 
Hi again,
I'm using HBase 0.92.1-cdh4.0.0.
I have two server machines, each with 48 GB RAM, 12 physical cores (24 logical),
hosting 12 nodes in total (6 nodes on each server). Each node has 8 GB RAM and 2
vCPUs.
I've set some parameters that give better results, like disabling the WAL on puts,
but other parameters like heap size and deferred log flush don't help.
Besides that, I have another question: why do I get a different runtime each time
I run the MapReduce job, even though the configuration and hardware stay exactly
the same?

Thanks, guys

On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you read through http://hbase.apache.org/book.html#performance ?
>
> What version of HBase are you using?
>
> Cheers
>
> On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> mohandes.zebeleh.67@gmail.com> wrote:
>
> > Hi there,
> > I have a cluster with 12 nodes, each with 2 CPU cores. I want to insert a
> > large amount of data: about 2 GB in 80 seconds (or 6 GB in 240 seconds).
> > I've used MapReduce over HBase, but I can't reach that rate.
> > I'd be glad if you could tell me what I can do to get better results, or
> > which parameters I should configure or tune to improve MapReduce/HBase
> > performance.
> >
> > Tnx
> >
>