hbase-user mailing list archives

From Ronen Itkin <ro...@taykey.com>
Subject Re: HBase write latency issues with hdfs replication of 3
Date Tue, 04 Oct 2011 18:12:48 GMT
Sorry, I sent it by mistake before I had finished writing.
Here is the full mail.

*Cluster 1:*
> Back in my office (not on an Amazon instance):
>
>    - 1 server (3GB RAM, 1 CPU unit) installed with a local Hadoop cluster
>    (NameNode, DataNode, JobTracker, DataNode), HBase cluster (HMaster, HRegion) and a
>    local Zookeeper - all with default configuration.
>    - HDFS replication parameter = *1*
>    - Benchmark result: *writing 10,000 records in ~1 second*
>
> *Cluster 2:*
> On Amazon EC2 environment
>
>    - 1 server - small instance (1.7GB RAM, 1 CPU unit) installed with a
>    local Hadoop cluster (NameNode, DataNode, JobTracker, DataNode), HBase cluster
>    (HMaster,HRegion) and a local Zookeeper - all with default configuration.
>    - HDFS replication parameter = *1*
>    - Benchmark result: *writing 10,000 records in ~8 seconds*
>
> *Cluster 3:*
> On Amazon EC2 environment
>
>    - 1 server - large instance (7.5GB RAM, 4 CPU Units) installed Hadoop
>    NameNode, JobTracker, HBaseMaster, ZooKeeper.
>    - 3 servers - xlarge instances (15GB RAM, 8 CPU Units  - each instance)
>    installed Hadoop with DataNode, TaskTracker, HBaseRegionServer.
>    - 1 server -  large instance (7.5GB RAM, 4 CPU Units) installed Hadoop
>    SecondaryNameNode, HBaseBackupMaster, Zookeeper.
>    - 1 server - small instance (1.7GB RAM, 1 CPU unit) installed with
>    Zookeeper.
>    - First run      - HDFS replication parameter = *1*  -->  Benchmark
>    result: *writing 10,000 records in ~11 seconds*
>    - Second run - HDFS replication parameter = *3*  -->  Benchmark result:
>    *writing 10,000 records in ~58 seconds*
>
>
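To put the four runs on a common scale, the figures above can be turned into an average time per record and a slowdown ratio. This is only an arithmetic sketch: the elapsed times are the approximate values quoted in this message, and the names `RUNS`, `per_record_ms`, and `slowdown` are made up for illustration.

```python
# Benchmark figures quoted in this message: (label, records written, elapsed seconds).
# The elapsed times are the approximate ("~") values reported above, not new measurements.
RUNS = [
    ("Cluster 1, office, replication 1", 10_000, 1.0),
    ("Cluster 2, EC2 small, replication 1", 10_000, 8.0),
    ("Cluster 3, EC2, replication 1", 10_000, 11.0),
    ("Cluster 3, EC2, replication 3", 10_000, 58.0),
]


def per_record_ms(records, seconds):
    """Average wall-clock time per record, in milliseconds."""
    return seconds * 1000.0 / records


def slowdown(seconds, baseline_seconds):
    """How many times slower a run is than a baseline run of the same size."""
    return seconds / baseline_seconds


if __name__ == "__main__":
    for label, records, seconds in RUNS:
        rate = records / seconds  # records written per second
        print(f"{label}: {per_record_ms(records, seconds):.1f} ms/record, "
              f"{rate:.0f} records/s")

    # Same hardware, only the HDFS replication setting differs.
    ratio = slowdown(RUNS[3][2], RUNS[2][2])
    print(f"Cluster 3, replication 3 vs 1: {ratio:.1f}x slower")
```

On these numbers the average cost per record is about 0.1 ms, 0.8 ms, 1.1 ms and 5.8 ms respectively, and the replication 3 run on cluster 3 is roughly 5.3 times slower than the replication 1 run on the same machines.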
> *Questions:*
>
>    - Cluster 1 compared to cluster 2 - Why is there such a huge difference in
>    performance between the Amazon environment and my office environment? As
>    far as I know the only difference should be the I/O, because Amazon's
>    partitions are not really local, and yet the difference is huge!
>    - Cluster 3 - Why does it make such a huge difference if the
>    replication is set to three? Isn't it supposed to be transparent, with the
>    same performance, because the client application interacts with the name
>    node and waits to receive only one ACK per packet? Does the NameNode send
>    that ACK after the first successful write, or only after all 3 replicas
>    are written? Is that configurable? Maybe it's the client application's fault?
>
>     Sorry for the long e-mail!

> Thanks a lot!!
>
> *Ronen Itkin*
> Taykey | www.taykey.com
>
>
