hbase-user mailing list archives

From: Tatsuya Kawano <tatsuya6...@gmail.com>
Subject: Re: Scalability problem with HBase
Date: Mon, 24 Jan 2011 04:36:47 GMT
Hi, 

Your 4th and 5th region servers are probably getting no regions to serve. Check the number
of regions assigned to, and the number of requests processed by, those servers (via Ganglia
or the HBase status screen).
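
For example, from the HBase shell (the exact output format differs a bit between
versions):

  hbase> status 'simple'     # one line per region server, with request counts
  hbase> status 'detailed'   # also lists every region assigned to each server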

HBase won't rebalance regions when you add servers. It only assigns regions to the new
server when existing regions on the other region servers get split. Your workload only
updates existing records, so the existing regions won't get split very often.

I don't know a good way to work around this, so I hope others on this list have a better
idea. But you can manually split a region via the HBase shell, and disabling / enabling
may also get regions reassigned to the 4th or 5th server, as in the sketch below.
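
For example ('wiki' and the region name below are just placeholders; I haven't
verified that every command is available in the 0.20.6 shell, so please check
with 'help' first):

  # split a single region by giving its full name (table,startkey,regionId)
  hbase> split 'wiki,,1295238000000'

  # disabling and re-enabling the table closes and reopens all of its
  # regions, which may get some of them reassigned to the new servers
  hbase> disable 'wiki'
  hbase> enable 'wiki'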

Hope this helps, 
Tatsuya

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Jan 24, 2011, at 6:41 AM, Thibault Dory <dory.thibault@gmail.com> wrote:

> Hello,
> 
> I'm currently testing the performance of HBase for a specific test case. I
> have downloaded ~20000 articles from Wikipedia and I want to test the
> performance of reads/writes and of MapReduce.
> I'm using HBase 0.20.6 and Hadoop 0.20.2 on a cluster of Ubuntu powered
> servers connected with Gigabit ethernet.
> 
> My test works like this :
> - I start with 3 physical servers, used like this: 3 Hadoop nodes (1
> namenode and 3 datanodes) and, for HBase, 1 master and 3 region servers.
> - I insert all the articles, one article per row, where each row contains
> two cells: ID and article
> - I start 3 threads from another machine, reading and updating articles
> randomly (I simply append the string "1" to the end of the article), and I
> measure the time needed for all the operations to finish
> - I build a reverse search index using two phases of MapReduce and measure
> the time to compute it
> - then I add a new server on which I start a datanode and a region server,
> and I start the benchmark again with 4 threads
> - I repeat those steps until I reach the last available server (8 in total)
> 
> I am keeping the total number of operations constant, and appending "1" to
> an article does not change its size much.
> 
> The problem is the kind of results I'm seeing. I believed that the time
> needed to perform the read/write operations would decrease as I add new
> servers to the cluster, but I'm experiencing exactly the opposite. Moreover,
> the more requests I make, the slower the cluster becomes, for a constant
> data size.
> For example, here are the results in seconds that I got on my cluster just
> after the first insertion, with 3 nodes and 10000 operations per run (20% of
> which update the articles):
> 
> Individual times : [30.338116567, 24.402751402, 25.650858953, 27.699796324,
> 26.589869283, 33.909433157, 52.538378122, 48.0114018, 47.149348721,
> 42.825791078]
> Then, one minute after this run ends, everything else staying the same:
> Individual times : [58.181552147, 48.931155328, 62.509309199, 57.198395723,
> 63.267397201, 54.160937835, 57.635454167, 64.780292628, 62.762390414,
> 61.381563914]
> And finally, five minutes after the last run ends, everything else staying
> the same:
> Individual times : [56.852388792, 58.011768345, 63.578745601, 68.008043323,
> 79.545419247, 87.183962628, 88.1989561, 94.532923849, 99.852569437,
> 102.355709259]
> 
> It seems quite clear that the time needed to perform the same amount of
> operations is rising fast.
> 
> When I add a server to the cluster, the time needed to perform the
> operations keeps rising. Here are the results for 4 servers, using the same
> methodology as above:
> Immediately after the new server is added:
> Individual times : [86.224951713, 80.777746425, 84.814954717, 93.07842057,
> 83.348558502, 90.037499401, 106.799544002, 98.122952552, 97.057614119,
> 94.277285461]
> One minute after the last test:
> Individual times : [94.633454698, 101.250176482, 99.945406887,
> 101.754011832, 106.882328108, 97.808320021, 97.050036703, 95.844557847,
> 97.931572694, 92.258327247]
> Five minutes after the last test:
> Individual times : [98.188162512, 96.332809905, 93.598184149, 93.552745204,
> 96.905860067, 102.149408296, 101.545412423, 105.377292242, 108.855117219,
> 110.429000567]
> 
> The times needed to compute the reverse search index using MapReduce are
> rising too:
> 3 nodes
> Results : [106.604148815, 104.829340323, 101.986450167, 102.871575842,
> 102.177574017]
> 4 nodes
> Results : [120.451610507, 115.007344179, 115.075212636, 115.146883431,
> 114.216465299]
> 5 nodes
> Results : [139.563445944, 132.933993434, 134.117730658, 132.927127084,
> 132.041046308]
> 
> 
> I don't think this behaviour is normal; I should see the time needed to
> complete the same amount of work decrease as I add more servers to the
> cluster. Unless this is because my cluster is too small?
> I should also say that all the servers in the cluster seem to use an equal
> amount of CPU while the test is running, so it looks like all of them are
> working and there is no server that is not storing data.
> 
> What do you think? Where did I screw up to get this kind of result with
> HBase?
