hbase-user mailing list archives

From pasaliczaharije <pasalic.zahar...@gmail.com>
Subject HBase - hiting only one node on insert ...
Date Mon, 18 Jan 2010 11:54:13 GMT


We have a small Hadoop cluster with 7 nodes (8 GB RAM, 8 cores each) plus
1 master, and we deployed HBase on the same 7 nodes.

Currently we are importing ~50 million records from CSV files into HBase. Each
CSV can have about 100 columns, and the row key is a UUID generated with
java.util.UUID.

We have about 50 files on HDFS, which are imported into HBase by

At the start everything works fine, but after a few minutes we see a heavy
load on the second node. Here is the listing from the HBase master.jsp:

hadoop-node01:60030	1263591474251	requests=184, regions=148, usedHeap=1196,
hadoop-node02:60030	1263591474109	requests=663, regions=148, usedHeap=1489,
hadoop-node03:60030	1263591474082	requests=161, regions=147, usedHeap=1526,
hadoop-node04:60030	1263632774794	requests=142, regions=147, usedHeap=1213,
hadoop-node06:60030	1263596977608	requests=152, regions=147, usedHeap=749,
hadoop-node07:60030	1263597118777	requests=156, regions=148, usedHeap=1749,
hadoop-node08:60030	1263597239565	requests=179, regions=148, usedHeap=1681,

(The second node has about 5 times more requests than the other nodes.) At
some point we get requests=0 for all nodes except node02, which stays at
about 600-1800.
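
To put a number on the imbalance in the listing above, here is a minimal sketch that compares node02's requests value with the mean of the other nodes. The class and method names are made up for illustration; the numbers are copied from the listing, which is a single snapshot.

```java
public class RequestSkew {
    // Ratio of one node's request count to the mean of the remaining nodes.
    static double skewRatio(int[] requests, int hotIndex) {
        long sumOthers = 0;
        for (int i = 0; i < requests.length; i++) {
            if (i != hotIndex) {
                sumOthers += requests[i];
            }
        }
        double meanOthers = (double) sumOthers / (requests.length - 1);
        return requests[hotIndex] / meanOthers;
    }

    public static void main(String[] args) {
        // requests= values from the master.jsp listing, in listing order
        // (node01, node02, node03, node04, node06, node07, node08).
        int[] requests = {184, 663, 161, 142, 152, 156, 179};
        System.out.printf("node02 vs mean of others: %.2fx%n", skewRatio(requests, 1));
    }
}
```

For this snapshot the ratio comes out at roughly four; the gap is wider at the moments when the other nodes drop to requests=0.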

We chose UUIDs as row keys to get a roughly uniform load across all nodes. I'm
not sure whether this is a UUID issue (keys not uniform) or something else.
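
One quick way to check the UUID side of this is to count how generated keys spread over their leading hex digit, since HBase orders rows by key bytes. This is a minimal sketch, assuming the keys come from UUID.randomUUID() and are stored in their string form (both are assumptions, the import code is not shown here); the class and method names are made up for illustration.

```java
import java.util.UUID;

public class UuidPrefixSpread {
    // Count how many generated keys start with each hex digit 0..f.
    static int[] countByFirstHexDigit(int samples) {
        int[] counts = new int[16];
        for (int i = 0; i < samples; i++) {
            String key = UUID.randomUUID().toString();
            counts[Character.digit(key.charAt(0), 16)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByFirstHexDigit(160000);
        for (int d = 0; d < counts.length; d++) {
            System.out.println(Integer.toHexString(d) + ": " + counts[d]);
        }
    }
}
```

If the sixteen counts come out roughly equal, the keys themselves are spread evenly and the skew more likely comes from somewhere else.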

Also, we are using the default Hadoop configuration (7 nodes result in 14
maps running in parallel). Is this optimal for this kind of job?
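
For reference, the figure of 14 is just worker nodes times map slots per node. Here is a minimal sketch of that arithmetic, assuming the stock value of 2 map slots per TaskTracker (the mapred.tasktracker.map.tasks.maximum setting in Hadoop of this era; that this cluster uses it is an assumption). The class and method names are made up for illustration.

```java
public class MapSlots {
    // Maps that can run at once = worker nodes * map slots per node.
    static int concurrentMaps(int workerNodes, int mapSlotsPerNode) {
        return workerNodes * mapSlotsPerNode;
    }

    public static void main(String[] args) {
        int workerNodes = 7;     // from the cluster description in this post
        int mapSlotsPerNode = 2; // assumed default, not confirmed in this post
        System.out.println(concurrentMaps(workerNodes, mapSlotsPerNode)); // prints 14
    }
}
```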

Any comments?

View this message in context: http://old.nabble.com/HBase---hiting-only-one-node-on-insert-...-tp27209452p27209452.html
Sent from the HBase User mailing list archive at Nabble.com.
