hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Bulk load questions
Date Mon, 27 Dec 2010 09:54:01 GMT
I am running some tests that load data from HDFS into HBase in an MR job.
I am pretty new to HBase and have some questions about bulk load
performance. I have a small 4-node cluster: one node runs
NameNode/JobTracker/ZK, and the other three each run
TaskTracker/DataNode/HRegionServer. During my test I am seeing about 1300
inserts per second in total, which feels slow. My rows are pretty
small, ~250 bytes each. I am wondering whether it is a good idea to run MR
on all nodes. Would it be better to run the MR load job on separate
nodes? I also observed that one task tracker's CPU usage was twice as
high as the other two. I can't figure out why that is. Does it
indicate hot spots in the cluster? I'd really appreciate any
ideas. Please let me know if my description is not specific or
detailed enough, and what other information I can provide to help
diagnose the problem. Thanks!
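
For a rough sense of scale, the figures above can be turned into a byte
throughput. This is only a back-of-the-envelope sketch: it assumes exactly
250 bytes per row and an even spread of writes across the three worker
nodes, neither of which is confirmed by the message (the uneven CPU usage
suggests the spread may not be even).

```python
# Figures taken from the message; the constant names are illustrative only.
ROWS_PER_SECOND = 1300   # observed total insert rate
ROW_SIZE_BYTES = 250     # approximate row size ("~250 bytes"), assumed exact here
WORKER_NODES = 3         # the three worker nodes described in the message


def throughput_bytes_per_second(rows_per_second, row_size_bytes):
    """Return the raw payload throughput in bytes per second."""
    return rows_per_second * row_size_bytes


total = throughput_bytes_per_second(ROWS_PER_SECOND, ROW_SIZE_BYTES)
# Assumption: writes are spread evenly, which a hot spot would violate.
per_node = total / WORKER_NODES

print(f"total payload: {total / 1024:.1f} KiB/s")
print(f"per worker node (if evenly spread): {per_node / 1024:.1f} KiB/s")
```

The result is a few hundred KiB per second of payload in total, which gives
a concrete number to compare against disk and network capacity.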
