hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Hadoop/Hbase 0.94.2 performance what to expect
Date Sat, 27 Oct 2012 20:26:59 GMT
Hi Nick,

Are you asking about read or write performance? ImportTsv writes to HBase. Hive is read only.
Is this Hive on top of HBase, or on raw HDFS files?

How many disk drives do your boxes have?

-- Lars

 From: nick maillard <nicolas.maillard@fifty-five.com>
To: user@hbase.apache.org 
Sent: Saturday, October 27, 2012 5:05 AM
Subject: Hadoop/Hbase 0.94.2 performance what to expect
Hi everyone

So I've set up a Hadoop/HBase/Hive cluster of 3 Ubuntu machines:
master: Ubuntu 64-bit, 8 cores at 3 GHz, 16 GB memory, Gigabit Ethernet connection
2 slaves: the same
I went through the various documentation, blogs and articles on understanding
and tuning Hadoop and/or HBase: map/reduce tasks set to 7, raised the heap
parameter, xcievers, compression, speculative execution off, etc.
I've also installed YCSB to start stress testing on my own set of data.

Looking around I saw a lot of reported experiences and testing tools, but to put
it simply, I don't know what I should expect.

When I import a 5 GB file through ImportTsv it takes about an hour. (My keys
are not incremental.)
When I stress test with one thread writing 10 million entries it takes a little
over an hour.
When I ask through Hive something like 'select * from tableA where valueC=1' on a
table of about 1.5 million elements it takes 4 minutes to resolve. Arguably I
should query by rowkey to really get a good time, but this example is meant to
test map/reduce against a dataset.
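
For reference, the three timings above can be turned into rough rates. This is only a back-of-the-envelope sketch: it assumes "about an hour" means exactly 3600 seconds and that 1 GB is 10**9 bytes, neither of which is stated in the message, and the `rate` helper is just an illustrative name.

```python
# Rough throughput implied by the timings quoted above.
# Assumptions (not from the original message): "about an hour" is taken as
# exactly 3600 s, and 1 GB is taken as 10**9 bytes.

def rate(amount, seconds):
    """Return the amount processed per second."""
    return amount / seconds

# ImportTsv: 5 GB file in about one hour, expressed in MB/s.
import_mb_per_s = rate(5 * 10**9, 3600) / 10**6

# Stress test: one thread writing 10 million entries in about one hour.
write_ops_per_s = rate(10_000_000, 3600)

# Hive query: about 1.5 million elements scanned in 4 minutes.
scan_rows_per_s = rate(1_500_000, 4 * 60)

print(f"ImportTsv: {import_mb_per_s:.2f} MB/s")
print(f"Single-thread writes: {write_ops_per_s:.0f} entries/s")
print(f"Hive full scan: {scan_rows_per_s:.0f} rows/s")
```

Under those assumptions the import works out to a little under 1.4 MB/s and the single-thread write test to a bit under 2,800 entries per second, which are the figures to compare against other clusters.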

So all in all, what should I expect? Is my dataset too small, so that these
times only seem relatively long? The writes seem really slow and resolving
through map/reduce seems slow as well. Of course, maybe the time would be the
same for a much larger set, which would make a lot more sense.

Just for info, I have checked with iostat and my disks are about 95% idle.

So if someone were kind enough to share what kind of performance I could expect
with my cluster, I could see whether my setup is really not responding how it
should, whether I'm using it the wrong way, or whether these numbers are normal.
