hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: hbase table size
Date Sat, 07 Apr 2012 20:39:35 GMT
10 GB -> 100 GB sounds about right. Of course, it depends on the relative sizes of the keys and
the values.

HBase needs to store the entire coordinates (row key, column family, column qualifier, timestamp) for each
KeyValue (i.e. each cell), whereas the CSV file only stores the values.

You can try Snappy or LZO compression if (CPU) performance is the primary consideration, or
GZ if disk/IO is more important.
Also, 0.94+ comes with key prefix compression (data block encoding), which will help a lot in many cases.
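Both settings are applied per column family when the table is created (or altered). A sketch in the HBase shell, with the table and family names being illustrative:

```
# One column family with Snappy compression and prefix
# data-block encoding (DATA_BLOCK_ENCODING is available from 0.94 on)
create 'mytable', {NAME => 'd',
                   COMPRESSION => 'SNAPPY',
                   DATA_BLOCK_ENCODING => 'PREFIX'}

# Or change an existing table; existing data is rewritten
# to the new format as regions compact.
disable 'mytable'
alter 'mytable', {NAME => 'd', COMPRESSION => 'GZ'}
enable 'mytable'
```

Note that compression applies to whole blocks on disk, so it also shrinks the repeated coordinates, not just the values.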

-- Lars

 From: mete <efkarr@gmail.com>
To: user@hbase.apache.org 
Sent: Saturday, April 7, 2012 1:21 PM
Subject: hbase table size
Hello folks,

I am trying to import a CSV file that is around 10 GB into HBase. After the
import, I check the size of the folder with the hadoop fs -du command, and
it is a little above 100 gigabytes in size.
I did not configure any compression or anything. I have tried both a
sequential import using the API and creating an HFile and bulk-loading it into
HBase, but the size is nearly the same. Does this sound normal?

Kind Regards.