hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject KeyValue size in bytes compared to store files size
Date Wed, 15 Jan 2014 13:44:38 GMT
Hi all,
I'm trying to measure the size (in bytes) of the data I'm about to load
into HBase.
I'm using bulk load with PutSortReducer.
All bulk load data is loaded into new regions and not added to existing
ones.

In order to count the size of all KeyValues in the Put object I iterate
over the Put's familyMap.values() and sum the KeyValue lengths.
After loading the data, I check the region size by summing the
RegionLoad.getStorefileSizeMB() values.
Counting the Put objects' sizes predicted ~500MB per region, but in practice I
got ~32MB per region.
The table uses GZ compression, but this cannot be the cause of such a
difference.
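
To make the counting concrete, here is a minimal sketch of what I'm summing, written in plain Java without the HBase classes so it can be run on its own. The Cell class and the method names are made up for illustration; they only model the serialized KeyValue layout as I understand it (no tags): a 4-byte key length, a 4-byte value length, then the key (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type), then the value. This is the uncompressed size, before any GZ compression is applied to the store files.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real classes: models only the byte layout.
class KeyValueSizeSketch {
    // Fixed overhead of one serialized KeyValue (assumed layout, no tags).
    static final int KEY_LENGTH_FIELD = 4;    // int: length of the key part
    static final int VALUE_LENGTH_FIELD = 4;  // int: length of the value part
    static final int ROW_LENGTH_FIELD = 2;    // short: length of the row
    static final int FAMILY_LENGTH_FIELD = 1; // byte: length of the family
    static final int TIMESTAMP_FIELD = 8;     // long: timestamp
    static final int TYPE_FIELD = 1;          // byte: Put/Delete marker

    // Illustrative cell holder, not the HBase KeyValue class.
    static final class Cell {
        final byte[] row, family, qualifier, value;
        Cell(String row, String family, String qualifier, String value) {
            this.row = row.getBytes();
            this.family = family.getBytes();
            this.qualifier = qualifier.getBytes();
            this.value = value.getBytes();
        }
    }

    // Serialized length of one KeyValue given the lengths of its parts.
    static long keyValueLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        long keyLen = ROW_LENGTH_FIELD + rowLen
                + FAMILY_LENGTH_FIELD + familyLen
                + qualifierLen
                + TIMESTAMP_FIELD + TYPE_FIELD;
        return KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + keyLen + valueLen;
    }

    // Sum over every cell of every family, like iterating familyMap.values().
    static long sumFamilyMap(Map<String, List<Cell>> familyMap) {
        long total = 0;
        for (List<Cell> cells : familyMap.values()) {
            for (Cell c : cells) {
                total += keyValueLength(c.row.length, c.family.length,
                        c.qualifier.length, c.value.length);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Cell>> familyMap = new LinkedHashMap<>();
        familyMap.put("cf", Arrays.asList(
                new Cell("row-1", "cf", "q1", "some value"),
                new Cell("row-1", "cf", "q2", "another value")));
        System.out.println("Uncompressed KeyValue bytes: " + sumFamilyMap(familyMap));
    }
}
```

Note that each KeyValue repeats the full row, family and qualifier, so this sum contains a lot of repeated bytes.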

Is counting the Put's KeyValues the correct way to measure a row's size? Is it
comparable to the store file size?

Thanks,
Amit.
