hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: KeyValue size in bytes compared to store files size
Date Wed, 15 Jan 2014 15:35:35 GMT
See previous discussion: http://search-hadoop.com/m/85S3A1DgZHP1


On Wed, Jan 15, 2014 at 5:44 AM, Amit Sela <amits@infolinks.com> wrote:

> Hi all,
> I'm trying to measure the size (in bytes) of the data I'm about to load
> into HBase.
> I'm using bulk load with PutSortReducer.
> All bulk load data is loaded into new regions and not added to existing
> ones.
>
> In order to count the size of all KeyValues in the Put object I iterate
> over the Put's familyMap.values() and sum the KeyValue lengths.
> After loading the data, I check the region size by summing the
> RegionLoad.getStorefileSizeMB().
> Summing the Put objects' sizes predicted ~500MB per region, but in practice I
> got ~32MB per region.
> The table uses GZ compression, but this cannot be the cause of such a
> difference.
>
> Is counting the Put's KeyValues the correct way to count a row size? Is it
> comparable to the store file size?
>
> Thanks,
> Amit.
>
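
For reference, the per-cell arithmetic behind "sum the KeyValue lengths" can be sketched without any HBase dependency. This is only a rough sketch: it assumes the classic untagged KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type), and the class and method names are hypothetical, not HBase API. The result is the uncompressed serialized size, which is not the same thing as the on-disk store file size.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: estimates serialized KeyValue sizes.
final class KeyValueSizeSketch {
    // Fixed per-cell overhead, assuming the untagged KeyValue layout:
    // 4 (key length) + 4 (value length) + 2 (row length)
    // + 1 (family length) + 8 (timestamp) + 1 (type) = 20 bytes.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Serialized length of one cell, given the lengths of its variable parts.
    static long keyValueLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    // Convenience overload that takes the cell parts as strings (UTF-8 encoded).
    static long keyValueLength(String row, String family, String qualifier, String value) {
        return keyValueLength(
                row.getBytes(StandardCharsets.UTF_8).length,
                family.getBytes(StandardCharsets.UTF_8).length,
                qualifier.getBytes(StandardCharsets.UTF_8).length,
                value.getBytes(StandardCharsets.UTF_8).length);
    }

    public static void main(String[] args) {
        // Sum the estimated sizes of a few cells belonging to one row.
        long total = 0;
        total += keyValueLength("row1", "f", "q1", "hello");
        total += keyValueLength("row1", "f", "q2", "world");
        System.out.println("estimated uncompressed bytes: " + total);
    }
}
```

Note that the row key and family are counted again for every cell, so a wide row with short values is dominated by repeated key bytes.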

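One way to check whether compression could account for part of the gap is to gzip a sample of serialized cells and compare sizes. The sketch below uses synthetic data and plain java.util.zip, so the ratio it prints says nothing about the real table; the cell format and all names are made up for illustration. HBase compresses each HFile block separately, so gzipping one large buffer is only an approximation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical experiment, not HBase code: compares raw and gzipped sizes
// of a buffer of synthetic, repetitive cell data.
final class GzipRatioSketch {
    // Builds a buffer of made-up cells; the row key repeats for every column,
    // which mimics how each KeyValue carries its full key.
    static byte[] sampleCells(int rows, int columnsPerRow) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columnsPerRow; c++) {
                String cell = String.format("row-%08d/f:col-%03d=value-%d",
                        r, c, (r * 31 + c) % 1000);
                byte[] bytes = cell.getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
        }
        return out.toByteArray();
    }

    // Compresses the whole buffer as a single gzip stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return compressed.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleCells(10_000, 10);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.1fx%n",
                raw.length, packed.length, (double) raw.length / packed.length);
    }
}
```

Running the same comparison on a sample of the actual bulk-load input would give a more meaningful ratio than this synthetic data.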