hbase-dev mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: Usage of block encoding in bulk loading
Date Sun, 13 May 2012 18:50:57 GMT
Thanks Stack for your reply. I will work on this and give a patch soon...

From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Saturday, May 12, 2012 10:08 AM
To: dev@hbase.apache.org
Subject: Re: Usage of block encoding in bulk loading

On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <anoopsj@huawei.com> wrote:
> Hi Devs
> When the data is bulk loaded using HFileOutputFormat, we are not using the block encoding and the HBase-handled checksum features, I think. When the writer is created for making the HFile, I am not seeing any such info being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't have this info and do not pass it to the writer either, so those HFiles will not have these optimizations.
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically divide one HFile (created by the MR) iff it cannot belong to just one region, I can see we pass the data block encoding details and checksum details to the new HFile writer. But this step won't happen normally I
> Correct me if my understanding is wrong pls...

Sounds plausible Anoop.  Sounds like something worth fixing too?

Good on you,
