hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Zero-copy reads
Date Tue, 27 Jul 2010 17:18:23 GMT
Can you provide more links to comments in jira mentioning "loss of zero copy reads"?

Basically, this refers to changes made in the 0.20 release of HBase related to the block-based
HFile format, the KeyValue data pointer, and related pieces such as the Result client return
type and the block cache.

Previously (in 0.19 and before), when executing read queries, we would make copies of the
values we were reading into separate byte arrays to return back to the client.  There wasn't
much of a way around this until the introduction of blocks and KeyValue.

Now, once we read in a block from an HFile (which contains a bunch of KeyValues appended to
each other), we don't physically copy the bytes anymore.  Rather, we use KeyValue to point
to the different KVs contained in the block.  Underneath, KeyValue is nothing more than a
byte[], offset, and length (essentially, a pointer into a larger byte[]).
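To make the "pointer into a larger byte[]" idea concrete, here is a minimal sketch. It is not HBase code: `KvView` is a hypothetical stand-in for KeyValue, and the block layout (a 4-byte length followed by the entry bytes) is a simplified toy, not the real HFile format. It contrasts a zero-copy scan, which only records (buffer, offset, length) for each entry, with a 0.19-style scan that allocates a new array per value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZeroCopyDemo {

    /** Hypothetical stand-in for KeyValue: a pointer (buffer, offset, length) into a shared block. */
    static final class KvView {
        final byte[] buffer;
        final int offset;
        final int length;

        KvView(byte[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        /** Copies the entry's bytes out; only done when a caller really needs its own array. */
        byte[] copyBytes() {
            return Arrays.copyOfRange(buffer, offset, offset + length);
        }
    }

    /** Toy block layout (not the real HFile format): [4-byte length][entry bytes], repeated. */
    static byte[] buildBlock(String... entries) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String e : entries) {
            byte[] b = e.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] b : encoded) {
            buf.putInt(b.length).put(b);
        }
        return buf.array();
    }

    /** Zero-copy scan: returns pointers into the block without copying any entry bytes. */
    static List<KvView> scanBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        List<KvView> views = new ArrayList<>();
        int pos = 0;
        while (pos < block.length) {
            int len = buf.getInt(pos); // absolute read of the 4-byte length prefix
            views.add(new KvView(block, pos + 4, len));
            pos += 4 + len;
        }
        return views;
    }

    /** Copying scan, 0.19-style: one freshly allocated array per entry. */
    static List<byte[]> copyAll(byte[] block) {
        List<byte[]> copies = new ArrayList<>();
        for (KvView v : scanBlock(block)) {
            copies.add(v.copyBytes());
        }
        return copies;
    }

    public static void main(String[] args) {
        byte[] block = buildBlock("row1/cf:a=x", "row2/cf:a=y");
        List<KvView> views = scanBlock(block);
        System.out.println(views.get(0).buffer == block); // true: views share the block
        System.out.println(views.get(1).offset);          // 19
        System.out.println(new String(copyAll(block).get(1), StandardCharsets.UTF_8)); // row2/cf:a=y
    }
}
```

The design point is that `scanBlock` allocates only small pointer objects, while the value bytes stay in the one block array; a copy is made only if some caller explicitly asks for it.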

We pass these KeyValues (which really point into larger blocks) all the way back to the client
via the Result data type.
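As an in-process illustration of handing those pointers onward, here is a sketch that uses `java.nio.ByteBuffer` views in place of KeyValue. `ResultSketch` and `fromBlock` are hypothetical names, not HBase's Result API, and the sketch does not model RPC serialization between server and client. Each cell is a `slice()` of the cached block, so the object returned to the caller still references the original block bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ResultSketch {
    // Hypothetical client-facing container; a stand-in for Result, not its real API.
    private final List<ByteBuffer> cells;

    ResultSketch(List<ByteBuffer> cells) {
        this.cells = Collections.unmodifiableList(new ArrayList<>(cells));
    }

    int size() {
        return cells.size();
    }

    ByteBuffer cell(int i) {
        // duplicate() gives the caller its own position/limit over the same bytes.
        return cells.get(i).duplicate();
    }

    /** Hypothetical server-side step: slice each entry of a cached block instead of copying it. */
    static ResultSketch fromBlock(byte[] block, int[] offsets, int[] lengths) {
        List<ByteBuffer> views = new ArrayList<>();
        for (int i = 0; i < offsets.length; i++) {
            views.add(ByteBuffer.wrap(block, offsets[i], lengths[i]).slice());
        }
        return new ResultSketch(views);
    }

    public static void main(String[] args) {
        byte[] block = "row1=aaarow2=bbb".getBytes(StandardCharsets.UTF_8);
        ResultSketch r = fromBlock(block, new int[] {0, 8}, new int[] {8, 8});
        ByteBuffer second = r.cell(1);
        System.out.println(second.array() == block); // true: same backing array
        System.out.println(second.arrayOffset());     // 8
        byte[] out = new byte[second.remaining()];
        second.get(out);                              // the only copy, made by the caller on demand
        System.out.println(new String(out, StandardCharsets.UTF_8)); // row2=bbb
    }
}
```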

Does that make sense?

As far as I know, nothing in later 0.20 releases or in trunk has changed this.


> -----Original Message-----
> From: Andrew Nguyen [mailto:andrew-lists-hbase@ucsfcti.org]
> Sent: Tuesday, July 27, 2010 10:10 AM
> To: hbase-user@hadoop.apache.org
> Subject: Zero-copy reads
> Hello all,
> I recently saw some references to "zero copy reads" in Lars' blog post
> as well as some powerpoints, jira comments, etc.
> Is there any additional information available on this topic?  I saw
> some comments in jira that mentioned the loss of zero copy reads, while
> others mention that it's a feature.
> Thanks!
> --Andrew
