hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Understanding HBase random reads
Date Tue, 05 Jul 2016 05:10:09 GMT
On Mon, Jul 4, 2016 at 6:49 AM, Robert James <srobertjames@gmail.com> wrote:

> I'd like to understand HBase block reads better.  Assume my HBase
> block is 64KB and my HDFS block is 64MB.
>
> I've read that HBase can just do a random read of the 64KB block,
> without reading the 64MB HDFS block.



That's right.



> Given that HDFS doesn't support
> random reads within a block, how is that possible?



It does support reading at an explicit offset. See [1] and the pread method
that follows.
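
To illustrate what a positional read buys you, here is a minimal sketch using a local file as a stand-in for an HDFS block. It is an analogy only: it uses Python's os.pread (a POSIX call) rather than the HDFS client, and the file size and the read_block helper are assumptions made up for the example. The point is that one 64 KB block is fetched at an explicit offset without reading the data before it.

```python
import os
import tempfile

HBASE_BLOCK = 64 * 1024          # 64 KB, the HBase block size from the question
FILE_SIZE = 4 * 1024 * 1024      # 4 MB local stand-in for a much larger HDFS block


def read_block(fd, block_index, block_size=HBASE_BLOCK):
    """Positional read of one block at an explicit offset (hypothetical helper).

    os.pread requests only block_size bytes starting at the given offset
    and does not move the file descriptor's current position.
    """
    return os.pread(fd, block_size, block_index * block_size)


# Build a file in which every 64 KB block is filled with its own index byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(FILE_SIZE // HBASE_BLOCK):
        tmp.write(bytes([i]) * HBASE_BLOCK)
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    block = read_block(fd, 37)
    position_after = os.lseek(fd, 0, os.SEEK_CUR)  # where the fd's offset ended up
finally:
    os.close(fd)
    os.remove(path)

print(len(block), block[0], position_after)
```

The descriptor's position is unchanged after the call, which is why several readers can issue positional reads against the same open file without coordinating seeks.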



> Or does HBase somehow short circuit and go directly to OS, bypassing
> HDFS because it knows HDFS internals?
>
>
There is also a 'short circuit' read facility, yes, that makes the read
less costly if the block is local [2].



> Depending on the above: Aside from HBase block compression, should I
> use HDFS block compression? If HDFS compression prevents HBase from
> doing a random read, I most certainly do _not_ want to use it.  But if
> HBase can't do a random read to HDFS, then I want to use HDFS block
> compression, because you can compress a 64 MB block much better than a
> 64 KB block.
>

I've not played with it, but my guess is that HDFS compression would be
transparent to HBase, and that seeking to a particular offset would require
decompressing all of the HDFS block up to the read point.

You could enable HBase compression; the HBase blocks will be compressed.
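
The trade-off guessed at above can be sketched with plain zlib. This is not HDFS or HBase code: the sample data, the block index layout, and the helper names are all assumptions for illustration. It compares reading one 64 KB block out of a single compressed stream (everything before the block has to be inflated first) with reading it from independently compressed blocks located through a small index.

```python
import zlib

BLOCK = 64 * 1024   # HBase-sized block from the question
NUM_BLOCKS = 32     # 2 MB of sample data, a small stand-in for an HDFS block

# Sample data: block i consists of a repeated 9-byte row tagged with i.
data = b"".join(
    ((b"row-%04d|" % i) * (BLOCK // 9 + 1))[:BLOCK] for i in range(NUM_BLOCKS)
)


def read_from_single_stream(compressed, offset, length):
    """Inflate from the start of the stream until offset + length is reached."""
    inflater = zlib.decompressobj()
    out = inflater.decompress(compressed, offset + length)  # cap on output size
    return out[offset:], len(out)


def compress_per_block(raw, block_size):
    """Compress each block on its own and record (start, size) for each."""
    blob, index = bytearray(), []
    for start in range(0, len(raw), block_size):
        chunk = zlib.compress(raw[start:start + block_size])
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_from_block_index(blob, index, block_number):
    """Inflate only the one compressed block the index points at."""
    start, size = index[block_number]
    out = zlib.decompress(blob[start:start + size])
    return out, len(out)


target = 20
expected = data[target * BLOCK:(target + 1) * BLOCK]

single = zlib.compress(data)
got_a, inflated_a = read_from_single_stream(single, target * BLOCK, BLOCK)

blob, index = compress_per_block(data, BLOCK)
got_b, inflated_b = read_from_block_index(blob, index, target)

print("single stream inflated bytes:", inflated_a)
print("per-block inflated bytes:   ", inflated_b)
```

In this sketch the single-stream read inflates 21 blocks' worth of data to return one, while the per-block read inflates exactly one block.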

Regarding 'much better' compression, which compressor are you thinking of?
When I last looked, admittedly a long time ago, the likes of gzip worked on
chunks considerably smaller than an HDFS block.
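
One way to check this point for a DEFLATE-based compressor: DEFLATE, the algorithm behind gzip, looks back at most 32 KB, so compressing 64 KB blocks independently should cost little compared with compressing everything as one stream. The sketch below measures that on synthetic data. The vocabulary, the seed, and the 1 MB size are arbitrary assumptions, and the result says nothing about other codecs or about real HBase data.

```python
import random
import zlib

BLOCK = 64 * 1024
TOTAL = 1024 * 1024   # 1 MB of synthetic text, chosen arbitrarily

# Synthetic "text": random words drawn from a fixed 256-word vocabulary.
rng = random.Random(0)
vocab = [(b"w%03d" % i) * rng.randint(1, 3) for i in range(256)]

parts, size = [], 0
while size < TOTAL:
    word = rng.choice(vocab) + b" "
    parts.append(word)
    size += len(word)
data = b"".join(parts)[:TOTAL]

# Compressed size when the whole buffer is one zlib stream.
whole = len(zlib.compress(data, 6))

# Total compressed size when each 64 KB block is its own zlib stream.
per_block = sum(
    len(zlib.compress(data[start:start + BLOCK], 6))
    for start in range(0, len(data), BLOCK)
)

print("one stream:", whole, "bytes")
print("64 KB blocks:", per_block, "bytes")
print("per-block / one-stream size ratio: %.3f" % (per_block / whole))
```

A ratio close to 1 means the larger unit bought little extra compression for this compressor and this data. Compressors with much larger windows could behave differently.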

Thanks,
St.Ack


1. http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-hdfs/2.7.1/org/apache/hadoop/hdfs/DFSInputStream.java#DFSInputStream.read%28long%2Cbyte%5B%5D%2Cint%2Cint%29
2. https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
