hbase-user mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: HBase as a file repository
Date Wed, 05 Apr 2017 17:11:08 GMT
On Tue, Apr 4, 2017 at 11:00 AM Stack <stack@duboce.net> wrote:

> > What's the recommended approach to avoid or reduce the delay between when
> > HBase starts sending the response and when the application can act on it?
>
> As is, Cells are indivisible as are 'responses' when we promise a
> consistent view. Our implementation has us first realize the response in
> memory on the server-side before we ship it to the client. We do not have
> support for streaming responses (though this is an old request that has
> come up in many forms [1]). Until we have such support, there'll be this
> lag you describe whether MOB or not.

As Stack points out, this is the reason you're seeing higher
time-to-first-byte on the client side with larger files.  We don't stream
the response within a cell -- the full cell value is being shipped to the
client in a single response.

To improve this, you could try chunking files larger than some threshold
(1MB?) across multiple columns in the row that is stored on the server
side.  You would need to write an abstraction for this on the client side.
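The columns could be named with just a simple incrementing counter.  Column
qualifiers sort as raw bytes, so as long as the counter is encoded to sort
numerically (fixed width, for example) they will come back to you in the
right order: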

row:cf:1 -> first 1MB
row:cf:2 -> second 1MB
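
A minimal sketch of the client-side chunking, using only the Java standard
library.  The class and method names (ChunkLayout, Chunk, qualifierFor,
split) are hypothetical and not part of HBase; each resulting chunk would
become one column in a Put on the row.  The counter is encoded as 4
big-endian bytes so that byte-wise ordering matches numeric ordering; a
plain decimal string would sort "10" before "2".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an HBase class): splits file bytes into column-sized chunks. */
final class ChunkLayout {

    /** One column of the row: the qualifier bytes and the chunk stored under them. */
    static final class Chunk {
        final byte[] qualifier;
        final byte[] value;

        Chunk(byte[] qualifier, byte[] value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Encodes a non-negative chunk counter as 4 big-endian bytes (ByteBuffer's default order). */
    static byte[] qualifierFor(int index) {
        return ByteBuffer.allocate(4).putInt(index).array();
    }

    /** Splits data into chunks of at most chunkSize bytes, numbered from 1. */
    static List<Chunk> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int index = 1;
        int offset = 0;
        while (offset < data.length) {
            int len = Math.min(chunkSize, data.length - offset);
            byte[] value = Arrays.copyOfRange(data, offset, offset + len);
            chunks.add(new Chunk(qualifierFor(index++), value));
            offset += len;
        }
        return chunks;
    }
}
```

For the 1MB threshold suggested above, the call would be
ChunkLayout.split(fileBytes, 1024 * 1024).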


Then when reading the row back, instead of performing a get, perform a scan
on that single row.  If you set a max result size on the scan,
then the server will send back individual responses when max result
size is exceeded, which will allow your client to see the column chunks in
individual calls to ResultScanner.next().  The downside is that you will
have more round trips to the server, so you should also look at total
response time. This may help you to implement a pseudo-streaming interface
back to the client.
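
One way the pseudo-streaming side could look, sketched with the Java
standard library only.  ChunkStreams and asStream are hypothetical names.
The Iterator<byte[]> stands in for the column values pulled from successive
ResultScanner.next() calls; wiring it to a real scan (for example one
configured with Scan.setMaxResultSize, and possibly
Scan.setAllowPartialResults or Scan.setBatch depending on the HBase
version) is left out here and is an assumption about your setup.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.Iterator;

/** Hypothetical helper (not an HBase class): exposes ordered chunks as one InputStream. */
final class ChunkStreams {

    /**
     * Wraps an iterator of chunks in a single stream. After the first chunk,
     * the next chunk is requested only once the previous one has been fully
     * read, so a caller can start consuming bytes before later chunks arrive.
     */
    static InputStream asStream(final Iterator<byte[]> chunks) {
        Enumeration<InputStream> parts = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return chunks.hasNext();
            }

            @Override
            public InputStream nextElement() {
                // In an HBase client this is where the next chunk value
                // from the scanner would be fetched (assumption).
                return new ByteArrayInputStream(chunks.next());
            }
        };
        return new SequenceInputStream(parts);
    }
}
```

The application then reads from the returned stream as it would from a
file, and each exhausted chunk triggers the fetch of the next one.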

You may have to experiment to find the right chunk size and max result size
values, but this is the way that I would approach large file storage.
