hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase as a file repository
Date Tue, 04 Apr 2017 18:23:36 GMT
On Thu, Mar 30, 2017 at 9:25 PM, Daniel Jeliński <djelinski1@gmail.com>
wrote:

> Thank you Ted for your response.
>
> I have read that part of HBase book. It never explained why objects over
> 10MB are no good, and did not suggest an alternative storage medium for
> these.
>
>
That's a hole. I filed HBASE-17875.

The 10MB figure is a conservative upper bound. Bigger Cells will skirt the
buffer pools, so we'll do a one-off allocation per read. The GC will
experience a shock, and so on.
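
One way to keep every Cell under that bound is to split a large file into
several smaller cells at write time. The sketch below is only an illustration
under stated assumptions: the 10MB cap is taken from the guidance above, the
"chunk-NNNNNN" qualifier naming is hypothetical, and no HBase client calls
are made.

```python
from typing import Iterator, Tuple

# Assumed cap, taken from the 10MB guidance discussed above.
MAX_CELL_BYTES = 10 * 1024 * 1024


def split_into_cells(payload: bytes,
                     max_bytes: int = MAX_CELL_BYTES) -> Iterator[Tuple[str, bytes]]:
    """Yield (qualifier, chunk) pairs, each chunk at most max_bytes long."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    for index, start in enumerate(range(0, len(payload), max_bytes)):
        # Zero-padded qualifiers sort in chunk order (hypothetical naming scheme).
        yield "chunk-%06d" % index, payload[start:start + max_bytes]
```

Each yielded pair would then be written as its own column (or row), so no
single read has to allocate more than max_bytes for one Cell.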


> I have also read this:
> http://hbase.apache.org/book.html#regionserver_sizing_rules_of_thumb
> And yet I'm trying to put 36TB on a machine. I certainly hope that the
> number of region servers is the only real limiter to this.
>
>
The guidance in the refguide is intentionally conservative. It is probably
also stale at this point. Most users/devs do not do the degree of PoC'ing
that you have. The recommendations are meant more for them than for you.

 ...


> I checked async HBase projects, but apparently they're focused on running
> the requests in background, rather than returning results earlier. HBase
> streaming on Google returns just references to Spark.
>
> HBase JIRA has a few apparently related issues:
> https://issues.apache.org/jira/browse/HBASE-17721 is pretty fresh with no
> development yet, and https://issues.apache.org/jira/browse/HBASE-13467
> seems to have died already.
>
>
I pinged on HBASE-13467. My understanding was that this project was
underway...

St.Ack




> I captured the network traffic between the client and the region server
> when I requested one cell, and writing a custom client seems easy enough.
> Are there any reasons other than the API that justify the 10MB limit on
> MOBs?
> Thanks,
> Daniel
>
>
>
> 2017-03-31 0:03 GMT+02:00 Ted Yu <yuzhihong@gmail.com>:
>
> > Have you read:
> > http://hbase.apache.org/book.html#hbase_mob
> >
> > In particular:
> >
> > When using MOBs, ideally your objects will be between 100KB and 10MB
> >
> > Cheers
> >
> > On Thu, Mar 30, 2017 at 1:01 PM, Daniel Jeliński <djelinski1@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I'm evaluating HBase as a cheaper replacement for NAS as a file storage
> > > medium. To that end I have a cluster of 5 machines, 36TB HDD each; I'm
> > > planning to initially store ~240 million files of size 1KB-100MB, total
> > > size 30TB. Currently I'm storing each file under an individual column,
> > > and I group related documents in the same row. The files from the same
> > > row will be served one at a time, but updated/deleted together.
> > >
> > > Loading the data to the cluster went pretty well; I enabled MOB on the
> > > table and have ~50 regions per machine. Writes to the table are done by
> > > an automated process, and cluster's performance in that area is more
> > > than sufficient. On the other hand, reads are interactive, as the files
> > > are served to human users over HTTP.
> > >
> > > Now. HBase Get in Java API is an atomic operation in the sense that it
> > > does not complete until all data is retrieved from the server. It takes
> > > 100 ms to retrieve a 1MB cell (file), and only after retrieving I am
> > > able to start serving it to the end user. For larger cells the wait time
> > > is even longer, and response times longer than 100 ms are bad for user
> > > experience. I would like to start streaming the file over HTTP as soon
> > > as possible.
> > >
> > > What's the recommended approach to avoid or reduce the delay between
> > > when HBase starts sending the response and when the application can act
> > > on it?
> > > Thanks,
> > > Daniel
> > >
> >
>
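
The original question quoted above asks how to start serving a file before
all of it has arrived. If a file is stored as several small cells rather than
one large one, the application can fetch and forward them one at a time. The
sketch below shows only that control flow: fetch_chunk is a hypothetical
callback standing in for one small read per chunk, and a plain dict stands in
for the table. It is not an HBase API.

```python
from typing import Callable, Iterator, Optional


def stream_chunks(fetch_chunk: Callable[[int], Optional[bytes]]) -> Iterator[bytes]:
    """Yield chunks one at a time until fetch_chunk returns None."""
    index = 0
    while True:
        chunk = fetch_chunk(index)  # one small read per chunk (hypothetical callback)
        if chunk is None:
            return
        yield chunk
        index += 1


# In-memory stand-in for a row holding three chunk cells.
row = {0: b"first ", 1: b"second ", 2: b"third"}
served = []
for piece in stream_chunks(row.get):
    # An HTTP handler would write each piece to the response here.
    served.append(piece)
```

With this shape the first bytes can go out after one chunk-sized read instead
of after the whole file, at the cost of one round trip per chunk.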
