hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Stream interface to cell Data? Was -> Re: Avoiding OutOfMemory Java heap space in region servers
Date Thu, 19 Aug 2010 05:20:52 GMT
Hey,

Streaming is one of those things that would require major wholesale
changes... good ones, but needless to say, reworking the fundamentals
of how the RPC system, the storage system, and the file format work is
not really an overnight project.

If you are storing extremely large cells, the best bet is HDFS.  Most
systems end up having to do mixed storage, and it might be difficult
to make HBase useful for both 10 byte cells and 10 GB cells.  With some
good API layers on your app side, mixed storage shouldn't be too hard.
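
A minimal sketch of what such an app-side layer could look like, assuming
a simple size cutoff. Everything here is hypothetical: `MixedStore`,
`LARGE_CELL_THRESHOLD`, and the `/blobs/` path scheme are illustrative
names, and plain dicts stand in for the HBase table and for HDFS so the
sketch runs without a cluster.

```python
import hashlib

# Hypothetical cutoff chosen for illustration; this is not an HBase setting.
LARGE_CELL_THRESHOLD = 10 * 1024 * 1024  # 10 MB


class MixedStore:
    """Keeps small values inline and sends large values to a blob store.

    `table` stands in for an HBase table and `blobs` for HDFS. In a real
    layer these would be client calls to the two systems.
    """

    def __init__(self, threshold=LARGE_CELL_THRESHOLD):
        self.threshold = threshold
        self.table = {}  # row key -> {"data": bytes} or {"location": str}
        self.blobs = {}  # path -> bytes

    def put(self, row, value):
        if len(value) < self.threshold:
            # Small value: store the bytes directly in the table cell.
            self.table[row] = {"data": value}
        else:
            # Large value: write the bytes to the blob store and keep
            # only its location in the table.
            path = "/blobs/" + hashlib.sha1(row.encode("utf-8")).hexdigest()
            self.blobs[path] = value
            self.table[row] = {"location": path}

    def get(self, row):
        cell = self.table[row]
        if "data" in cell:
            return cell["data"]
        # Follow the stored location to fetch the large value.
        return self.blobs[cell["location"]]
```

Callers only see `put` and `get`, so the decision about where a value
lives stays in one place and the cutoff can be tuned later.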

-ryan

On Wed, Aug 18, 2010 at 9:02 PM, Stack <stack@duboce.net> wrote:
> On Wed, Aug 18, 2010 at 4:47 PM, Stuart Smith <stu24mail@yahoo.com> wrote:
>>
>> Hello,
>>
>>  I was wondering if there are any plans for a stream interface to Cell data. I saw this:
>>
>>> > or they are using large client write buffers so big payloads are being
>>> > passed to the server in each RPC request.  Our RPC is not streaming.
>>
>> So I'm guessing there's not one now (and I couldn't find one in 0.20.6 either). HDFS does seem to provide a stream interface (I'm about to try it out).
>>
>> So is there a fundamental limitation on hbase that prevents a streaming
>> interface to Cells, is it possible but distasteful for some reason, or is it just a TODO item?
>>
>
>
>
> Our RPC doesn't do streaming.
>
> A streaming/chunking protocol would be nice -- there is even an old
> issue to do it -- but I think general consensus is that it's low
> priority (do you think differently)?
>
> Also, if your cells are large, you might consider keeping the content
> in hdfs and their location up in hbase.  If the cell is 100MB, the
> lookup in hbase pales beside the time to stream from hdfs.
>
>
>> I'm thinking this could help alleviate the Big Cell OOME situation. This would be especially handy if you just have a few outlier cells that are really big, and lots of smaller ones.
>>
>
> Big cell OOME is rare, unless I'm mistaken.  Or, to put it another way,
> it's rare in my experience that hbase is used to host big cells.  We
> should add better cell size checks out on the client and, like the
> speed-limiter on your Hertz Ferrari, it'll keep you safe at least
> until you go out of your way to dismantle the check.
>
>> Right now I'm just going with the solution of putting a layer on top of my system that writes file metadata and most (smaller) files to hbase, and the occasional big file to HDFS. This should work, and is probably best in the long run, but a streaming interface would be handy!
>>
>
> Oh, yeah, this is a bit of a pain having to handle two sources for
> data.  Does your dataset fluctuate wildly in its size?   Is there a
> way you can separate the big from the small?  If so, perhaps you could
> model it so the big was in one column family and the small in another.
>  The big column family would hold the hdfs location, while the small-data
> column family would actually carry the data?
>
> St.Ack
>
