hbase-dev mailing list archives

From Wilm Schumacher <wilm.schumac...@gmail.com>
Subject feature request and question: "BigPut" and "BigGet"
Date Mon, 09 Mar 2015 02:55:39 GMT

I have an idea for a feature in hbase which derives directly from the
idea of the MOB feature. As Jonathan Hsieh pointed out, the only thing
limiting the feature to MOBs rather than LOBs is the memory
allocation on the client and server side. However, a "LOB feature" would
be very handy for me and, I think, for some other users, too. Furthermore,
the problem of fetching small files quickly could be solved.

The natural solution would be a "BigPut" and a "BigGet" class that
address that problem, i.e. that are capable of dealing with large amounts
of data without using too much memory. My plan for now is to create
classes that offer e.g.
BigPut BigPut.add( byte[] , byte[] , inputstream )
outputstream BigResult.value( byte[] , byte[] )
(in addition to the normal byte[] to byte[] member functions)

and pass the inputstreams through the AsyncProcess class to the RPC, or
in reverse the outputstream for the BigResult class. With this plan the
client and server would have to spawn some threads to deal with
multiple streams[1].
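
To make the memory argument concrete, here is a minimal sketch of the
client-side part of that idea: the value is read from an InputStream in
fixed-size chunks, so only one chunk is held in memory at a time. The
names ChunkedStreamSketch, ChunkSink and pump are made up for
illustration and are not part of hbase; the sink is only a stand-in for
whatever would actually ship a chunk over the RPC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedStreamSketch {

    /** Hypothetical stand-in for the code that would send one chunk over the RPC. */
    interface ChunkSink {
        void accept(byte[] buf, int len) throws IOException;
    }

    /**
     * Reads the stream in chunks of at most chunkSize bytes and hands each
     * chunk to the sink. Memory use is bounded by chunkSize, not by the
     * size of the value. Returns the total number of bytes forwarded.
     */
    static long pump(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            sink.accept(buf, n); // only the first n bytes of buf are valid
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory value stands in for a large file here.
        byte[] value = new byte[1_000_000];
        ByteArrayOutputStream received = new ByteArrayOutputStream();

        long sent = pump(new ByteArrayInputStream(value), 64 * 1024,
                (buf, len) -> received.write(buf, 0, len));

        System.out.println(sent + " bytes sent, " + received.size() + " bytes received");
    }
}
```

Whether the server side can consume such chunks without buffering the
whole cell is exactly the open question; the sketch only shows that the
client does not need the whole value in a byte[].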

So far I have dug into the hbase-client (2.0.0) sources and I think that my
plan would be quite invasive to the existing code ... but it is doable.
Given the very open development model of hbase features, I
think it could be addressed.

But I'm veeeery new to hbase development and have only just started to read
the source. Before I dig too deep into the problem I wanted to ask here
whether there is any show stopper I'm missing so far.
To make a list of questions for that feature:
* As this plan probably won't break the thread model of the
hbase-client, is there any problem on the (region) server side? Or is
there any blocking/race condition problem elsewhere that I have missed so far?
* Is it a bad plan to pump several 100s of MB through one RPC in a
separate thread? If yes ... why?
* Are there any other fundamental problems I have missed so far which make
this a horrible plan?
* Is there already some development ongoing? I didn't find anything on jira.
But that doesn't mean anything :/
* Does anyone have a better name than "BigPut" :D?

And finally:
* Is it a better plan to create a separate "MOB/LOB service"?[2]

Best wishes


[1] or one could limit the number of streams to one. That way the
threading problem would be much simpler to handle, as only one
"RPC" would be necessary.

[2] on the one hand it is easier to bear LOBs in mind if you create a
service, e.g. with a rest interface (multipart data etc.); on the other
hand you have to reinvent the wheel (compaction etc.)
