hbase-dev mailing list archives

From 张铎 <palomino...@gmail.com>
Subject Re: feature request and question: "BigPut" and "BigGet"
Date Mon, 09 Mar 2015 03:50:22 GMT
If LOB means data larger than 10MB or even 100MB, why not just use a
FileSystem instead of HBase?
A FileSystem already has a stream interface...
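To make the suggestion concrete, here is a minimal sketch. It uses java.nio.file on a local disk as a stand-in for a distributed FileSystem, and LobFileStore with its put/get methods are hypothetical names. Hadoop's org.apache.hadoop.fs.FileSystem has the same shape: create(Path) returns an output stream and open(Path) returns an input stream, so a large object never has to exist as one byte[].

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a local directory stands in for an HDFS-style FileSystem.
class LobFileStore {
    private final Path root;

    LobFileStore(Path root) {
        this.root = root;
    }

    // Streams the value into <root>/<key> and returns the number of bytes written.
    // The value is copied as a stream (InputStream.transferTo, Java 9+), not read
    // into a byte[] first.
    long put(String key, InputStream value) {
        try (OutputStream out = Files.newOutputStream(root.resolve(key))) {
            return value.transferTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the stored value for streaming reads; the caller must close the stream.
    InputStream get(String key) {
        try {
            return Files.newInputStream(root.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reads are symmetric: the caller gets an InputStream and decides how much to buffer.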

2015-03-09 10:55 GMT+08:00 Wilm Schumacher <wilm.schumacher@gmail.com>:

> Hi,
>
> I have an idea for a feature in hbase which derives directly from the
> idea of the MOB feature. As Jonathan Hsieh pointed out, the only thing
> limiting the feature to MOBs instead of LOBs is memory allocation on
> the client and server side. However, a "LOB feature" would be very
> handy for me, and I think for some other users, too. Furthermore, it
> could solve the problem of fetching small files quickly.
>
> The natural solution would be "BigPut" and "BigGet" classes that
> address this problem, i.e. that can deal with large amounts of data
> without using too much memory. My current plan is to create classes
> that provide e.g.
> BigPut BigPut.add( byte[] , byte[] , inputstream )
> and
> outputstream BigResult.value( byte[] , byte[] )
> (in addition to the normal byte[] to byte[] member functions)
>
> and pass the input streams through the AsyncProcess class to the RPC
> or, in the reverse direction, the output stream for the BigResult
> class. With this plan, the client and server would have to spawn some
> threads to deal with multiple streams[1].
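The proposed BigPut.add(byte[], byte[], inputstream) could look roughly like the following. This is a hypothetical sketch, not part of the HBase client API: ChunkSink is an assumed stand-in for the RPC layer (AsyncProcess is not modeled), and the "family:qualifier" string key is a simplification. Values stay as InputStreams until send() is called and are then pushed through one reusable buffer, so client memory is bounded by the chunk size rather than by the value size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed BigPut; NOT part of the HBase client API.
class BigPut {
    // Assumed stand-in for the RPC layer: receives one chunk of one cell at a time.
    // The buffer is reused, so an implementation must consume it before returning.
    interface ChunkSink {
        void accept(String column, byte[] buf, int len);
    }

    private final byte[] row;
    // "family:qualifier" -> stream supplying that cell's value, in insertion order.
    private final Map<String, InputStream> cells = new LinkedHashMap<>();

    BigPut(byte[] row) {
        this.row = row.clone();
    }

    byte[] getRow() {
        return row.clone();
    }

    // Mirrors the proposed BigPut.add(byte[], byte[], inputstream); returns this for chaining.
    BigPut add(byte[] family, byte[] qualifier, InputStream value) {
        String column = new String(family, StandardCharsets.UTF_8) + ":"
                + new String(qualifier, StandardCharsets.UTF_8);
        cells.put(column, value);
        return this;
    }

    // Sends every cell through a single buffer of chunkSize bytes and closes each
    // stream afterwards. Returns the total number of value bytes sent.
    long send(ChunkSink sink, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        byte[] buf = new byte[chunkSize];
        long total = 0;
        try {
            for (Map.Entry<String, InputStream> cell : cells.entrySet()) {
                try (InputStream in = cell.getValue()) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        sink.accept(cell.getKey(), buf, n);
                        total += n;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Sending cells one after another on the calling thread avoids the threading questions entirely; the cost is that a slow value delays every value queued behind it.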
>
> So far I have dug into the hbase-client (2.0.0) sources, and I think my
> plan would be quite invasive to the existing code ... but doable.
> However, given the very open development model of hbase features, I
> think that could be addressed.
>
> But I'm veeeery new to hbase development and have just started to read
> the source. Before I dig too deep into the problem, I wanted to ask here
> whether there is any show stopper I'm missing so far.
> Here is a list of questions about the feature:
> * As this plan probably won't break the thread model of the
> hbase-client, is there any problem on the (region) server side? Or is
> there any blocking/race condition problem elsewhere that I'm missing?
> * Is it a bad plan to pump several hundred MB through one RPC in a
> separate thread? If yes ... why?
> * Are there any other fundamental problems I'm missing that make this
> a horrible plan?
> * Is there already some development ongoing? I didn't find anything on
> JIRA, but that doesn't mean anything :/
> * Does anyone have a better name than "BigPut" :D?
>
> And finally:
> * Is it a better plan to create a separate "MOB/LOB service"?[2]
>
> Best wishes
>
> Wilm
>
> [1] Alternatively, one could limit the number of streams to one. The
> threading problem would then be much simpler to handle, as only one
> "RPC" would be necessary.
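The trade-off in footnote [1] can be sketched with plain java.util.concurrent. This is a hypothetical helper (StreamPump and pumpAll are illustrative names), not HBase's AsyncProcess or RPC code: each value stream is copied on a fixed-size thread pool with its own small buffer, and passing threads = 1 processes one stream at a time, which is roughly the simpler variant described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: copies several value streams concurrently on a bounded pool.
class StreamPump {
    // Copies one stream through its own fixed-size buffer; returns bytes copied.
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Copies sources.get(i) into sinks.get(i) for every i, on at most 'threads' threads.
    static long pumpAll(List<InputStream> sources, List<OutputStream> sinks,
                        int threads, int bufSize) {
        if (sources.size() != sinks.size()) throw new IllegalArgumentException("size mismatch");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                InputStream in = sources.get(i);
                OutputStream out = sinks.get(i);
                Callable<Long> task = () -> copy(in, out, bufSize);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any failure from a copy task
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Each task owns exactly one source and one sink, so no locking is needed; the cost is one thread and one buffer per concurrently active stream.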
>
> [2] On the one hand, it is easier to keep LOBs in mind if you create a
> service, e.g. with a REST interface (multipart data etc.); on the other
> hand, you have to reinvent the wheel (compaction etc.).
>
