From: lars hofhansl
Reply-To: lars hofhansl
To: dev@hbase.apache.org
Date: Mon, 9 Mar 2015 04:01:09 +0000 (UTC)
Subject: Re: feature request and question: "BigPut" and "BigGet"

Thanks for looking into this, Wilm.

I would honestly suggest writing larger LOBs directly into HDFS and storing only their location in HBase. You can do that with a relatively simple protocol, with reasonable safety:

1. Write the metadata row into HBase.
2. Write the LOB into HDFS.
3. When the LOB has been written, update the metadata row with the LOB's location.
4. Report success back to the client.

If the LOB is small (maybe < 1 MB), you'd just write it into HBase as a value (preferably into a different column family).

If the process fails at #2 or #3 you'd have an orphaned file in HDFS, but those are easy to find (metadata rows for which the location is unset and which are older than, say, a few days).

Your BigPut and BigGet could just be an API around this process.

-- Lars

From: Wilm Schumacher
To: dev@hbase.apache.org
Sent: Sunday, March 8, 2015 7:55 PM
Subject: feature request and question: "BigPut" and "BigGet"

Hi,

I have an idea for a feature in HBase which directly derives from the idea of the MOB feature.
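The four-step protocol above can be sketched in a few lines. This is a minimal, illustrative sketch only: the two maps stand in for an HBase table and an HDFS directory so the control flow is runnable, and all class and method names (`LobStore`, `putLob`, `isOrphan`) are hypothetical, not part of any HBase API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the metadata-first LOB write protocol. "metaTable" stands in
// for an HBase table and "hdfs" for an HDFS namespace; both are plain
// in-memory maps here so the flow can be followed end to end.
public class LobStore {
    final Map<String, String> metaTable = new HashMap<>(); // row -> LOB location ("" = unset)
    final Map<String, byte[]> hdfs = new HashMap<>();      // path -> file contents

    /** Runs steps 1-4; returns the location the metadata row now points at. */
    public String putLob(String row, byte[] lob) {
        // 1. Write the metadata row first, location still unset.
        metaTable.put(row, "");
        // 2. Write the LOB into HDFS.
        String location = "/lobs/" + row;
        hdfs.put(location, lob);
        // 3. Update the metadata row with the LOB's location.
        metaTable.put(row, location);
        // 4. Report success back to the client.
        return location;
    }

    /** Orphan check from the mail: a metadata row whose location was never set. */
    public boolean isOrphan(String row) {
        return "".equals(metaTable.get(row));
    }
}
```

Because the metadata row is written before the file, a crash between steps 1 and 3 leaves only a row with an unset location, which a periodic scan can reclaim.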
As Jonathan Hsieh pointed out, the only thing limiting the feature to MOBs instead of LOBs is the memory allocation on the client and server side. However, the "LOB feature" would be very handy for me, and I think for some other users too. Furthermore, the problem of quickly fetching small files could be solved.

The natural solution would be "BigPut" and "BigGet" classes which address that problem, i.e. which are capable of dealing with large amounts of data without using too much memory. My plan by now is to create classes that offer e.g.

BigPut BigPut.add( byte[] , byte[] , InputStream )

and

OutputStream BigResult.value( byte[] , byte[] )

(in addition to the normal byte[]-to-byte[] member functions) and pass the input streams through the AsyncProcess class to the RPC, or, in the reverse direction, the output stream for the BigResult class. With this plan the client and server would have to spawn some threads to deal with multiple streams [1].

By now I have dug into the hbase-client (2.0.0) sources and I think that my plan would be quite invasive to the existing code ... but it is doable. However, given the very open development model of HBase features, I think it could be addressed. But I'm veeeery new to HBase development and have just started to read the source. Before I dig too deep into the problem I wanted to ask here if there is any show stopper I'm missing by now.

To make a list of questions for that feature:

* As this plan probably won't break the thread model of the hbase-client, is there any problem on the (region) server side? Or is there any blocking/race-condition problem elsewhere that I miss by now?
* Is it a bad plan to pump several 100s of MB through one RPC in a separate thread? If yes ... why?
* Are there any other fundamental problems I miss by now which make this a horrible plan?
* Is there already some dev ongoing? I didn't find anything on jira. But that doesn't mean anything :/
* Does anyone have a better name than "BigPut" :D?
And at last:

* Is it a better plan to create a separate "MOB/LOB service"? [2]

Best wishes,

Wilm

[1] Or one could limit the number of streams to one. That way the threading problem would be much simpler to handle, as only one "RPC" would be necessary.

[2] On the one hand it is easier to bear LOBs in mind if you create a service, e.g. with a REST interface (multipart data etc.); on the other hand you have to reinvent the wheel (compaction etc.).
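The streaming API proposed above could take roughly the following shape. To be clear, this is a hypothetical sketch of the proposal, not existing hbase-client code: `BigPut` mirrors the familiar Put.addColumn(family, qualifier, value) signature, except that the value is an InputStream the RPC layer would drain in chunks instead of a byte[] held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side shape for the "BigPut" idea: cell values are
// handed over as streams so neither client nor server ever has to
// materialize the whole LOB in memory.
public class BigPut {
    /** One cell whose value is a stream rather than a byte[]. */
    public static class StreamedCell {
        final byte[] family, qualifier;
        final InputStream value;
        StreamedCell(byte[] f, byte[] q, InputStream v) {
            family = f; qualifier = q; value = v;
        }
    }

    private final byte[] row;
    private final List<StreamedCell> cells = new ArrayList<>();

    public BigPut(byte[] row) { this.row = row; }

    // Analogous to Put.addColumn(family, qualifier, value), but the
    // value arrives as a stream the RPC layer would drain in chunks.
    public BigPut add(byte[] family, byte[] qualifier, InputStream value) {
        cells.add(new StreamedCell(family, qualifier, value));
        return this;
    }

    public int size() { return cells.size(); }
}
```

A BigGet would be the mirror image, handing the caller an OutputStream (or an InputStream to read from) per cell. Limiting the API to one stream per call, as footnote [1] suggests, would keep the server-side threading model close to what a normal RPC already does.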