hbase-dev mailing list archives

From Wilm Schumacher <wilm.schumac...@gmail.com>
Subject Re: feature request and question: "BigPut" and "BigGet"
Date Mon, 09 Mar 2015 04:36:01 GMT
On 09.03.2015 at 05:01, lars hofhansl wrote:
> Thanks for looking into this Wilm.
> I would honestly suggest just writing larger lobs directly into HDFS and just store the location in HBase.
> You can do that with a relatively simple protocol, with reasonable safety:
> 1. Write the metadata row into HBase
> 2. Write the LOB into HDFS
> 3. When the LOB was written, update the metadata row with the LOBs location.
> 4. Report success back to the client
That would be a client-side approach, which of course would work, but
which has some downsides (e.g. being out of sync, as you pointed out). On
the other hand ... no large change to the core HBase code ;).
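
The four quoted steps can be sketched roughly as follows. This is only an illustration under stated assumptions: `LobWriter`, `write_lob`, the `fail_at` switch and the `/lobs/` path prefix are hypothetical names, and plain dicts stand in for the HBase table and for HDFS, so no real client API is shown.

```python
import time


class LobWriter:
    """Client-side LOB write following the four quoted steps.

    `meta` stands in for an HBase table and `files` for HDFS. Both are
    plain dicts here (an assumption made for this sketch).
    """

    def __init__(self):
        self.meta = {}   # row key -> {"location": str or None, "created": float}
        self.files = {}  # path -> bytes

    def write_lob(self, row_key, data, fail_at=None):
        # Step 1: write the metadata row with the location still unset.
        self.meta[row_key] = {"location": None, "created": time.time()}
        if fail_at == 2:
            raise IOError("simulated failure before the LOB write")

        # Step 2: write the LOB itself.
        path = "/lobs/" + row_key
        self.files[path] = data
        if fail_at == 3:
            raise IOError("simulated failure before the metadata update")

        # Step 3: record the location once the LOB is fully written.
        self.meta[row_key]["location"] = path

        # Step 4: report success back to the caller.
        return path
```

A write that stops after step 2 leaves a file with no row pointing at it, plus a metadata row whose location is still unset. That is the "out of sync" state mentioned above.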

But of course this solves the small files problem (which I'm facing)
only halfway. If I use your 1 MB threshold and, let's say, a
mean size of 5 MB for one "LOB" and the limitation to ~5M "larger" files
(due to the namenode) ... I'm at around 25 TB of raw "LOB data", which isn't that

Or 100 TB for a 10 MB threshold and a mean size of 20 MB for LOBs ...
or 200 TB for a 10 MB threshold and doubled namenode RAM, etc.
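
These estimates are simply the file-count limit multiplied by the mean LOB size. A small sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and assuming that doubling the namenode RAM doubles the number of files it can track. The function name `raw_capacity_tb` is made up for this example.

```python
def raw_capacity_tb(max_files, mean_size_mb):
    """Raw LOB capacity in TB, using decimal units (1 TB = 1,000,000 MB)."""
    return max_files * mean_size_mb / 1_000_000


# ~5M files is the namenode-bound limit quoted in the message.
MAX_FILES = 5_000_000

print(raw_capacity_tb(MAX_FILES, 5))       # 1 MB threshold, 5 MB mean size
print(raw_capacity_tb(MAX_FILES, 20))      # 10 MB threshold, 20 MB mean size
print(raw_capacity_tb(2 * MAX_FILES, 20))  # same, with doubled namenode RAM
```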

This way I can catch the really small stuff. But I'm still limited when it
comes to "slightly larger MOBs" or "small LOBs".

However, this is still way beyond my current application problems, thus
the problem is more of an academic nature :/.

> If the LOB is small... maybe < 1mb, you'd just write it into HBase as a value (preferably into a different column family)
> If the process fails at #2 or #3 you'd have an orphaned file in HDFS, but those are easy to find (metadata rows for which the location is unset, and older than - say - a few days)
I would run a MapReduce job over the file names and look each one up in
HBase => if not found => delete. But yeah, somehow on the client side.
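
Both cleanup ideas (scanning for stale metadata rows, and checking file names against HBase) can be sketched in a few lines. This is a single-process illustration, not a MapReduce job. The function names, the dict layouts and the grace period for files are assumptions made here. The grace period is there so that a file whose metadata update (step 3) is still pending is not treated as an orphan.

```python
def find_orphans(file_mtimes, referenced_locations, now, grace_seconds):
    """Return paths that no metadata row references and that are older
    than the grace period.

    file_mtimes: path -> modification time (stand-in for an HDFS listing).
    referenced_locations: locations found in the metadata rows.
    """
    referenced = set(referenced_locations)
    return sorted(
        path
        for path, mtime in file_mtimes.items()
        if path not in referenced and now - mtime > grace_seconds
    )


def stale_rows(meta, now, max_age_seconds):
    """Return row keys whose location is unset and which are older than
    max_age_seconds (the quoted "older than a few days" check)."""
    return sorted(
        key
        for key, row in meta.items()
        if row["location"] is None and now - row["created"] > max_age_seconds
    )
```

For example, with a listing `{"/lobs/a": 0, "/lobs/b": 0}` and only `/lobs/a` referenced, `find_orphans(..., now=1000, grace_seconds=100)` flags `/lobs/b` for deletion.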

> Your BigPut and BigGet could just be an API around this process.
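
A client-side wrapper in that spirit might look roughly like this. It is a sketch only: `BigStore`, `big_put`, `big_get`, the column names and the `/lobs/` prefix are hypothetical, dicts stand in for the HBase table and HDFS, and the 1 MB default follows the "maybe < 1mb" figure quoted above.

```python
THRESHOLD = 1_000_000  # bytes; the "maybe < 1mb" cut-off from the quoted advice


class BigStore:
    """Routes small values into the table and large ones to the file store."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.table = {}  # stand-in for an HBase table: row key -> {column: value}
        self.files = {}  # stand-in for HDFS: path -> bytes

    def big_put(self, row_key, data):
        if len(data) < self.threshold:
            # Small value: store it inline, in its own column family.
            self.table[row_key] = {"lob:data": data}
            return
        # Large value: metadata row first, then the file, then the location.
        self.table[row_key] = {"meta:location": None}
        path = "/lobs/" + row_key
        self.files[path] = data
        self.table[row_key]["meta:location"] = path

    def big_get(self, row_key):
        row = self.table[row_key]
        if "lob:data" in row:
            return row["lob:data"]
        location = row["meta:location"]
        if location is None:
            # The write stopped before the location was recorded.
            raise LookupError("LOB write for %r never completed" % row_key)
        return self.files[location]
```

The caller never sees which path a value took: `big_get` returns the same bytes whether they were stored inline or as a separate file.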

As two independent developers gave the same answer, I'll drop the idea
and continue with the client-side approach.

Thanks for the fast reply,

