hbase-issues mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8016) HBase as an embeddable library, but still using HDFS
Date Fri, 08 Mar 2013 06:50:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596899#comment-13596899

eric baldeschwieler commented on HBASE-8016:

Hi Stack

MiniHBaseCluster sounds like a good way to prototype this.  I've been kicking this around
with ddas and serge, and they suggested that.

Hi Matt

Noooo... that would be an interesting project, but it is going in a very different direction.

Hi Andrew

LevelDB is the other thing I've been thinking about.  We may do some comparison.  But adapting
it to use HDFS efficiently may prove non-trivial, and I'd want something that can handle a
couple of TB of data; it's not clear LevelDB fits that bill.

In terms of a distributed data store...  we definitely suffer from not having good, simple mechanisms
for adding state management of large data sets to simple apps around Hadoop.  Often they
have a single master and manageable data rates.  They are getting built on DBs today, but that
really is crufty.  I'm looking for a repeatable data management design that doesn't bring
all the fun of administering either a high-availability RDBMS or a distributed NoSQL store into the picture.

Other approaches might be to hack up Derby, SQLite, or Postgres, but all of these bring
more baggage since they are not already HDFS native.  And none should scale as well as HBase.
> HBase as an embeddable library, but still using HDFS
> ----------------------------------------------------
>                 Key: HBASE-8016
>                 URL: https://issues.apache.org/jira/browse/HBASE-8016
>             Project: HBase
>          Issue Type: Wish
>            Reporter: eric baldeschwieler
> This goes in the "strange idea" bucket...  
> I'm looking for a tool that allows folks to store key-value data in HDFS so that Hadoop
companion layers & apps don't need to rely on either an external database or a NoSQL store.
 HBase itself is often not running on such clusters, and we cannot add it as a requirement
for many of the use cases I'm considering.
> But...  what if we produced a library that provided the basic HBase API (creating tables
& putting / getting values...) and this library was pointed at HDFS for durability?  This
library would effectively embed a region server and the master in a node and provide only
API-level access within that JVM.  We would skip marshaling & networking, gaining a fair
amount of efficiency.  An application using this library would gain all of the advantages
of HBase without adding the additional administrative complexity of managing HBase as a
distributed service.
> Thoughts?
> Example use cases...  Right now a typical Hadoop install runs several services that use
databases (Oozie, HCat, Hive ...).  What if some of these could be ported to use HDFS itself
as their store, with the HBase API provided to manage their data?
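The single-JVM, API-only access pattern described in the quoted proposal can be sketched as a small facade. This is a minimal illustration, not real HBase code: the names `EmbeddedHBase` and `EmbeddedTable` are hypothetical, and an in-memory sorted map stands in for the HDFS-backed region-server storage the issue envisions. The point is the shape of the API an embedding application would see: create a table, then put/get values directly, with no RPC or marshaling.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the proposed embedded API: table creation plus
// put/get, all within one JVM. The in-memory map below is a stand-in for
// the HDFS-backed region server + master the issue describes embedding.
public class EmbeddedHBase {

    public static class EmbeddedTable {
        // A sorted map mirrors HBase's ordering of rows by key.
        private final NavigableMap<String, byte[]> rows = new ConcurrentSkipListMap<>();

        public void put(String rowKey, byte[] value) {
            rows.put(rowKey, value);
        }

        public byte[] get(String rowKey) {
            return rows.get(rowKey); // null if absent, as with an empty HBase Result
        }
    }

    private final Map<String, EmbeddedTable> tables = new ConcurrentSkipListMap<>();

    // Returns the existing table if one of that name was already created.
    public EmbeddedTable createTable(String name) {
        return tables.computeIfAbsent(name, n -> new EmbeddedTable());
    }

    public static void main(String[] args) {
        // E.g. a service like Oozie keeping job state without an external DB.
        EmbeddedHBase store = new EmbeddedHBase();
        EmbeddedTable jobs = store.createTable("oozie-state");
        jobs.put("job-42", "RUNNING".getBytes());
        System.out.println(new String(jobs.get("job-42"))); // prints RUNNING
    }
}
```

In a real implementation, `EmbeddedTable` would wrap an in-process region server flushing HFiles and WALs to HDFS, so durability comes from HDFS while all calls stay plain method invocations inside the application's JVM.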

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
