hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: Evaluating HBase
Date Mon, 04 Feb 2008 22:56:05 GMT
This seems like a good fit for HBase in general. You're right, it's
an application for MapReduce-style processing. HBase doesn't need
MapReduce in the sense that HBase is not built to depend on it.
However, we are interested in making HBase play well with MapReduce,
and HBase ships several handy classes (TableInputFormat,
TableOutputFormat) for doing exactly that with Hadoop's MapReduce.
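[Editor's note: a minimal sketch of wiring those classes into a Hadoop job. The class names below are from the later org.apache.hadoop.hbase.mapreduce package (the 2008-era equivalents lived in org.apache.hadoop.hbase.mapred), and "master_table" is a placeholder table name, not one from this thread. TableMapReduceUtil configures TableInputFormat under the hood so each map() call receives one scanned row:]

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanJob {
  /** Receives one HBase row per map() call, courtesy of TableInputFormat. */
  static class RowMapper extends TableMapper<Text, IntWritable> {
    private static final Text ROWS = new Text("rows");
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(ImmutableBytesWritable key, Result row, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(ROWS, ONE);  // emit one record per scanned row
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-master-table");
    job.setJarByClass(ScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // batch rows per RPC for scan throughput
    scan.setCacheBlocks(false);  // full scans shouldn't churn the block cache

    // "master_table" is a placeholder table name.
    TableMapReduceUtil.initTableMapperJob(
        "master_table", scan, RowMapper.class,
        Text.class, IntWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

[A job that writes back to a second table would use TableOutputFormat, typically via TableMapReduceUtil.initTableReducerJob, in place of the NullOutputFormat used here.]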

In the current version of HBase, you're correct, there is no way to
guarantee that you are mapping over local data. Data locality is
something we are very interested in, but haven't had the time to
pursue yet; we've been more focused on the general reliability and
scalability of HBase. We also need HDFS, the underlying distributed
file system, to support locality-awareness, which it doesn't fully
do yet.

I think you should probably give HBase a shot and see how it goes.  
We're very, very interested in seeing how HBase performs under  
massive loads and datasets.


On Feb 4, 2008, at 2:44 PM, Charles Kaminski wrote:

> Hi All,
> I am evaluating HBase and I am not sure if our
> use-case fits naturally with HBase’s capabilities.  I
> would appreciate any help.
> We would like to store a large number (billions) of
> rows in HBase using a key field to access the values.
> We will then need to continually add, update, and
> delete rows.  This is our master table.  What I
> describe here naturally fits into what HBase is
> designed to do.
> It’s this next part that I’m having trouble finding
> documentation for.
> We would like to use HBase’s parallel processing
> capabilities to periodically spawn off other temporary
> tables when requested.  We would like to take the
> first table (the master table), go through the key and
> field values in its rows.  From this, we would like to
> create a second table organized differently from the
> master table.  We would also need to include count,
> max, min, and other things specific to the particular
> request.
> This seems like textbook map-reduce functionality, but
> I don’t see too much in HBase referencing this kind of
> setup.  Also there is a reference in HBase’s 10 minute
> startup guide that states “[HBase doesn’t] need
> mapreduce”.
> I suppose we could use HBase as an input and output to
> Hadoop's MapReduce functionality.  If we did that,
> what would guarantee that we were mapping over local
> data?
> Any help would be greatly appreciated.  If you have a
> reference to a previous discussion or document I could
> read, that would be appreciated as well.
> -FA
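[Editor's note: the derived-table computation described in the question above amounts to re-keying the master table's rows by some field and folding count, min, and max per group. The reduce-side fold can be sketched independent of HBase; the record layout and names here are illustrative, not from the thread:]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DerivedTable {
  /** Aggregates computed for each group key in the derived table. */
  public static final class Stats {
    public long count = 0;
    public long min = Long.MAX_VALUE;
    public long max = Long.MIN_VALUE;
  }

  /**
   * Map phase (conceptually): emit (group, value) for every master-table row.
   * Reduce phase: fold count, min, and max per group.
   * Each row here is {rowKey, group, value}; this layout is purely illustrative.
   */
  public static Map<String, Stats> aggregate(List<String[]> rows) {
    Map<String, Stats> derived = new HashMap<>();
    for (String[] row : rows) {
      Stats s = derived.computeIfAbsent(row[1], g -> new Stats());
      long v = Long.parseLong(row[2]);
      s.count++;
      s.min = Math.min(s.min, v);
      s.max = Math.max(s.max, v);
    }
    return derived;
  }
}
```

[In a real MapReduce job the per-group fold above would run in the reducer, with the framework's shuffle grouping records by the new key; a combiner with the same fold would cut shuffle volume.]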
