hbase-user mailing list archives

From Charles Kaminski <freea...@yahoo.com>
Subject Re: Evaluating HBase
Date Mon, 04 Feb 2008 23:32:12 GMT
Hi Bryan,

Thanks for the thoughtful response.  Could you take a
moment to write a few lines at a high level on how you
would leverage Hadoop and HBase to fit this use case?

I think I’m reading the following in your response:
1. Build and maintain the large master table in HBase
2. Use TableInputFormat to convert HBase data into a
raw format for Hadoop on HDFS
3. Run MapReduce in Hadoop
4. Use TableOutputFormat to build the new table

Do I have that right? In code, I imagine the wiring
would look something like the sketch below.
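
To make steps 2 through 4 concrete, here is roughly the job setup I
picture. I'm assuming the org.apache.hadoop.hbase.mapreduce helper
classes (TableMapReduceUtil and friends); the table names ("master",
"derived") and the RegroupMapper/AggregateReducer classes are
placeholders of mine (sketched at the bottom of this message), so
please correct me if the real API differs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class DeriveTableJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "derive-from-master");
    job.setJarByClass(DeriveTableJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in batches per RPC
    scan.setCacheBlocks(false);  // don't pollute the block cache in a full scan

    // Step 2: TableInputFormat feeds master-table rows to the mappers.
    TableMapReduceUtil.initTableMapperJob(
        "master", scan, RegroupMapper.class,
        Text.class, LongWritable.class, job);

    // Step 4: TableOutputFormat writes reducer output into the new table.
    TableMapReduceUtil.initTableReducerJob(
        "derived", AggregateReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}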


--- Bryan Duxbury <bryan@rapleaf.com> wrote:

> This seems like a good fit for HBase in general. You're right, it's
> an application for MapReduce-style processing. HBase doesn't need
> MapReduce in the sense that HBase is not built dependent upon it.
> However, we are interested in making HBase play well with MapReduce,
> and have several handy classes (TableInputFormat, TableOutputFormat)
> in HBase for doing that with Hadoop's MapReduce.
> 
> In the current version of HBase, you're correct, there is no way to
> guarantee that you are mapping over local data. Data locality is
> something that we are very interested in, but haven't really had the
> time to pursue yet. We're more concerned right now with the general
> reliability and scalability of HBase. We also need HDFS, the
> underlying distributed file system, to support locality-awareness,
> which it doesn't fully do yet.
> 
> I think you should probably give HBase a shot and
> see how it goes.  
> We're very, very interested in seeing how HBase
> performs under  
> massive loads and datasets.
> 
> -Bryan
> 
> On Feb 4, 2008, at 2:44 PM, Charles Kaminski wrote:
> 
> > Hi All,
> >
> > I am evaluating HBase and I am not sure if our use-case fits
> > naturally with HBase’s capabilities.  I would appreciate any help.
> >
> > We would like to store a large number (billions) of rows in HBase
> > using a key field to access the values.  We will then need to
> > continually add, update, and delete rows.  This is our master
> > table.  What I describe here naturally fits into what HBase is
> > designed to do.
> >
> > It’s this next part that I’m having trouble finding documentation
> > for.
> >
> > We would like to use HBase’s parallel processing capabilities to
> > periodically spawn off other temporary tables when requested.  We
> > would like to take the first table (the master table), go through
> > the key and field values in its rows.  From this, we would like to
> > create a second table organized differently from the master table.
> > We would also need to include count, max, min, and other things
> > specific to the particular request.
> >
> > This seems like textbook map-reduce functionality, but I don’t see
> > too much in HBase referencing this kind of setup.  Also there is a
> > reference in HBase’s 10 minute startup guide that states “[HBase
> > doesn’t] need mapreduce”.
> >
> > I suppose we could use HBase as an input and output to Hadoop's
> > map reduce functionality.  If we did that, what would guarantee
> > that we were mapping to local data?
> >
> > Any help would be greatly appreciated.  If you have a reference to
> > a previous discussion or document I could read, that would be
> > appreciated as well.
> >
> > -FA
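
PS: And here is how I picture the map and reduce sides for the
count/max/min case above.  This is just a sketch; the column names
("d:group", "d:value", the "agg" family) are invented for
illustration, and I'm not sure the signatures match the current
release exactly, so treat it as pseudocode-ish Java:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Map side: re-key each master-table row by a grouping attribute and
// emit the numeric value we want to aggregate.
class RegroupMapper extends TableMapper<Text, LongWritable> {
  private static final byte[] FAMILY = Bytes.toBytes("d");

  @Override
  protected void map(ImmutableBytesWritable row, Result result, Context ctx)
      throws IOException, InterruptedException {
    byte[] group = result.getValue(FAMILY, Bytes.toBytes("group"));
    byte[] value = result.getValue(FAMILY, Bytes.toBytes("value"));
    if (group != null && value != null) {
      ctx.write(new Text(Bytes.toString(group)),
                new LongWritable(Long.parseLong(Bytes.toString(value))));
    }
  }
}

// Reduce side: compute count/min/max per group and write one row per
// group into the derived table via TableOutputFormat.
class AggregateReducer
    extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text group, Iterable<LongWritable> values, Context ctx)
      throws IOException, InterruptedException {
    long count = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
    for (LongWritable v : values) {
      count++;
      min = Math.min(min, v.get());
      max = Math.max(max, v.get());
    }
    Put put = new Put(Bytes.toBytes(group.toString()));
    put.addColumn(Bytes.toBytes("agg"), Bytes.toBytes("count"), Bytes.toBytes(count));
    put.addColumn(Bytes.toBytes("agg"), Bytes.toBytes("min"), Bytes.toBytes(min));
    put.addColumn(Bytes.toBytes("agg"), Bytes.toBytes("max"), Bytes.toBytes(max));
    ctx.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}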

