hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: Evaluating HBase
Date Tue, 05 Feb 2008 00:01:01 GMT
You have it exactly right. There's nothing more to it than that. Is  
there something further you have questions about?

-Bryan

On Feb 4, 2008, at 3:32 PM, Charles Kaminski wrote:

> Hi Bryan,
>
> Thanks for the thoughtful response.  Could you take a
> moment to write a few lines at a high level on how you
> would leverage Hadoop and HBase to fit this use case?
>
> I think I'm reading the following in your response:
> 1. Build and maintain the large master table in HBase
> 2. Use TableInputFormat to convert HBase data into a raw format for Hadoop on HDFS
> 3. Run MapReduce in Hadoop
> 4. Use TableOutputFormat to build the new table
>
> Do I have that right?
>
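A rough sketch of that flow as a single job, for illustration only: it uses the newer org.apache.hadoop.hbase.mapreduce API rather than the mapred classes from this era, and the table names "master" and "summary", the info:category column, and the count-only aggregation are all invented; max, min, and the like would be computed the same way in the reducer.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class BuildSummaryTable {

  // Map: read each row of the (hypothetical) master table, emit (group key, 1).
  static class GroupMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // "info:category" is an invented column; substitute your own.
      byte[] cat = value.getValue(Bytes.toBytes("info"), Bytes.toBytes("category"));
      if (cat != null) {
        context.write(new Text(cat), ONE);
      }
    }
  }

  // Reduce: aggregate per group and write one row into the summary table.
  static class SummaryReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long count = 0;
      for (LongWritable v : values) {
        count += v.get();
      }
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(count));
      context.write(null, put);  // TableOutputFormat only looks at the Put
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "build-summary-table");
    job.setJarByClass(BuildSummaryTable.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // read rows in batches
    scan.setCacheBlocks(false);  // don't pollute the block cache from MapReduce

    // TableInputFormat under the hood: roughly one map task per region of "master".
    TableMapReduceUtil.initTableMapperJob(
        "master", scan, GroupMapper.class, Text.class, LongWritable.class, job);
    // TableOutputFormat under the hood: reducer Puts go to "summary".
    TableMapReduceUtil.initTableReducerJob("summary", SummaryReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

TableInputFormat splits the scan so that each map task covers one region of the source table, and TableOutputFormat turns each Put written by the reducer into a write against the summary table.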
>
> --- Bryan Duxbury <bryan@rapleaf.com> wrote:
>
>> This seems like a good fit for HBase in general. You're right, it's
>> an application for MapReduce-style processing. HBase doesn't need
>> MapReduce in the sense that HBase is not built dependent upon it.
>> However, we are interested in making HBase play well with MapReduce,
>> and have several handy classes (TableInputFormat, TableOutputFormat)
>> in HBase for doing that with Hadoop's MapReduce.
>>
>> In the current version of HBase, you're correct, there is no way to
>> guarantee that you are mapping over local data. Data locality is
>> something that we are very interested in, but haven't really had the
>> time to pursue yet. We're more concerned about the general
>> reliability and scalability of HBase. We also need HDFS, the
>> underlying distributed file system, to support locality-awareness,
>> which is something it hasn't completely worked out yet.
>>
>> I think you should probably give HBase a shot and see how it goes.
>> We're very, very interested in seeing how HBase performs under
>> massive loads and datasets.
>>
>> -Bryan
>>
>> On Feb 4, 2008, at 2:44 PM, Charles Kaminski wrote:
>>
>>> Hi All,
>>>
>>> I am evaluating HBase and I am not sure if our use-case fits
>>> naturally with HBase's capabilities. I would appreciate any help.
>>>
>>> We would like to store a large number (billions) of rows in HBase
>>> using a key field to access the values. We will then need to
>>> continually add, update, and delete rows. This is our master table.
>>> What I describe here naturally fits into what HBase is designed to
>>> do.
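(As a side note, keyed writes, reads, and deletes against such a master table look roughly like the sketch below, written against a later HBase client API; the "master" table, "info" family, and row key are invented for illustration.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MasterTableOps {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("master"))) {

      // Add or update: a Put against the row key overwrites the latest cell version.
      Put put = new Put(Bytes.toBytes("row-00042"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("example"));
      table.put(put);

      // Read by key.
      Result r = table.get(new Get(Bytes.toBytes("row-00042")));
      byte[] name = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println(Bytes.toString(name));

      // Delete the whole row.
      table.delete(new Delete(Bytes.toBytes("row-00042")));
    }
  }
}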
>>>
>>> It's this next part that I'm having trouble finding documentation
>>> for.
>>>
>>> We would like to use HBase's parallel processing capabilities to
>>> periodically spawn off other temporary tables when requested. We
>>> would like to take the first table (the master table) and go through
>>> the key and field values in its rows. From this, we would like to
>>> create a second table organized differently from the master table.
>>> We would also need to include count, max, min, and other things
>>> specific to the particular request.
>>>
>>> This seems like textbook map-reduce functionality, but I don't see
>>> too much in HBase referencing this kind of setup. Also there is a
>>> reference in HBase's 10 minute startup guide that states "[HBase
>>> doesn't] need mapreduce".
>>>
>>> I suppose we could use HBase as an input and output to Hadoop's
>>> map reduce functionality. If we did that, what would guarantee that
>>> we were mapping to local data?
>>>
>>> Any help would be greatly appreciated. If you have a reference to a
>>> previous discussion or document I could read, that would be
>>> appreciated as well.
>>>
>>> -FA
>

