hbase-user mailing list archives

From Fernando Padilla <f...@alum.mit.edu>
Subject Re: Map\Reduce for one row
Date Wed, 12 Aug 2009 15:51:38 GMT
I haven't done MR with HBase yet, but when I do, I will need something close
to what he wants.

I want to run MR over input that falls in the range between a known startKey
and endKey. (His example is the extreme case, where the range is very small.)

Is there an easy way to give a key range to the MR job, so that it 
doesn't have to walk through keys I know I don't want?
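A minimal sketch of the key-range idea, in plain Java with no HBase dependency: given each region's [start, end) boundaries, build input splits only for regions that overlap the requested [startKey, stopKey) range, and clip each split to that range so no map task is handed keys outside it. The class `RangeSplits`, its `Split` record and the method `splitsFor` are hypothetical names for illustration, not HBase classes. The "empty array means unbounded" convention and the unsigned byte ordering are assumptions modeled on how HBase row keys sort. In HBase's own `org.apache.hadoop.hbase.mapreduce` API the range is normally supplied as a `Scan` with start and stop rows passed to `TableMapReduceUtil.initTableMapperJob`. Whether a given release prunes splits this way is worth checking in that release's `TableInputFormatBase`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: prunes and clips splits to a key range.
final class RangeSplits {
    /** A split covering [start, stop); an empty array means unbounded on that side. */
    record Split(byte[] start, byte[] stop) {}

    /** Unsigned lexicographic comparison, the order in which HBase sorts row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /**
     * Returns one split per region that overlaps [startRow, stopRow),
     * clipped to that range. regionStarts[i] and regionEnds[i] bound region i;
     * empty arrays mean "first region" / "last region" (unbounded).
     */
    static List<Split> splitsFor(byte[][] regionStarts, byte[][] regionEnds,
                                 byte[] startRow, byte[] stopRow) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            byte[] rStart = regionStarts[i];
            byte[] rEnd = regionEnds[i];
            // The region must end after the range starts...
            boolean endsAfterStart = startRow.length == 0 || rEnd.length == 0
                    || compare(startRow, rEnd) < 0;
            // ...and start before the (exclusive) range stop.
            boolean startsBeforeStop = stopRow.length == 0
                    || compare(stopRow, rStart) > 0;
            if (endsAfterStart && startsBeforeStop) {
                // Clip: split start = max(region start, range start).
                byte[] s = (startRow.length == 0 || compare(rStart, startRow) > 0)
                        ? rStart : startRow;
                // Clip: split stop = min(region end, range stop), empty = +infinity.
                byte[] e = (stopRow.length == 0
                        || (rEnd.length != 0 && compare(rEnd, stopRow) < 0))
                        ? rEnd : stopRow;
                splits.add(new Split(s, e));
            }
        }
        return splits;
    }
}
```

With regions ["", "g"), ["g", "p"), ["p", "") and the range ["h", "k"), only the middle region contributes, and its split is clipped to ["h", "k"). Clipping rather than just selecting regions means the map tasks never read keys outside the range, even at the edges.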

On 8/12/09 6:01 AM, Alex Spodinets wrote:
> Ryan, Tim,
> thanks for your responses. It is fairly obvious that scanning through the
> entire table is an option, but why scan if you know what you're looking
> for? My research led me to changing the split algorithm in
> TableInputFormatBase (or its ancestor) so that it produces only one split,
> for the actual location of the row. Do you think this will work?
> My intention is to explore another use of MapReduce: not as a
> batch/parallel mass-processing system, but as a way to run a single task
> and track it, using its ability to run code where the data is.
> Any thoughts will be highly appreciated.
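Alex's one-split proposal comes down to two steps: find the one region whose key range contains the row, then describe a split that covers exactly that row. A minimal sketch, assuming the region start keys are already available as an array sorted in ascending unsigned order whose first entry is the empty key. `RowLocator` is a hypothetical helper, not an HBase class. A binary search returns the last start key that is <= the row. The exclusive stop key is the row with a 0x00 byte appended, since no key sorts strictly between a key and that key followed by a zero byte.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: locates the region for one row.
final class RowLocator {
    /**
     * Returns the index of the region containing {@code row}. Assumes
     * sortedStartKeys is non-empty, sorted by unsigned byte order, and that
     * its first entry is the empty key, so every row falls in some region.
     */
    static int regionIndexFor(byte[][] sortedStartKeys, byte[] row) {
        int lo = 0;
        int hi = sortedStartKeys.length - 1;
        // Binary search for the last start key that is <= row.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always shrinks
            if (Arrays.compareUnsigned(sortedStartKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Exclusive stop key covering exactly one row: the row plus a 0x00 byte. */
    static byte[] singleRowStop(byte[] row) {
        // Arrays.copyOf pads the extra slot with zero.
        return Arrays.copyOf(row, row.length + 1);
    }
}
```

With start keys "", "g" and "p", the row "h" maps to index 1, and the split for that row alone is ["h", "h\x00"). In a real job, the chosen region's server would presumably be reported as the split's location so the scheduler can try to run the single task there.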
> On Wed, Aug 12, 2009 at 5:40 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> You can write a map that processes the entire table, discarding
>> uninteresting rows, and the scheduler will make a best-effort attempt
>> at scheduling for locality. You will want to set up rack awareness to
>> ensure this is as effective as possible.
>> But how big are these rows? Rows that are bigger than the VM's heap
>> (its -Xmx setting) don't really work right now (see the 0.21 roadmap).
>> And for isolated queries, locality really doesn't buy you as much as you
>> think it might: it saves maybe 0.1 ms (ping time on a modern LAN) or less.
>> -ryan
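Ryan's first suggestion, mapping over the whole table and discarding uninteresting rows, is easy to write but still reads every row. The sketch below models that per-row filter in plain Java and counts rows read against rows kept, which is the overhead the key-range and one-split ideas earlier in the thread try to avoid. The `DiscardingMap` class and its `Pass` record are hypothetical, not the HBase `TableMapper` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a full-table map that filters rows itself.
final class DiscardingMap {
    /** Result of one full pass: how many rows were read and which were kept. */
    record Pass(int rowsRead, List<byte[]> kept) {}

    /**
     * Visits every row, as a full-table map would, and keeps only rows in
     * [startRow, stopRow). An empty stopRow means "no upper bound".
     */
    static Pass scanAndDiscard(List<byte[]> allRows, byte[] startRow, byte[] stopRow) {
        List<byte[]> kept = new ArrayList<>();
        int read = 0;
        for (byte[] row : allRows) {
            read++; // every row costs a read, even the ones thrown away
            boolean inRange = Arrays.compareUnsigned(row, startRow) >= 0
                    && (stopRow.length == 0 || Arrays.compareUnsigned(row, stopRow) < 0);
            if (inRange) {
                kept.add(row);
            }
        }
        return new Pass(read, kept);
    }
}
```

For rows "a" through "e" and the range ["b", "d"), all five rows are read but only "b" and "c" are kept. Whether that overhead matters depends on how large the table is relative to the range of interest.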
>> On Tue, Aug 11, 2009 at 9:07 AM, Alex Spodinets <spodinets@gmail.com> wrote:
>>> I do know the row. I want the MR job to run on the server closest to
>>> where the data is, so this MR job will process only the data for this one row.
>>> Thanks,
>>> Alex.
>>> On Tue, Aug 11, 2009 at 6:50 PM, stack <stack@duboce.net> wrote:
>>>> On Tue, Aug 11, 2009 at 7:35 AM, Alex Spodinets <spodinets@gmail.com> wrote:
>>>>> Hello,
>>>>> Is it possible to run a MapReduce job for only one row in a table, thus
>>>>> skipping the unnecessary cycling through the other rows (ignoring them
>>>>> manually or via "skip mode")?
>>>>> The idea behind it is to use MapReduce more like an application server
>>>>> with data-location awareness, rather than as a batch/parallel processing
>>>>> system.
>>>> Please add more description. I'm having trouble understanding what you
>>>> are asking.
>>>> + If you know the row you want, just ask HBase; you don't have to go via
>>>> MR.
>>>> + MR is usually used for offline/batch operations, but when you say things
>>>> like 'application server' I get the sense you are talking about real-time
>>>> lookups?
>>>> Thanks,
>>>> St.Ack
