hbase-user mailing list archives

From Dru Jensen <drujen...@gmail.com>
Subject Re: newbie - map reduce not distributing
Date Wed, 30 Jul 2008 21:04:58 GMT

Thanks for your quick response. I have 4 mapping processes running
on 3 systems.

Is each row being processed 4 times, once by each mapping process?
According to the logs, it is.

When I run a map/reduce against a file, each row gets logged by only
one mapper. Why would this be different for HBase tables?

I would think only one mapping process would process a given row, and
it would show up once, in only one log.
Preferably it would be on the same system that has the region.

I want each row to be processed only once. Is there any way to change
this behavior without running only 1 mapper?
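
To make the expectation above concrete, here is a minimal, self-contained sketch of splitting work along region boundaries. It does not use the HBase API and is not taken from TableMap or the table input format; the names RegionSplitSketch, KeyRange, splitsFor and ownerCount are hypothetical, and the assumption (based on the reply quoted below) is that each region's key range becomes one unit of map work. Under that assumption every row key falls into exactly one range, and a table that still has a single region yields a single range no matter how many mappers are requested.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, no HBase classes involved.
public class RegionSplitSketch {

    /** Half-open key range [startKey, endKey); an empty endKey means "to the end of the table". */
    public static final class KeyRange {
        public final String startKey;
        public final String endKey;

        public KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        public boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    /** Builds one range per region from the sorted region start keys (assumed model). */
    public static List<KeyRange> splitsFor(List<String> sortedRegionStartKeys) {
        List<KeyRange> splits = new ArrayList<>();
        for (int i = 0; i < sortedRegionStartKeys.size(); i++) {
            String start = sortedRegionStartKeys.get(i);
            // The next region's start key closes this range; the last range is open-ended.
            String end = (i + 1 < sortedRegionStartKeys.size())
                    ? sortedRegionStartKeys.get(i + 1)
                    : "";
            splits.add(new KeyRange(start, end));
        }
        return splits;
    }

    /** Counts how many ranges would hand the given row to a mapper. */
    public static int ownerCount(List<KeyRange> splits, String rowKey) {
        int owners = 0;
        for (KeyRange range : splits) {
            if (range.contains(rowKey)) {
                owners++;
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // A small table with one region, then the same table after a split at key "m".
        List<KeyRange> oneRegion = splitsFor(List.of(""));
        List<KeyRange> twoRegions = splitsFor(List.of("", "m"));

        System.out.println("ranges with one region:  " + oneRegion.size());
        System.out.println("ranges with two regions: " + twoRegions.size());
        System.out.println("owners of row 'apple':   " + ownerCount(twoRegions, "apple"));
    }
}
```

In this model the number of useful mappers is bounded by the number of regions, which is why a small table cannot be spread across 4 mappers until it has grown enough to split.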


On Jul 30, 2008, at 1:44 PM, Jean-Daniel Cryans wrote:

> Dru,
> The regions will split when achieving a certain threshold so if you want
> your computing to be distributed, you will have to have more data.
> Regards,
> J-D
> On Wed, Jul 30, 2008 at 4:36 PM, Dru Jensen <drujensen@gmail.com> wrote:
>> Hello,
>> I created a map/reduce process by extending the TableMap and TableReduce
>> API, but for some reason, when I run multiple mappers, the logs show that
>> the same rows are being processed by each Mapper.
>> When I say logs, I mean in the hadoop task tracker (localhost:50030) and
>> drilling down into the logs.
>> Do I need to manually perform a TableSplit or is this supposed to be done
>> automatically?
>> If it's something I need to do manually, can someone point me to some
>> sample code?
>> If it's supposed to be automatic and each mapper was supposed to get its
>> own set of rows, should I write up a bug for this? I am using trunk 0.2.0
>> on hadoop trunk 0.17.2.
>> thanks,
>> Dru
