hadoop-general mailing list archives

From Mridul Muralidharan <mrid...@yahoo-inc.com>
Subject Re: setNumReduceTasks(1)
Date Tue, 26 Jan 2010 07:08:58 GMT
Jeff Zhang wrote:
> *See my comments below*
> 
> On Mon, Jan 25, 2010 at 3:22 PM, Something Something <
> mailinglists19@gmail.com> wrote:
> 
>> If I set # of reduce tasks to 1 using setNumReduceTasks(1), would the class
>> be instantiated only on one machine.. always?  I mean if I have a cluster of
>> say 1 master, 10 workers & 3 zookeepers, is the Reducer class guaranteed to
>> be instantiated only on 1 machine?
>>
>> *--Yes*
> 
> 
>> If answer is yes, then I will use static variable as a counter to see how
>> many rows have been added to my HBase table so far.  In my use case, I want
>> to write only N number of rows to a table.  Is there a better way to do
>> this?  Please let me know.  Thanks.
>>
> 
> *--Maybe you can use Counter to track the number of rows you add to HBase,
> then you do not need to limit the reduce task as 1*
> 
> 

Counters are not synchronized in real time, so IMO you can't use them to
enforce a limit at the time rows are added.
They are meant for aggregation, not real-time messaging.

- Mridul
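
To illustrate the point about aggregation, here is a minimal sketch in plain Java. It does not use the Hadoop or HBase APIs: `CappedTask`, `write`, and `totalWritten` are hypothetical names, and the list inside each task only stands in for the HBase table. Each simulated task keeps a task-local count that it can check immediately, while the overall total is only summed after the tasks finish, which is roughly the behaviour described above for counters.

```java
import java.util.ArrayList;
import java.util.List;

public class CappedWriterSketch {

    /** Hypothetical stand-in for one reduce task; NOT the Hadoop Reducer API. */
    static class CappedTask {
        private final int limit;
        // Task-local count: visible immediately, but only to this task.
        private int written = 0;
        // Stands in for the HBase table (assumption for illustration only).
        private final List<String> sink = new ArrayList<>();

        CappedTask(int limit) {
            this.limit = limit;
        }

        /** Writes the row unless this task has already written `limit` rows. */
        boolean write(String row) {
            if (written >= limit) {
                return false;
            }
            sink.add(row);
            written++;
            return true;
        }

        int written() {
            return written;
        }
    }

    /** Runs `tasks` independent tasks and sums their counts only at the end. */
    static int totalWritten(int tasks, int rowsPerTask, int limit) {
        int total = 0;
        for (int t = 0; t < tasks; t++) {
            CappedTask task = new CappedTask(limit);
            for (int r = 0; r < rowsPerTask; r++) {
                task.write("row-" + t + "-" + r);
            }
            // Aggregation happens after the task is done, not while it runs.
            total += task.written();
        }
        return total;
    }

    public static void main(String[] args) {
        // One task: the local cap is also the global cap.
        System.out.println(totalWritten(1, 100, 10));
        // Three tasks: each one sees only its own count, so the cap is exceeded.
        System.out.println(totalWritten(3, 100, 10));
    }
}
```

With a single task the local check is enough to stop at N rows. With several tasks, no task can see the others' counts while writing, so a per-task check cannot enforce a global limit.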
