hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Hadoop counter
Date Fri, 19 Oct 2012 16:42:32 GMT

On Oct 19, 2012, at 11:27 AM, Lin Ma <linlma@gmail.com> wrote:

> Hi Mike,
> 
> Thanks for the detailed reply. Two quick questions/comments,
> 
> 1. By "task", do you mean a specific mapper instance or a specific reducer instance?

Either. 

> 2. "However, I do not believe that a separate Task could connect with the JT and see
> if the counter exists or if it could get a value or even an accurate value since the updates
> are asynchronous." -- do you mean that if one mapper is updating custom counter ABC and another
> mapper is updating the same custom counter ABC, their counter values are updated independently
> by the different mappers and will not be published (aggregated) externally until the job completes successfully?
> 
I meant that if a Task created and updated a counter, a different Task does not have access
to that counter.

To give you an example, if I want to count the number of quality errors and then fail after
X number of errors, I can't use Global counters to do this.
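
Here is a minimal sketch of that model in plain Java, with no Hadoop dependency. The class names (TaskCounters, JobCounters), the counter name QUALITY_ERRORS, and the report/discard methods are all hypothetical stand-ins, not Hadoop's API; in a real job the increment would go through the task context's counter (for example `context.getCounter(group, name).increment(1)`). It only illustrates the behaviour as I understand it: each task sees its own count, and the sum exists only on the JT side.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the counters private to one task attempt. */
class TaskCounters {
    private final Map<String, Long> counts = new HashMap<>();
    private final long maxErrors; // assumed per-task threshold (the "X" above)

    TaskCounters(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Returns this task's own value; other tasks' increments are invisible here. */
    long get(String name) {
        return counts.getOrDefault(name, 0L);
    }

    /** Counts one bad record and fails the task once its local count exceeds the limit. */
    void recordQualityError() {
        long seen = counts.merge("QUALITY_ERRORS", 1L, Long::sum);
        if (seen > maxErrors) {
            throw new IllegalStateException("too many quality errors in this task: " + seen);
        }
    }

    /** Copy of the current values, as a task might send in a status report. */
    Map<String, Long> snapshot() {
        return new HashMap<>(counts);
    }
}

/** Hypothetical stand-in for the JT side: latest report per task, summed on demand. */
class JobCounters {
    private final Map<String, Map<String, Long>> latestReport = new HashMap<>();

    void report(String taskId, TaskCounters counters) {
        latestReport.put(taskId, counters.snapshot());
    }

    /** Drops a failed task's counters so a retry does not double count. */
    void discard(String taskId) {
        latestReport.remove(taskId);
    }

    long total(String name) {
        long sum = 0;
        for (Map<String, Long> report : latestReport.values()) {
            sum += report.getOrDefault(name, 0L);
        }
        return sum;
    }
}
```

With a limit of 2, two tasks that each record 2 errors both keep running, even though the job-wide total is already 4. The threshold check only ever sees the local count, which is why a job-wide "fail after X errors" rule cannot be built this way.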

> regards,
> Lin
> 
> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> As I understand it... each Task has its own counters, which are updated independently. As
> the tasks report back to the JT, they update the status of the counter(s).
> The JT will then aggregate them.
> 
> In terms of performance, Counters take up some memory in the JT, so while it's OK to use
> them, if you abuse them you can run into issues.
> As to limits... I guess that will depend on the amount of memory on the JT machine, the
> size of the cluster (number of TTs) and the number of counters.
> 
> In terms of global accessibility... Maybe.
> 
> The reason I say maybe is that I'm not sure what you mean by globally accessible.

> If a task creates and updates a dynamic counter... I know that it will eventually
> be reflected in the JT. However, I do not believe that a separate Task could connect with
> the JT and see if the counter exists or if it could get a value or even an accurate value
> since the updates are asynchronous. Not to mention that I don't believe that the counters
> are aggregated until the job ends. It would make sense that the JT maintains a unique counter
> for each task until the tasks complete. (If a task fails, it would have to delete the counters
> so that when the task is restarted the correct count is maintained.) Note, I haven't looked
> at the source code so I am probably wrong.
> 
> HTH
> Mike
> On Oct 19, 2012, at 5:50 AM, Lin Ma <linlma@gmail.com> wrote:
> 
>> Hi guys,
>> 
>> I have some quick questions regarding Hadoop counters:
>> 
>> Is a Hadoop counter (custom-defined) globally accessible (for both read and write)
>> by all Mappers and Reducers in a job?
>> What are the performance implications and best practices of using Hadoop counters? If I use
>> Hadoop counters too heavily, will the performance of the whole job degrade?
>> regards,
>> Lin
> 
> 

