hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Hadoop counter
Date Fri, 19 Oct 2012 16:19:35 GMT
Hi,

Inline.

On Fri, Oct 19, 2012 at 9:39 PM, Lin Ma <linlma@gmail.com> wrote:
> Hi Harsh,
>
> Thanks for the great reply. Two basic questions,
>
> - Where are the counters' values stored for a successful job? On the JT?

Yes, they are ultimately stored at the JT until the job is retired out
of heap memory (at which point they get stored in the JobHistory
location and format).

> - Supposing a specific job A completed successfully and updated related
> counters, is it possible for another specific job B to read counters updated
> by previous job A? If yes, how?

Yes, it is possible: use the RunningJob object from the previous job
(or obtain one) and query it. APIs you're interested in:

Grab a query-able object (RunningJob and/or a Job):
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getJob(org.apache.hadoop.mapred.JobID)
or http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Cluster.html#getJob(org.apache.hadoop.mapreduce.JobID)

Query counters:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getCounters()
or http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters()
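
To make it a bit more concrete what the counters returned for a finished
job represent: as noted further down in this thread, the final values are
built only from successful task reports. Below is a toy Java model of that
aggregation rule. It is a rough sketch, not Hadoop code: the class and
method names (CounterAggregation, TaskReport, finalCounters), the RECORDS
counter and the attempt ids are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the rule "only successful task reports count".
// None of these types exist in Hadoop; they are assumptions for this sketch.
public class CounterAggregation {
    enum State { SUCCEEDED, FAILED, KILLED }

    // One task attempt's last reported state and counter values.
    static final class TaskReport {
        final String attemptId;
        final State state;
        final Map<String, Long> counters;

        TaskReport(String attemptId, State state, Map<String, Long> counters) {
            this.attemptId = attemptId;
            this.state = state;
            this.counters = counters;
        }
    }

    // Sums counters across attempts, skipping failed and killed ones.
    static Map<String, Long> finalCounters(List<TaskReport> reports) {
        Map<String, Long> totals = new TreeMap<>();
        for (TaskReport report : reports) {
            if (report.state != State.SUCCEEDED) {
                continue; // failed/killed attempts do not contribute
            }
            for (Map.Entry<String, Long> e : report.counters.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<TaskReport> reports = Arrays.asList(
            // First attempt of task 0 failed after counting 40 records.
            new TaskReport("m_000000_0", State.FAILED,
                Collections.singletonMap("RECORDS", 40L)),
            // The retry of task 0 succeeded.
            new TaskReport("m_000000_1", State.SUCCEEDED,
                Collections.singletonMap("RECORDS", 100L)),
            new TaskReport("m_000001_0", State.SUCCEEDED,
                Collections.singletonMap("RECORDS", 50L)),
            // A duplicate attempt of task 1 was killed.
            new TaskReport("m_000001_1", State.KILLED,
                Collections.singletonMap("RECORDS", 20L)));
        System.out.println(finalCounters(reports));
    }
}
```

Running main prints {RECORDS=150}: the 40 from the failed attempt and the
20 from the killed attempt are left out of the total.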

> regards,
> Lin
>
>
> On Fri, Oct 19, 2012 at 11:50 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Bejoy is almost right, except that counters are reported as tasks
>> progress (via TT heartbeats to the JT, actually), but the final
>> counter representation is computed only from the successful task
>> reports the job received, not from any failed or killed ones.
>>
>> On Fri, Oct 19, 2012 at 8:51 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
>> > Hi Jay
>> >
>> > Counters are reported at the end of a task to the JT. So if a task
>> > fails, the counters from that task are not sent to the JT and hence
>> > won't be included in the final value of counters from that job.
>> > Regards
>> > Bejoy KS
>> >
>> > Sent from handheld, please excuse typos.
>> > ________________________________
>> > From: Jay Vyas <jayunit100@gmail.com>
>> > Date: Fri, 19 Oct 2012 10:18:42 -0500
>> > To: <user@hadoop.apache.org>
>> > ReplyTo: user@hadoop.apache.org
>> > Subject: Re: Hadoop counter
>> >
>> > Ah, this answers a lot about why some of my dynamic counters never show
>> > up and I have to bite my nails waiting to see what's going on until the
>> > end of the job. Thanks.
>> >
>> > Another question: what happens if a task fails? What happens to the
>> > counters for it? Do they disappear into the ether? Or do they get
>> > merged in with the counters from other tasks?
>> >
>> > On Fri, Oct 19, 2012 at 9:50 AM, Bertrand Dechoux <dechouxb@gmail.com>
>> > wrote:
>> >>
>> >> And by default the number of counters is limited to 120 with the
>> >> mapreduce.job.counters.limit property.
>> >> They are useful for displaying short statistics about a job but should
>> >> not
>> >> be used for results (imho).
>> >> I know people may misuse them but I haven't tried so I wouldn't be able
>> >> to
>> >> list the caveats.
>> >>
>> >> Regards
>> >>
>> >> Bertrand
>> >>
>> >>
>> >> On Fri, Oct 19, 2012 at 4:35 PM, Michael Segel
>> >> <michael_segel@hotmail.com>
>> >> wrote:
>> >>>
>> >>> As I understand it... each Task has its own counters, which are
>> >>> updated independently. As the tasks report back to the JT, they
>> >>> update the counter(s)' status.
>> >>> The JT will then aggregate them.
>> >>>
>> >>> In terms of performance, Counters take up some memory in the JT, so
>> >>> while it's OK to use them, if you abuse them, you can run into issues.
>> >>> As to limits... I guess that will depend on the amount of memory on
>> >>> the JT machine, the size of the cluster (number of TTs) and the
>> >>> number of counters.
>> >>>
>> >>> In terms of global accessibility... Maybe.
>> >>>
>> >>> The reason I say maybe is that I'm not sure what you mean by
>> >>> globally accessible.
>> >>> If a task creates and implements a dynamic counter... I know that it
>> >>> will
>> >>> eventually be reflected in the JT. However, I do not believe that a
>> >>> separate
>> >>> Task could connect with the JT and see if the counter exists or if it
>> >>> could
>> >>> get a value or even an accurate value since the updates are
>> >>> asynchronous.
>> >>> Not to mention that I don't believe that the counters are aggregated
>> >>> until
>> >>> the job ends. It would make sense that the JT maintains a unique
>> >>> counter for
>> >>> each task until the tasks complete. (If a task fails, it would have to
>> >>> delete the counters so that when the task is restarted the correct
>> >>> count is
>> >>> maintained. )  Note, I haven't looked at the source code so I am
>> >>> probably
>> >>> wrong.
>> >>>
>> >>> HTH
>> >>> Mike
>> >>> On Oct 19, 2012, at 5:50 AM, Lin Ma <linlma@gmail.com> wrote:
>> >>>
>> >>> Hi guys,
>> >>>
>> >>> I have some quick questions regarding Hadoop counters:
>> >>>
>> >>> Are Hadoop counters (user-defined) globally accessible (for both
>> >>> read and write) to all Mappers and Reducers in a job?
>> >>> What are the performance implications and best practices of using
>> >>> Hadoop counters? I am not sure whether using counters too heavily
>> >>> will degrade the performance of the whole job.
>> >>>
>> >>> regards,
>> >>> Lin
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Bertrand Dechoux
>> >
>> >
>> >
>> >
>> > --
>> > Jay Vyas
>> > http://jayunit100.blogspot.com
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
