From: Michael Segel <michael_segel@hotmail.com>
To: user@hadoop.apache.org
Subject: Re: Hadoop counter
Date: Sun, 21 Oct 2012 07:15:54 -0500

On Oct 21, 2012, at 1:45 AM, Lin Ma <linlma@gmail.com> wrote:
Thanks for the detailed reply, Mike. Yes, most of my confusion is resolved. The last two questions (or comments) are to confirm that my understanding is correct:

- Is it a normal use case or best practice for a job to automatically consume/read the counters from a previously completed job? I ask because I am not sure whether the main use case for counters is human reading and manual analysis, rather than having another job consume the counters automatically.

Lin,
Every job has a set of counters that maintain job statistics.
These are primarily for human analysis, to help you understand what happened with your job.
They let you see how much data the job read in and how many records it processed, measured against how long the job took to complete. They also show you how much data was written back out.

In addition to this, a set of use cases for counters in Hadoop centers on quality control. It's normal to chain jobs together to form a job flow.
A typical use case for Hadoop is to pull data from various sources, combine them, and do some processing on them, resulting in a data set that gets sent to another system for visualization.

In this use case, there are usually data cleansing and validation jobs. As they run, it's possible to track the number of defective records. At the end of that specific job, from the ToolRunner, or whichever job class you used to launch your job, you can get the aggregated counters for the job and determine whether the process passed or failed. Based on this, you can exit your program with either a success or a failure flag. Job-flow control tools like Oozie can capture this and then decide to continue, or to stop and alert an operator of an error.
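Roughly, that driver-side check might look like this (a sketch, not tested: the QC enum, the job wiring, and the 5% threshold are all made up, and TaskCounter is the newer-API name for the built-in task counters):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CleansingDriver extends Configured implements Tool {
    // Hypothetical quality-control counter, incremented by the mappers.
    public enum QC { BAD_RECORDS }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "data-cleansing");
        job.setJarByClass(CleansingDriver.class);
        // ... set mapper/reducer/input/output here ...
        if (!job.waitForCompletion(true)) {
            return 1;  // the job itself failed
        }
        // Aggregated, job-wide counter values are available once the job is done.
        long bad   = job.getCounters().findCounter(QC.BAD_RECORDS).getValue();
        long total = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
        // Fail the flow (e.g. for Oozie to catch) if more than 5% of the input was defective.
        return (total > 0 && bad * 100 / total > 5) ? 1 : 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CleansingDriver(), args));
    }
}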

- I want to confirm my understanding is correct: when each task completes, the JT will aggregate/update the global counter values from the counter values reported by the completed task, but never exposes the global counter values until the job completes? If that is correct, I am wondering why the JT does the aggregation each time a task completes, rather than doing a one-time aggregation when the job completes. Is there a design reason? Thanks.

That's a good question. I haven't looked at the code, so I can't say definitively when the JT performs its aggregation. However, while the job runs, we can look at the job tracker web page(s) and see the counter summary. That implies there has to be some aggregation occurring mid-flight. (It would be trivial to sum the list of counters periodically to update the job statistics.) Note too that if the JT web pages can show a counter, it's possible to write a monitoring tool that watches the job while it's running and kills the job mid-flight if a certain counter threshold is met.

That is to say, you could in theory write a monitoring process that watches the counters. If, let's say, an error counter hits a predetermined threshold, you could then issue a 'hadoop job -kill <job-id>' command.
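Something along these lines, using the old org.apache.hadoop.mapred client API (a sketch only; the "QC"/"BAD_RECORDS" names, the threshold, and the polling interval are made up):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class CounterWatchdog {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));  // e.g. job_201210210001_0042
        while (job != null && !job.isComplete()) {
            Counters counters = job.getCounters();
            long errors = counters.findCounter("QC", "BAD_RECORDS").getCounter();
            if (errors > 10000) {   // hypothetical threshold
                job.killJob();      // same effect as 'hadoop job -kill <job-id>'
                break;
            }
            Thread.sleep(30000L);   // poll every 30 seconds
        }
    }
}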


regards,
Lin

On Sat, Oct 20, 2012 at 3:12 PM, Michael Segel <michael_segel@hotmail.com> wrote:

On Oct 19, 2012, at 10:27 PM, Lin Ma <linlma@gmail.com> wrote:

Thanks for the detailed reply, Mike. I learned a lot from the discussion.

- I just want to confirm with you: within the same job, when a specific task completes (and its counters are aggregated in the JT after the task completes, per our discussion?), the other running tasks in the same job cannot get the updated counter value from the completed task? I am asking because I am wondering whether I can use a counter to share a global value between tasks.

Yes, that is correct.
While I haven't looked at YARN (M/R 2.0), M/R 1.x doesn't have an easy way for a task to query the job tracker. This might have changed in YARN.

- If so, what is the traditional use case for counters? Only using counter values after the whole job completes?

Yes, the counters are used to provide data at the end of the job...

BTW, I'd appreciate it if you could share a few use cases from your experience of how counters are used.

Well, you have your typical job data, like the number of records processed, total number of bytes read, bytes written...

But suppose you wanted to do some quality control on your input.
So you need to keep track of the count of bad records. If this job is part of a process, you may want to include business logic in your job to halt the job flow if X% of the records contain bad data.
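A bare-bones sketch of that bookkeeping in a mapper (new API; the CSV layout, the 3-field validity rule, and the QC enum are invented for illustration):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper extends Mapper<LongWritable, Text, Text, Text> {
    public enum QC { BAD_RECORDS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {  // hypothetical validity check
            // Each task keeps its own tally; the framework merges them job-wide.
            context.getCounter(QC.BAD_RECORDS).increment(1);
            return;               // drop the defective record
        }
        context.write(new Text(fields[0]), value);
    }
}

A driver like the earlier sketch can then compare QC.BAD_RECORDS against the total input records and halt the flow.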

Or your process takes input records and, in processing them, sorts the records based on some characteristic, and you want to count those sorted records as you process them.

For a more concrete example, the Illinois Tollway has these 'fast pass' lanes where cars equipped with RFID tags can have the tolls automatically deducted from their accounts rather than paying the toll manually each time.

Suppose we wanted to determine how many cars in the 'Fast Pass' lanes are cheaters that drive through the sensor without the sensor capturing an RFID tag. (Note it's possible to have a false positive, where the car has an RFID chip but doesn't trip the sensor.) Pushing the data through a map/reduce job would require the use of counters.

Does that help?

-Mike

regards,
Lin

On Sat, Oct 20, 2012 at 5:05 AM, Michael Segel <michael_segel@hotmail.com> wrote:
Yeah, sorry... 

I meant that if you were dynamically creating a counter foo in the Mapper task, then each mapper would be creating its own counter foo.
As the job runs, these counters will eventually be sent up to the JT. The job tracker would keep a separate counter for each task.

At the end, the final count is aggregated from the list of counters for foo.
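In code, the dynamic creation is a single call per record; the "custom"/"foo" names are just illustrative strings (a sketch):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FooCountingMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // First use creates this task's own "foo"; the JT later merges the
        // per-task values into a single job-wide total.
        context.getCounter("custom", "foo").increment(1);
    }
}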


I don't know how you can get a task to ask the Job Tracker how things are going in other tasks. That is what I meant: you can't get information about the other counters, or even the status of the other tasks running in the same job.

I didn't see anything in the APIs that allows for that type of flow... Of course, having said that... someone will pop up with a way to do just that. ;-)


Does that clarify things?

-Mike


On Oct 19, 2012, at 11:56 AM, Lin Ma <linlma@gmail.com> wrote:

Hi Mike,

Sorry, I am a bit lost... as you are thinking faster than me. :-P

From this statement of yours, "It would make sense that the JT maintains a unique counter for each task until the tasks complete." -- it seems each task cannot see the counters of the others, since the JT maintains a unique counter for each task;

From this comment of yours, "I meant that if a Task created and updated a counter, a different Task has access to that counter." -- it seems different tasks could share/access the same counter.

I'd appreciate it if you could help clarify a bit.

regards,
Lin

On Sat, Oct 20, 2012 at 12:42 AM, Michael Segel <michael_segel@hotmail.com> wrote:

On Oct 19, 2012, at 11:27 AM, Lin Ma <linlma@gmail.com> wrote:

Hi Mike,

Thanks for the detailed reply. Two quick questions/comments:

1. For "task", you mean a = specific mapper instance, or a specific reducer = instance?

Either. 

2. "However, I do not believe that a separate Task could = connect with the JT and see if the counter exists or if it could get a value or even an=20 accurate value since the updates are asynchronous." -- do you mean if a = mapper is updating custom counter ABC, and another mapper is updating = the same customer counter ABC, their counter values are updated = independently by different mappers, and will not published (aggregated) = externally until job completed successfully?

I meant that if a Task created and updated a counter, a different Task has access to that counter.

To give you an example: if I want to count the number of quality errors and then fail after X number of errors, I can't use global counters to do this.
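One workaround (a sketch with invented names and threshold) is to fail fast per task: each task counts its own errors locally and aborts once its own split looks bad enough, since it can't read the job-wide aggregate:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FailFastMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final long LOCAL_ERROR_LIMIT = 1000;  // hypothetical per-task cap
    private long localErrors = 0;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().trim().isEmpty()) {  // invented error test
            context.getCounter("QC", "BAD_RECORDS").increment(1);
            if (++localErrors > LOCAL_ERROR_LIMIT) {
                // A task can only see its own tally, so the best it can do is
                // fail itself; repeated task failures then fail the whole job.
                throw new IOException("Too many bad records in this task's split");
            }
            return;
        }
        context.write(value, value);
    }
}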

regards,
Lin

On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <michael_segel@hotmail.com> wrote:
As I understand it... each Task has its own counters, and they are independently updated. As the tasks report back to the JT, they update the counters' status.
The JT will then aggregate them.

In terms of performance, counters take up some memory in the JT, so while it's OK to use them, if you abuse them you can run into issues.
As to limits... I guess that will depend on the amount of memory on the JT machine, the size of the cluster (number of TTs), and the number of counters.
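On the limits point, there is also a hard cap on the number of distinct counters a job may create. The property below is the Hadoop 2.x name (mapreduce.job.counters.max, default 120); 1.x used mapreduce.job.counters.limit, so treat this as version-dependent (a sketch):

import org.apache.hadoop.conf.Configuration;

public class CounterLimit {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Raise the per-job counter cap; check your Hadoop version for the
        // exact property name and whether it is honored cluster- or job-side.
        conf.setInt("mapreduce.job.counters.max", 200);
    }
}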

In terms of global accessibility... Maybe.

The reason I say maybe is that I'm not sure what you mean by globally accessible.
If a task creates and implements a dynamic counter... I know that it will eventually be reflected in the JT. However, I do not believe that a separate Task could connect with the JT and see if the counter exists or if it could get a value or even an accurate value since the updates are asynchronous. Not to mention that I don't believe that the counters are aggregated until the job ends. It would make sense that the JT maintains a unique counter for each task until the tasks complete. (If a task fails, it would have to delete the counters so that when the task is restarted the correct count is maintained.) Note, I haven't looked at the source code, so I am probably wrong.

HTH
Mike
On Oct 19, 2012, at 5:50 AM, Lin Ma <linlma@gmail.com> wrote:

Hi guys,

I have some quick questions regarding Hadoop counters:

  • Is a Hadoop counter (custom defined) globally accessible (for both read and write) to all Mappers and Reducers in a job?
  • What are the performance implications and best practices of using Hadoop counters? I am not sure whether using Hadoop counters too heavily will degrade the performance of the whole job.
regards,
Lin