hadoop-user mailing list archives

From zheyi rong <zheyi.r...@gmail.com>
Subject Re: System.out.printlin vs Counters
Date Wed, 27 Mar 2013 10:38:04 GMT

It depends on what you need. If you want job-wide statistics, for example
the number of malformed records in your dataset, use counters. If you just
want to see what is going on inside a single mapper or reducer, use
System.out.println(). Since mappers do not know about each other, you cannot
get job-wide statistics from System.out.println().
The output of System.out.println() ends up in the task log.

In a distributed environment, mappers run independently of each other.
Imagine that mapper A is running on one machine and mapper B on another:
from inside mapper A you cannot see the internal state of mapper B
simply by calling System.out.println().
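
To make the difference concrete, here is a minimal plain-Java sketch. It is
only a simulation and does not use the Hadoop API: SimulatedMapper, runJob and
the counter name MALFORMED_RECORDS are hypothetical names invented for this
example. Each simulated mapper keeps its own log lines and its own counter
values, and only the counters are merged into a job-wide total, which is
roughly what the framework does for you with real counters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; none of these classes are Hadoop APIs.
public class CounterSketch {

    // Hypothetical stand-in for one map task running on its own machine.
    static class SimulatedMapper {
        final String taskId;
        // What System.out.println would write: visible only in this task's log.
        final List<String> taskLog = new ArrayList<>();
        // Task-local counter values, merged into job totals by the driver.
        final Map<String, Long> counters = new HashMap<>();

        SimulatedMapper(String taskId) {
            this.taskId = taskId;
        }

        // Assumption for this sketch: a well-formed record is "key,value".
        void map(String record) {
            if (record.split(",").length != 2) {
                taskLog.add(taskId + ": malformed record: " + record);
                counters.merge("MALFORMED_RECORDS", 1L, Long::sum);
            }
        }
    }

    // Runs one simulated mapper per input split and returns the merged counters.
    static Map<String, Long> runJob(List<List<String>> splits) {
        Map<String, Long> jobTotals = new HashMap<>();
        for (int i = 0; i < splits.size(); i++) {
            SimulatedMapper mapper = new SimulatedMapper("mapper-" + i);
            for (String record : splits.get(i)) {
                mapper.map(record);
            }
            // Log lines stay per task; nothing here adds them up.
            for (String line : mapper.taskLog) {
                System.out.println(line);
            }
            // Counters are the only state merged across tasks.
            mapper.counters.forEach((name, value) -> jobTotals.merge(name, value, Long::sum));
        }
        return jobTotals;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a,1", "broken", "b,2"),
                Arrays.asList("c,3", "also broken", "x,y,z"));
        System.out.println("job totals: " + runJob(splits));
    }
}
```

With the two splits in main, each mapper only ever logs its own malformed
records, while the merged counter reports all three of them for the job.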

Harsh J answered it.


2013/3/27 Sai Sai <saigraph@yahoo.in>

> Q1. Is it right to assume that System.out.println statements are used only
> in the Eclipse environment, and that in a multi-node cluster environment
> we need to use counters?
> Q2. I am slightly confused: with System.out.println statements we are able
> to get detailed info at every line of code in Eclipse, while counters give
> just a few lines and are not as detailed as System.out.println statements.
> So what should we do in a multi-node cluster environment?
> Q3. Also, when they say the limit of counters is 120, does that mean that
> if in the output we use
> context.getCounter("TestGroup1","TestName1").increment(1);
> more than 120 times, it will not be printed? Or does it refer to 120
> counters that we can define in an enum?
> Any help is really appreciated.
> Thanks
> Sai
