tez-issues mailing list archives

From "Alexander Pivovarov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TEZ-1344) Combiner counters reported by Tez look wrong
Date Sat, 04 Oct 2014 01:49:33 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]

Alexander Pivovarov edited comment on TEZ-1344 at 10/4/14 1:49 AM:
-------------------------------------------------------------------

An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run with mapreduce.framework.name=yarn-tez always
reports "Counters: 0":
{code}
$ hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job:  map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, reports:
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
	org.apache.tez.common.counters.TaskCounter
		REDUCE_INPUT_GROUPS=35518
		REDUCE_INPUT_RECORDS=284742
		COMBINE_INPUT_RECORDS=0
{code}

A comment in the org.apache.tez.common.counters.TaskCounter source says:
{code}
 COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class
org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars),

but the Tez API program reports counters from a different class (defined in the Tez jars):
org.apache.tez.common.counters.TaskCounter

I'm confused.
What should I run on Tez, and how, to get the Hadoop counters (rather than the Tez TaskCounter ones)?
org.apache.hadoop.mapreduce.TaskCounter
COMBINE_OUTPUT_RECORDS
COMBINE_INPUT_RECORDS




> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong:
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
> COMBINE_OUTPUT_RECORDS 35,977,263,353
> COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
> Combine output records 1,000,316,600
> Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
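The reporter's reasoning is that a combiner merges records sharing a key, so its output count should normally not exceed its input count. That heuristic can be written as a quick sanity check over the reported numbers. This is a minimal sketch, assuming the two counter values have already been read out of the job's counters; the class CombinerCounterCheck, the enum Verdict and the method check are hypothetical helpers, not part of Tez or Hadoop.

```java
public class CombinerCounterCheck {

    /** Outcome of comparing combiner input and output record counts. */
    enum Verdict { PLAUSIBLE, LIKELY_SWAPPED }

    /**
     * Hypothetical helper. A combiner typically emits at most as many records
     * as it receives, so output > input is suspicious. This is a heuristic
     * only: a combiner is not strictly forbidden from emitting more records.
     */
    static Verdict check(long combineInputRecords, long combineOutputRecords) {
        return combineOutputRecords > combineInputRecords
                ? Verdict.LIKELY_SWAPPED
                : Verdict.PLAUSIBLE;
    }

    public static void main(String[] args) {
        // Values copied from the Tez job report in TEZ-1344.
        long tezInput = 1_000_529_333L;
        long tezOutput = 35_977_263_353L;

        // Values copied from the equivalent MR job report.
        long mrInput = 35_977_049_632L;
        long mrOutput = 1_000_316_600L;

        System.out.println("Tez: " + check(tezInput, tezOutput));
        System.out.println("MR:  " + check(mrInput, mrOutput));
    }
}
```

Applied to the numbers above, the Tez pair is flagged LIKELY_SWAPPED and the MR pair PLAUSIBLE, which is consistent with the suspicion that the Tez job reports the two counters the wrong way round.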



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
