pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4757) Job stats on successfully read/output records wrong with multiple inputs/outputs
Date Tue, 05 Jan 2016 20:27:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083717#comment-15083717 ]

Daniel Dai commented on PIG-4757:
---------------------------------

+1

> Job stats on successfully read/output records wrong with multiple inputs/outputs
> --------------------------------------------------------------------------------
>
>                 Key: PIG-4757
>                 URL: https://issues.apache.org/jira/browse/PIG-4757
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4757-1.patch, PIG-4757-2.patch
>
>
> TezVertexStats uses TaskCounter.INPUT_RECORDS_PROCESSED to display the records read from
> MRInput. But in the case of a replicate join or a scalar, that counter also includes the
> replicate join input. We need a Pig-specific counter (MULTI_INPUTS_RECORD_COUNTER) in
> POSimpleTezLoad.
> TezVertexStats uses TaskCounter.OUTPUT_RECORDS to display the records stored to MROutput
> if there is a single store. If there are multiple stores, it uses MULTI_STORE_RECORD_COUNTER
> and there are no issues. If there is a single store together with another output, the value
> from OUTPUT_RECORDS is wrong. We need to use MULTI_STORE_RECORD_COUNTER in all cases, even
> when there is only a single store.
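
The input-side problem described above can be illustrated with a small toy model. This is only a sketch and is not Pig or Tez code: the class VertexCounterSketch, its Input type, and both methods are hypothetical names made up for illustration, and the record counts are arbitrary. It assumes one vertex that reads a load input and a replicated join input, and compares a vertex-wide count (standing in for INPUT_RECORDS_PROCESSED) with a count that only the load operator increments (standing in for a Pig-specific per-input counter).

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the counting problem; hypothetical, not Pig or Tez code. */
final class VertexCounterSketch {

    /** One input feeding a vertex: its name, whether it is the load, and records read. */
    static final class Input {
        final String name;
        final boolean isLoad;
        final long records;

        Input(String name, boolean isLoad, long records) {
            this.name = name;
            this.isLoad = isLoad;
            this.records = records;
        }
    }

    /** Models a vertex-wide counter: every input of the vertex adds to it. */
    static long vertexWideCount(List<Input> inputs) {
        long total = 0;
        for (Input in : inputs) {
            total += in.records;
        }
        return total;
    }

    /** Models a load-specific counter: only the load operator increments it. */
    static long loadOnlyCount(List<Input> inputs) {
        long total = 0;
        for (Input in : inputs) {
            if (in.isLoad) {
                total += in.records;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Input> inputs = Arrays.asList(
                new Input("load", true, 1000L),        // records read by the load
                new Input("replicated", false, 50L));  // replicate join side input

        // The vertex-wide count also includes the replicated input.
        System.out.println(vertexWideCount(inputs)); // prints 1050
        // The load-specific count reports only what the load read.
        System.out.println(loadOnlyCount(inputs));   // prints 1000
    }
}
```

The same reasoning applies on the output side: a vertex-wide output count includes every output of the vertex, so a counter incremented only by the store operator is needed to report the records actually stored.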



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
