pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
Date Tue, 06 Apr 2010 21:25:33 GMT

    [ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854188#action_12854188 ]

Pradeep Kamath commented on PIG-1299:

The changes are mostly good. A few comments:
1) Instead of creating a wrapper RecordWriter in MapReducePOStoreImpl, the counter should be
incremented in POStore.getNext(). POStore holds a reference to MapReducePOStoreImpl, so the
counter is available there. This way we keep our contract with StoreFunc: the RecordWriter
instance passed to prepareToWrite() is the same one returned by
StoreFunc.getOutputFormat().getRecordWriter(). With this change, the change to BinStorage
should be reverted.
2) Is the check for store.isMultiStore() required in MapReducePOStoreImpl? I think
MapReducePOStoreImpl is used only with multi-store POStore(s), so the check seems redundant.
3) If the javac warnings can be addressed, please address them. Unit tests along the lines
of those in TestCounters would also be good.
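
To illustrate comment 1, here is a rough sketch of counting in the store operator rather than in a wrapper writer. It is not the patch and does not use Pig's real classes: Writer, StoreImpl and Store are hypothetical stand-ins for RecordWriter, MapReducePOStoreImpl and POStore, and their method signatures are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these are Pig's actual classes or signatures.
final class StoreCounterSketch {

    /** Stand-in for the RecordWriter obtained from a StoreFunc's OutputFormat. */
    interface Writer {
        void write(String tuple);
    }

    /** Stand-in for MapReducePOStoreImpl: holds the writer and one counter per output. */
    static final class StoreImpl {
        private final Writer writer;
        private long outputRecords;

        StoreImpl(Writer writer) {
            this.writer = writer;
        }

        /** Returns the exact instance that was supplied, never a wrapper around it. */
        Writer getWriter() {
            return writer;
        }

        void incrementCounter() {
            outputRecords++;
        }

        long getOutputRecords() {
            return outputRecords;
        }
    }

    /** Stand-in for POStore: writes a tuple, then bumps the counter it can reach via impl. */
    static final class Store {
        private final StoreImpl impl;

        Store(StoreImpl impl) {
            this.impl = impl;
        }

        void getNext(String tuple) {
            impl.getWriter().write(tuple); // the store function still sees its own writer
            impl.incrementCounter();       // counting happens here, outside the writer
        }
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Writer original = sink::add;
        StoreImpl impl = new StoreImpl(original);
        Store store = new Store(impl);

        store.getNext("row-1");
        store.getNext("row-2");

        System.out.println("output records: " + impl.getOutputRecords());
        System.out.println("writer unchanged: " + (impl.getWriter() == original));
    }
}
```

Because the counter lives next to the writer instead of around it, the store function receives the same writer object it created, which is the contract the comment wants to preserve.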

> Implement Pig counter  to track number of output rows for each output files 
> ----------------------------------------------------------------------------
>                 Key: PIG-1299
>                 URL: https://issues.apache.org/jira/browse/PIG-1299
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>         Attachments: PIG-1299.patch
> When running a multi-store query, the Hadoop job tracker often displays only 0 for the "Reduce
> output records" or "Map output records" counters. This is incorrect and misleading. Pig should
> implement an "output records" counter for each output file in the query.

