pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5355) Negative progress report by HBaseTableRecordReader
Date Mon, 10 Sep 2018 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609725#comment-16609725 ]

Koji Noguchi commented on PIG-5355:
-----------------------------------

Thanks for the fix.  Looks good to me.  +1 

Outside of this jira, I still don't like the logic of {{HBaseTableInputFormat.getProgress()}}
{code:java}
if (bigLastRow.compareTo(bigEnd_) > 0) {
  return progressSoFar_;
}
{code}
which means that when a record's key is longer than {{max(startRow_.length,endRow_.length)}},
progress stays the same.
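
A minimal sketch of why longer keys freeze progress. It is self-contained rather than a copy of the Pig source: {{padTail}} is a local stand-in for HBase's {{Bytes.padTail}}, the names {{LongKeyProgress}}, {{toBig}} and {{progressFrozen}} are hypothetical, and it assumes {{bigEnd_}} is built from the end row padded to {{max(startRow_.length,endRow_.length)}} with the same {{{1, 0}}} header.

```java
import java.math.BigInteger;
import java.util.Arrays;

class LongKeyProgress {
    // Local stand-in for HBase's Bytes.padTail: appends `extra` zero bytes.
    static byte[] padTail(byte[] a, int extra) {
        return Arrays.copyOf(a, a.length + extra);
    }

    // Prepends the {1, 0} header so the resulting BigInteger is always positive.
    static BigInteger toBig(byte[] row) {
        byte[] withHeader = new byte[row.length + 2];
        withHeader[0] = 1;
        withHeader[1] = 0;
        System.arraycopy(row, 0, withHeader, 2, row.length);
        return new BigInteger(withHeader);
    }

    // Returns true when the "return progressSoFar_" branch would be taken.
    static boolean progressFrozen(byte[] start, byte[] end, byte[] curr) {
        int width = Math.max(start.length, end.length);
        // Assumption: bigEnd_ comes from the end row padded to `width`.
        BigInteger bigEnd = toBig(padTail(end, width - end.length));
        byte[] lastPadded = curr.length < width
                ? padTail(curr, width - curr.length)
                : curr; // a longer key is never truncated
        return toBig(lastPadded).compareTo(bigEnd) > 0;
    }
}
```

Under these assumptions, a key with more bytes than {{width}} produces a {{BigInteger}} with more bytes than {{bigEnd}}, so it compares greater even when the key sorts inside the scan range.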

> Negative progress report by HBaseTableRecordReader
> --------------------------------------------------
>
>                 Key: PIG-5355
>                 URL: https://issues.apache.org/jira/browse/PIG-5355
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>         Attachments: PIG-5355-1.patch, PIG-5355-2.patch, PIG-5355-3.patch
>
>
> The logic for padding the current row does not consider the already padded row during
> the second comparison. The row ends up with a different length than expected. This results
> in a negative value for {{processed}}.
> {code}
>             byte[] lastPadded = currRow_;
>             if (currRow_.length < endRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, endRow_.length - currRow_.length);
>             }
>             if (currRow_.length < startRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, startRow_.length - currRow_.length);
>             }
>             byte [] prependHeader = {1, 0};
>             BigInteger bigLastRow = new BigInteger(Bytes.add(prependHeader, lastPadded));
>             if (bigLastRow.compareTo(bigEnd_) > 0) {
>                 return progressSoFar_;
>             }
>             BigDecimal processed = new BigDecimal(bigLastRow.subtract(bigStart_));
> {code}
> The fix is to use {{lastPadded}} instead of {{currRow_}} in the second {{if}} comparison
> and in the {{Bytes.padTail}} call inside that {{if}}.
> PIG-4700 added progress reporting, which enabled ProgressHelper in Tez. It calls {{getProgress}}
> [here|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/common/ProgressHelper.java#L50]
> on {{PigRecordReader}} (https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L159).
> Since Pig is reporting negative progress, the job is getting killed by the AM.
>  
>  
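
For reference, the padding fix described in the quoted issue can be sketched roughly as follows. This is not the committed patch: {{PaddedRowSketch}}, {{padBuggy}}, {{padFixed}}, {{toBig}} and {{processed}} are hypothetical names, {{padTail}} is a local stand-in for HBase's {{Bytes.padTail}}, and {{bigStart_}} is assumed to be built from the start row padded to the longer of the two boundary lengths.

```java
import java.math.BigInteger;
import java.util.Arrays;

class PaddedRowSketch {
    // Local stand-in for HBase's Bytes.padTail: appends `extra` zero bytes.
    static byte[] padTail(byte[] a, int extra) {
        return Arrays.copyOf(a, a.length + extra);
    }

    // Original logic: the second branch pads currRow again and
    // throws away the padding done by the first branch.
    static byte[] padBuggy(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (currRow.length < startRow.length) {
            lastPadded = padTail(currRow, startRow.length - currRow.length);
        }
        return lastPadded;
    }

    // Described fix: the second branch compares and pads lastPadded.
    static byte[] padFixed(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (lastPadded.length < startRow.length) {
            lastPadded = padTail(lastPadded, startRow.length - lastPadded.length);
        }
        return lastPadded;
    }

    // Prepends the {1, 0} header so the resulting BigInteger is always positive.
    static BigInteger toBig(byte[] row) {
        byte[] withHeader = new byte[row.length + 2];
        withHeader[0] = 1;
        System.arraycopy(row, 0, withHeader, 2, row.length);
        return new BigInteger(withHeader);
    }

    // Mirrors bigLastRow.subtract(bigStart_) under the assumption stated above.
    static BigInteger processed(byte[] lastPadded, byte[] startRow, byte[] endRow) {
        int width = Math.max(startRow.length, endRow.length);
        BigInteger bigStart = toBig(padTail(startRow, width - startRow.length));
        return toBig(lastPadded).subtract(bigStart);
    }
}
```

With a one-byte current row, a two-byte start row and a four-byte end row, {{padBuggy}} returns two bytes while {{padFixed}} returns four, which is the length mismatch that makes {{processed}} go negative.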



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
