pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5355) Negative progress report by HBaseTableRecordReader
Date Wed, 19 Sep 2018 14:29:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620669#comment-16620669 ]

Koji Noguchi commented on PIG-5355:
-----------------------------------

{quote}
Outside of this jira, I still don't like the logic of {{HBaseTableInputFormat.getProgress()}}
{code:java}
if (bigLastRow.compareTo(bigEnd_) > 0) {
  return progressSoFar_;
}
{code}
which means that when records have a longer key than {{max(startRow_.length,endRow_.length)}},
progress stays the same.
{quote}
[~satishsaley], [~rohini], how about we truncate {{currRow_}} (by calling {{Bytes.head}}) when
{{maxRowLength < currRow_.length}}?

Or, I'm fine with committing it as is. The most important part of the patch is avoiding the negative
progress report.
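
To illustrate the truncation idea, here is a minimal sketch in plain Java. It does not use the HBase classes: {{Arrays.copyOf}} stands in for {{Bytes.head}} (and also zero-pads short rows, like {{Bytes.padTail}}), the class and method names are hypothetical, and {{maxRowLength}} is assumed to equal {{max(startRow_.length, endRow_.length)}}. It is not the code in the patch.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch, not the code in HBaseTableInputFormat.
class TruncateRowSketch {

    // Truncates the row to maxRowLength when it is longer (the role of Bytes.head)
    // and zero-pads it when it is shorter (the role of Bytes.padTail).
    static byte[] fitToLength(byte[] row, int maxRowLength) {
        return Arrays.copyOf(row, maxRowLength);
    }

    // Prepends the {1, 0} header used in the issue so the value is always positive.
    static BigInteger withHeader(byte[] row) {
        byte[] out = new byte[row.length + 2];
        out[0] = 1; // out[1] stays 0
        System.arraycopy(row, 0, out, 2, row.length);
        return new BigInteger(out);
    }

    // Mirrors the bigLastRow.compareTo(bigEnd_) > 0 check from getProgress().
    static boolean isPastEnd(byte[] row, byte[] endRow) {
        return withHeader(row).compareTo(withHeader(endRow)) > 0;
    }
}
```

With an end row of {{0x20}} and a current row of {{0x15 0x01 0x02}}, the untruncated row has more bytes and so always compares as greater than the end row, which is why progress stays the same. After {{fitToLength(row, 1)}} the same row compares as smaller than the end row, so progress can advance.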

> Negative progress report by HBaseTableRecordReader
> --------------------------------------------------
>
>                 Key: PIG-5355
>                 URL: https://issues.apache.org/jira/browse/PIG-5355
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>         Attachments: PIG-5355-1.patch, PIG-5355-2.patch, PIG-5355-3.patch
>
>
> The logic for padding the current row does not consider the updated padded row during
the comparison. It ends up with a different length than expected. This results in a negative value
for {{processed}}.
> {code}
>             byte[] lastPadded = currRow_;
>             if (currRow_.length < endRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, endRow_.length - currRow_.length);
>             }
>             if (currRow_.length < startRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, startRow_.length - currRow_.length);
>             }
>             byte [] prependHeader = {1, 0};
>             BigInteger bigLastRow = new BigInteger(Bytes.add(prependHeader, lastPadded));
>             if (bigLastRow.compareTo(bigEnd_) > 0) {
>                 return progressSoFar_;
>             }
>             BigDecimal processed = new BigDecimal(bigLastRow.subtract(bigStart_));
> {code}
> The fix is to use {{lastPadded}} in the second {{if}} comparison and in the {{Bytes.padTail}}
call inside that {{if}}.
> PIG-4700 added progress reporting, which enabled ProgressHelper in Tez. It calls {{getProgress}}
[here|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/common/ProgressHelper.java#L50]
on {{PigRecordReader}} (https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L159).
Since Pig is reporting negative progress, the job gets killed by the AM.
>  
>  
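
The padding fix described in the issue above can be sketched in plain Java as follows. This is an illustration, not the committed patch: {{padTail}} here is a local stand-in for HBase's {{Bytes.padTail}}, the class and method names are hypothetical, and the start row is assumed to be padded to the common length before {{bigStart_}} is built.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch, not the code in HBaseTableInputFormat.
class PaddedRowSketch {

    // Local stand-in for Bytes.padTail(a, n): appends n zero bytes.
    static byte[] padTail(byte[] a, int n) {
        return Arrays.copyOf(a, a.length + n);
    }

    // Logic from the issue: the second if tests and pads currRow again,
    // so it discards the padding done by the first if.
    static byte[] padBuggy(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (currRow.length < startRow.length) {
            lastPadded = padTail(currRow, startRow.length - currRow.length);
        }
        return lastPadded;
    }

    // Fixed logic: the second if works on lastPadded, so earlier padding is kept.
    static byte[] padFixed(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (lastPadded.length < endRow.length) {
            lastPadded = padTail(lastPadded, endRow.length - lastPadded.length);
        }
        if (lastPadded.length < startRow.length) {
            lastPadded = padTail(lastPadded, startRow.length - lastPadded.length);
        }
        return lastPadded;
    }

    // Prepends the {1, 0} header, as in the issue, before building the BigInteger.
    static BigInteger withHeader(byte[] row) {
        byte[] out = new byte[row.length + 2];
        out[0] = 1; // out[1] stays 0
        System.arraycopy(row, 0, out, 2, row.length);
        return new BigInteger(out);
    }
}
```

For example, with a 2-byte start row {{0x10 0x00}}, a 4-byte end row {{0x20 0x00 0x00 0x00}}, and a 1-byte current row {{0x15}}, the buggy version returns a 2-byte row. Its {{BigInteger}} is shorter than the 4-byte padded start row's, so the subtraction comes out negative. The fixed version returns a 4-byte row and the difference is positive.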



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
