hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-773) LineRecordReader can report non-zero progress while it is processing a compressed stream
Date Tue, 01 Sep 2009 23:27:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750179#action_12750179 ]

Chris Douglas commented on MAPREDUCE-773:
-----------------------------------------

This change does not preserve the existing behavior:
{noformat}
-                            Math.max((int)Math.min(Integer.MAX_VALUE, end-pos),
-                                     maxLineLength));
+                            Math.max(maxBytesToConsume(), maxLineLength));
{noformat}

{noformat}
+  private int maxBytesToConsume() {
+    return (isCompressedInput()) ? Integer.MAX_VALUE
+                           : (int) Math.min(Integer.MAX_VALUE, (end - start));
+  }
{noformat}

Instead of {{end - pos}} (the bytes remaining in the split), this uses {{end - start}} (the length of the whole split), capped at maxint. This is a regression of the behavior introduced in HADOOP-3144.
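
To make the difference concrete, here is a minimal sketch of the bound passed to readLine() as its max-bytes-to-consume argument, before and after the patch, plus one way to keep the old behavior. Only start, end, pos, maxLineLength and the compressed-input check come from the diff above; the class MaxBytesSketch, its methods, and the "preserving" variant are hypothetical and for illustration only. This is not the committed fix.

```java
// Hypothetical, simplified model of the max-bytes-to-consume bound.
// Field names mirror the patch; the class and method names are made up.
class MaxBytesSketch {
    final long start;          // offset of the first byte of the split
    final long end;            // offset just past the last byte of the split
    final boolean compressed;  // stands in for isCompressedInput()
    final int maxLineLength;

    MaxBytesSketch(long start, long end, boolean compressed, int maxLineLength) {
        this.start = start;
        this.end = end;
        this.compressed = compressed;
        this.maxLineLength = maxLineLength;
    }

    // Before the patch: bounded by the bytes *remaining* in the split (end - pos).
    int beforePatch(long pos) {
        return Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength);
    }

    // With the patch: bounded by the *whole* split length (end - start),
    // so the bound no longer shrinks as pos advances.
    int withPatch() {
        int maxBytes = compressed ? Integer.MAX_VALUE
                                  : (int) Math.min(Integer.MAX_VALUE, end - start);
        return Math.max(maxBytes, maxLineLength);
    }

    // Hypothetical variant: keep the compressed-input special case but
    // pass the current position in, restoring the end - pos bound.
    int preserving(long pos) {
        int maxBytes = compressed ? Integer.MAX_VALUE
                                  : (int) Math.min(Integer.MAX_VALUE, end - pos);
        return Math.max(maxBytes, maxLineLength);
    }

    public static void main(String[] args) {
        // Uncompressed split covering bytes [1000, 2000), reader at offset 1900.
        MaxBytesSketch r = new MaxBytesSketch(1000L, 2000L, false, 64);
        System.out.println(r.beforePatch(1900L)); // 100
        System.out.println(r.withPatch());        // 1000
        System.out.println(r.preserving(1900L));  // 100
    }
}
```

With 100 bytes left in the split, the old code lets readLine() consume at most 100 bytes, while the patched code allows up to 1000, so a read near the end of the split can run past it.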

> LineRecordReader can report non-zero progress while it is processing a compressed stream
> ----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-773
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-773
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.21.0
>
>         Attachments: 773.2.patch, 773.3.patch, 773.patch, 773.patch
>
>
> Currently, the LineRecordReader returns 0.0 from getProgress() for most inputs (since
> the "end" of the filesplit is set to Long.MAX_VALUE for compressed inputs). This can be improved
> to return a non-zero progress even for compressed streams (though it may not be very reflective
> of the actual progress).
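
The effect of setting end to Long.MAX_VALUE can be seen with a small calculation. The sketch below assumes a progress formula of the form (pos - start) / (end - start), capped at 1.0, modeled on the usual getProgress() in LineRecordReader. The class ProgressSketch is hypothetical, and the second call, which measures compressed bytes read against the compressed file length, only illustrates one possible non-zero estimate. It is not the approach taken by the attached patches.

```java
// Illustrative sketch, not Hadoop code: a getProgress()-style fraction
// evaluated for two different choices of "end".
class ProgressSketch {
    // Fraction of [start, end) consumed at offset current, capped at 1.0.
    static float progress(long start, long current, long end) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (current - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        // Compressed split whose end is Long.MAX_VALUE: after a million
        // decompressed bytes the fraction is still effectively zero.
        System.out.println(progress(0L, 1_000_000L, Long.MAX_VALUE));

        // Hypothetical alternative: compressed bytes read so far measured
        // against the compressed file length (250 of 1000 bytes).
        System.out.println(progress(0L, 250L, 1000L)); // 0.25
    }
}
```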

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

