tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-1993) Implement a pluggable InputSizeEstimator for grouping fairly
Date Mon, 09 Feb 2015 18:11:34 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312552#comment-14312552 ]

Bikas Saha commented on TEZ-1993:
---------------------------------

Right. I now recall observing similar issues when I was trying to add a wait-for-locality
attribute to the split.
The patch looks good.
Question: What would be the issue if the FileSplit (let's say ORCSplit) actually returned the
logical length of the split, instead of the file length, from its getLength() method?
Theoretically that would not be wrong.
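
To make the question concrete, here is a minimal, self-contained sketch of how the choice of
size measurement can change grouping. It is not the Tez grouper or the attached patch: the
Split class, the SizeEstimator interface, the greedy group() method, and all the numbers are
hypothetical stand-ins chosen for illustration. The "logical" size is assumed to be whatever
the format would report as the decoded size of the data actually read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSizeSketch {

    /** Hypothetical stand-in for a file split; not Hadoop's FileSplit. */
    static final class Split {
        final String name;
        final long bytesOnDisk;   // size at rest, what a file-length based getLength() reports
        final long logicalBytes;  // assumed decoded size of the data the task will process

        Split(String name, long bytesOnDisk, long logicalBytes) {
            this.name = name;
            this.bytesOnDisk = bytesOnDisk;
            this.logicalBytes = logicalBytes;
        }
    }

    /** Hypothetical pluggable estimator: maps a split to the size used for grouping. */
    interface SizeEstimator {
        long estimate(Split split);
    }

    static final SizeEstimator ON_DISK = s -> s.bytesOnDisk;
    static final SizeEstimator LOGICAL = s -> s.logicalBytes;

    /** Greedy grouping: close the current group when adding a split would exceed the target. */
    static List<List<Split>> group(List<Split> splits, long targetBytes, SizeEstimator estimator) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentSize = 0;
        for (Split s : splits) {
            long size = estimator.estimate(s);
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Split "a" is highly compressible: 64 units on disk, 640 once decoded.
        List<Split> splits = Arrays.asList(
                new Split("a", 64, 640),
                new Split("b", 64, 64),
                new Split("c", 64, 64),
                new Split("d", 64, 64));

        // By on-disk size all four splits fit one 256-unit group, so one task gets 832 logical units.
        System.out.println("groups by on-disk size: " + group(splits, 256, ON_DISK).size());
        // By logical size the skewed split is isolated in its own group.
        System.out.println("groups by logical size: " + group(splits, 256, LOGICAL).size());
    }
}
```

In this sketch, having getLength() return the logical length would correspond to always using
the LOGICAL estimator, while a pluggable estimator leaves getLength() untouched and lets the
grouping code pick the measurement.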

> Implement a pluggable InputSizeEstimator for grouping fairly
> ------------------------------------------------------------
>
>                 Key: TEZ-1993
>                 URL: https://issues.apache.org/jira/browse/TEZ-1993
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: TEZ-1993.1.patch, TEZ-1993.2.patch
>
>
> Split grouping is currently done using a file size measurement which is the exact size
> of the split as it stays at rest on HDFS.
> This is not valid for columnar formats and especially suffers from highly compressible
> data skews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)