pig-dev mailing list archives

From "Nandor Kollar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories
Date Fri, 02 Dec 2016 13:50:58 GMT

     [ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandor Kollar updated PIG-3891:
-------------------------------
    Attachment: PIG-3891-5.patch

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Nandor Kollar
>         Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch, PIG-3891-4.patch, PIG-3891-5.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output directory. If files
> are stored under sub-directories (e.g. by MultiStorage), it does not report the number of
> bytes written correctly.
> 0.11 shows the correct total number of bytes written, so this is a regression. A quick look
> at the code shows that JobStats.addOneOutputStats() in 0.11 also does not iterate recursively,
> and its code is the same as FileBasedOutputSizeReader's. Need to investigate where the correct
> value comes from in 0.11 and fix it in 0.12.1/0.13.
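To make the difference concrete, here is a minimal sketch contrasting a top-level-only size sum with a recursive one. It is an illustration only, not Pig's code or the attached patch: it assumes a local directory and uses `java.nio.file` in place of Hadoop's `FileSystem` API, and the class name `OutputSizeSketch`, its methods, and the sub-directory name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration: local java.nio.file stands in for Hadoop's FileSystem.
public class OutputSizeSketch {

    // Mirrors the behaviour described in the issue: only regular files directly
    // under the output directory are counted; sub-directories are skipped.
    static long topLevelSize(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Recursive variant: walks the whole tree, so files written into
    // sub-directories (as MultiStorage does) are included in the total.
    static long recursiveSize(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.walk(outputDir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // One 10-byte file at the top level, one 25-byte file in a
        // hypothetical sub-directory (temp files are not cleaned up here).
        Path out = Files.createTempDirectory("pig-output");
        Files.write(out.resolve("part-m-00000"), new byte[10]);
        Path sub = Files.createDirectory(out.resolve("subdir-a"));
        Files.write(sub.resolve("part-m-00000"), new byte[25]);

        System.out.println("top-level only: " + topLevelSize(out)); // 10
        System.out.println("recursive:      " + recursiveSize(out)); // 35
    }
}
```

On HDFS, the analogous recursive listing in Hadoop 2.x would be `FileSystem.listFiles(Path, true)`; whether the eventual fix takes that route is not stated in this message.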



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
