pig-dev mailing list archives

From "Tomas Hudik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4533) support of concatenated bz2/gz files
Date Tue, 19 May 2015 16:28:00 GMT

     [ https://issues.apache.org/jira/browse/PIG-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomas Hudik updated PIG-4533:
-----------------------------
    Component/s: parser

> support of concatenated bz2/gz files
> ------------------------------------
>
>                 Key: PIG-4533
>                 URL: https://issues.apache.org/jira/browse/PIG-4533
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation, parser
>            Reporter: Tomas Hudik
>
> Documentation (since 0.11.1 at least) says :
> http://pig.apache.org/docs/r0.11.1/func.html#handling-compression
> _"Note: PigStorage and TextLoader correctly read compressed files as long as they are
> NOT CONCATENATED FILES generated in this manner: ..."_
> I doubt this is still true, because:
> 1. I ran a test: I concatenated some files and processed them, and all the
> results were identical to those produced from the non-concatenated
> files. If the limitation still applied, the results should differ.
> 2. The JIRA issues https://issues.apache.org/jira/i#browse/HADOOP-4012 and
> https://issues.apache.org/jira/i#browse/HADOOP-6835 say this was fixed in Hadoop 0.22 and
> Hadoop 0.20 respectively, which means Hadoop 1 and 2 both support it. I assume Pig does
> not implement compression on its own but instead relies on the hadoop-core (or hadoop-common)
> libraries.
> If I'm right, the documentation should be fixed by deleting the part about problems with
> concatenated compressed files.
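
The test described in point 1 can be reproduced in miniature outside Pig. The sketch below is only an illustration of the file-format behaviour and uses Python's standard `gzip`, `bz2`, and `zlib` modules, not Pig or Hadoop. The sample rows and variable names are made up for the example. It builds the equivalent of `cat a.gz b.gz > ab.gz`, reads it with a decoder that handles concatenated streams, and contrasts that with a decoder that stops after the first stream, which is the failure mode the documentation note appears to warn about.

```python
import bz2
import gzip
import zlib

# Hypothetical sample data standing in for two input files.
part_a = b"1\talpha\n2\tbeta\n"
part_b = b"3\tgamma\n"

# Equivalent of `cat a.gz b.gz > ab.gz` (and the same for bz2).
concatenated_gz = gzip.compress(part_a) + gzip.compress(part_b)
concatenated_bz2 = bz2.compress(part_a) + bz2.compress(part_b)

# A single archive built from the already-concatenated plain text.
single_gz = gzip.compress(part_a + part_b)

# Decoders that understand concatenated streams return every row.
rows_from_concat_gz = gzip.decompress(concatenated_gz).splitlines()
rows_from_single_gz = gzip.decompress(single_gz).splitlines()
rows_from_concat_bz2 = bz2.decompress(concatenated_bz2).splitlines()

# A decoder that stops at the end of the first gzip member silently
# drops the rest; the remaining bytes are left in `unused_data`.
first_only = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
first_member_only = first_only.decompress(concatenated_gz)
leftover_bytes = first_only.unused_data

print(len(rows_from_concat_gz), len(rows_from_single_gz), len(rows_from_concat_bz2))
print(len(first_member_only.splitlines()), len(leftover_bytes) > 0)
```

If a loader behaved like the last decoder, the concatenated input would yield fewer rows than the non-concatenated one, so identical results are consistent with the underlying codec handling concatenated streams.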



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
