hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-870) Pig is broken loading .gz files
Date Tue, 15 Sep 2009 05:35:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755356#action_12755356 ]

Jeff Zhang commented on PIG-870:
--------------------------------

I loaded .gz files, and it works fine for me.

I also looked into the code (PigSlicer, line 99). It seems Pig won't split a .gz file:

{code}
if (name.endsWith(".gz") || !splittable) {
    // Anything that ends with a ".gz" we must process as a complete
    // file
    slices.add(new PigSlice(name, funcSpec, 0, size));
}
{code}
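To make the effect of that check concrete, here is a small standalone sketch. It is not Pig's actual PigSlicer: the class name GzSliceSketch, the {start, length} pair representation, and the block-sized cutting in the splittable branch are my assumptions for illustration. Only the ".gz || !splittable" condition comes from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the slicing rule quoted above; NOT Pig's real PigSlicer.
// A slice is represented as a {start, length} pair of byte offsets.
class GzSliceSketch {

    // Returns the byte ranges a file would be cut into.
    static List<long[]> slice(String name, long size, boolean splittable, long blockSize) {
        List<long[]> slices = new ArrayList<>();
        if (name.endsWith(".gz") || !splittable) {
            // A gzip stream cannot be decoded starting at an arbitrary offset,
            // so the whole file becomes a single slice (mirrors the quoted check).
            slices.add(new long[] {0, size});
            return slices;
        }
        // Assumption: splittable files are cut at block boundaries;
        // the last slice may be shorter than blockSize.
        for (long start = 0; start < size; start += blockSize) {
            slices.add(new long[] {start, Math.min(blockSize, size - start)});
        }
        return slices;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // assumed 64 MB block size
        long size = 200L * 1024 * 1024; // a 200 MB file
        System.out.println(slice("logs.gz", size, true, block).size());  // 1
        System.out.println(slice("logs.txt", size, true, block).size()); // 4
    }
}
```

So a 200 MB .gz file always ends up as one slice, while the same data uncompressed would be split into four.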

> Pig is broken loading .gz files
> -------------------------------
>
>                 Key: PIG-870
>                 URL: https://issues.apache.org/jira/browse/PIG-870
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Priority: Minor
>
> Looks like the code is trying to split a gz file, which is not supported. In general,
> gz is a poor choice for compression with Pig, since parallelization is limited to the
> number of files.
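A back-of-the-envelope sketch of why parallelism is capped by the file count. The class and method names are hypothetical, the 64 MB block size is an assumed value, and the one-task-per-slice model is a simplification; the only point taken from the issue is that each .gz file is processed whole.

```java
// Hypothetical estimate of how many map tasks a load could use,
// assuming one task per slice. Not part of Pig.
class GzParallelismSketch {

    // Each .gz file contributes exactly one slice regardless of its size;
    // a splittable file contributes ceil(size / blockSize) slices.
    static long mapTasks(long[] fileSizes, boolean gzipped, long blockSize) {
        long tasks = 0;
        for (long size : fileSizes) {
            tasks += gzipped ? 1 : (size + blockSize - 1) / blockSize;
        }
        return tasks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 64 * mb;                      // assumed 64 MB block size
        long[] oneFile = {10240 * mb};             // 10 GB in a single file
        long[] tenFiles = new long[10];            // the same 10 GB as ten 1 GB files
        java.util.Arrays.fill(tenFiles, 1024 * mb);

        System.out.println(mapTasks(oneFile, true, block));   // 1
        System.out.println(mapTasks(tenFiles, true, block));  // 10
        System.out.println(mapTasks(oneFile, false, block));  // 160
    }
}
```

With gzip, the only way to get more parallelism is to produce more (smaller) input files.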

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

