hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-870) Pig is broken loading .gz files
Date Tue, 15 Sep 2009 05:35:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755356#action_12755356 ]

Jeff Zhang commented on PIG-870:

I load gz files, and it works fine.

I also looked into the code (PigSlicer, line 99), and it seems Pig won't split a .gz file:

    if (name.endsWith(".gz") || !splittable) {
        // Anything that ends with a ".gz" we must process as a complete
        // file
        slices.add(new PigSlice(name, funcSpec, 0, size));
    }
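
To make the consequence of that check concrete, here is a minimal, self-contained sketch of the same decision. It is not Pig's actual code: the SliceSketch class, the Slice type, the slice() method, and the blockSize parameter are all hypothetical stand-ins, and the block-sized splitting in the else branch is only an assumption about how a splittable file might be divided. It shows that a ".gz" file always becomes exactly one slice, so the number of slices (and hence the parallelism) for gzipped input cannot exceed the number of files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's PigSlicer: models only the ".gz" decision.
class SliceSketch {

    // Hypothetical stand-in for PigSlice: a byte range within one file.
    static final class Slice {
        final String file;
        final long start;
        final long length;

        Slice(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    // Returns the slices for one file. blockSize is an assumed parameter
    // that controls how a splittable file is divided.
    static List<Slice> slice(String name, long size, long blockSize, boolean splittable) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Slice> slices = new ArrayList<>();
        if (name.endsWith(".gz") || !splittable) {
            // Same condition as the excerpt above: the whole file is one slice.
            slices.add(new Slice(name, 0, size));
        } else {
            // Assumed behavior for splittable input: one slice per block.
            for (long pos = 0; pos < size; pos += blockSize) {
                slices.add(new Slice(name, pos, Math.min(blockSize, size - pos)));
            }
        }
        return slices;
    }

    public static void main(String[] args) {
        // A gzipped file yields one slice regardless of its size.
        System.out.println(slice("logs.gz", 1000, 256, true).size());  // prints 1
        // A splittable file of the same size yields one slice per block.
        System.out.println(slice("logs.txt", 1000, 256, true).size()); // prints 4
    }
}
```

Under these assumptions, ten gzipped files produce at most ten slices no matter how large each file is, which matches the remark in the issue description that parallelization is limited to the number of files.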

> Pig is broken loading .gz files
> -------------------------------
>                 Key: PIG-870
>                 URL: https://issues.apache.org/jira/browse/PIG-870
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Priority: Minor
> Looks like the code is trying to split a gz file, which is not supported. In general,
> gz is a poor choice for compression with Pig since the parallelization is limited to the
> number of files.

