hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
Date Wed, 16 Dec 2009 06:08:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791223#action_12791223 ]

Jeff Zhang commented on PIG-1110:


Regarding the second point: I know that TextInputFormat and TextOutputFormat do not handle
the .bz file extension internally. But controlling the compression is not the responsibility of
PigStorage(); it is still the responsibility of the OutputFormat. To support .bz output, you
would have to add the following code to PigStorage():

    if (location.endsWith(".bz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZipCodec.class);
    }

So in the end it is still Hadoop's OutputFormat that controls the compression, not PigStorage().
And even if you add the above code to PigStorage, it still won't work: you also have to put the
BZipCodec class on Hadoop's classpath and set the CompressionCodec in the configuration.
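To make the layering argument concrete, here is a toy model using only the JDK, not Hadoop. Every name in it (JobSettings, the "output.compress"/"output.codec" keys, setStoreLocation, openOutput, the codec registry) is a hypothetical stand-in chosen for illustration. The store function only records settings derived from the location suffix, the output-format layer is what actually wraps the stream in a codec, and a codec that was never registered makes the write fail even though the store function asked for it. Only gzip is registered because the JDK ships no bzip2 implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionLayering {

    // Hypothetical stand-in for a job configuration: plain key/value settings.
    static final class JobSettings {
        final Map<String, String> props = new HashMap<>();
    }

    // Hypothetical codec abstraction: wraps a raw stream in a compressing one.
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    // Hypothetical registry of available codecs; only gzip is registered here.
    static final Map<String, Codec> REGISTERED = new HashMap<>();
    static {
        REGISTERED.put("gzip", GZIPOutputStream::new);
    }

    // Store-function analogue: it only records settings, it never compresses data itself.
    static void setStoreLocation(String location, JobSettings job) {
        job.props.put("output.location", location);
        if (location.endsWith(".gz")) {
            job.props.put("output.compress", "true");
            job.props.put("output.codec", "gzip");
        } else if (location.endsWith(".bz")) {
            job.props.put("output.compress", "true");
            job.props.put("output.codec", "bzip2");
        }
    }

    // Output-format analogue: this layer decides whether and how to compress.
    static OutputStream openOutput(JobSettings job, OutputStream raw) throws IOException {
        if (!Boolean.parseBoolean(job.props.get("output.compress"))) {
            return raw; // no compression requested
        }
        String name = job.props.get("output.codec");
        Codec codec = REGISTERED.get(name);
        if (codec == null) {
            // Requesting a codec is not enough; it must also be available and registered.
            throw new IOException("codec not registered: " + name);
        }
        return codec.wrap(raw);
    }

    // Writes one record through the whole pipeline and returns the bytes that were stored.
    static byte[] store(String location, String record) {
        JobSettings job = new JobSettings();
        setStoreLocation(location, job);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = openOutput(job, sink)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompresses gzip bytes back into a UTF-8 string.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(gunzip(store("out.gz", "a\t1\n")).trim());
        try {
            store("out.bz", "a\t1\n");
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

In this model, adding the suffix check to the store function changes nothing by itself: the ".bz" request is rejected by the output layer because no bzip2 codec is registered, which mirrors the classpath and configuration requirement described above.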

In short, I do not think it makes sense to use the output folder name to determine the CompressionCodec.

> Handle compressed file formats -- Gz, BZip with the new proposal
> ----------------------------------------------------------------
>                 Key: PIG-1110
>                 URL: https://issues.apache.org/jira/browse/PIG-1110
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: PIG-1110.patch, PIG_1110_Jeff.patch
