hadoop-pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
Date Mon, 14 Dec 2009 20:51:36 GMT

    [ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790362#action_12790362
] 

Richard Ding commented on PIG-1110:
-----------------------------------

Hi Jeff, I think it's a good idea to ask users to specify their intention in the PigStorage constructor
(instead of relying on file extensions). The issue with this approach, however, is that the arguments
to PigStorage constructors can only be Strings, so Pig determines the meaning of each argument
by its position. We should therefore consider carefully which other arguments may need to be
added to the constructor in the future, and in what positions.
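
To illustrate the positional concern, here is a minimal sketch in plain Java. It is not PigStorage itself: the class name PositionalArgsSketch, the second "compression" argument, and both default values are assumptions made up for this example. It only shows that, with String-only arguments, the position is what gives each value its meaning.

```java
// Hypothetical stand-in for a storage function whose constructor takes
// only Strings. Nothing here is Pig's actual API.
public class PositionalArgsSketch {
    public final String delimiter;   // position 0 (assumed default: tab)
    public final String compression; // position 1 (hypothetical argument)

    public PositionalArgsSketch(String... args) {
        // Each argument is identified purely by its index, so the order
        // becomes part of the contract once users start relying on it.
        this.delimiter = args.length > 0 ? args[0] : "\t";
        this.compression = args.length > 1 ? args[1] : "none";
    }

    public static void main(String[] args) {
        PositionalArgsSketch s = new PositionalArgsSketch(",", "bz2");
        System.out.println("delimiter=" + s.delimiter
                + " compression=" + s.compression);
    }
}
```

Inserting a new argument ahead of an existing one would silently change the meaning of every existing call, which is why the positions need to be decided up front.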

As for forcing users to add .bz2 as the extension of the output files, this is actually necessary,
since Hadoop's LineRecordReader (used internally by PigStorage) finds the relevant compression
codec for a given file based on its filename suffix. So for now users must specify .bz2
as the extension of the output files if they want to store them as BZip files.
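
The suffix-based lookup can be sketched roughly as follows. This is a self-contained illustration of the idea, not Hadoop's code: the class SuffixCodecSketch, the method codecFor, and the codec names in the map are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of choosing a compression codec from a file name.
public class SuffixCodecSketch {
    private static final Map<String, String> CODEC_BY_SUFFIX = new LinkedHashMap<>();
    static {
        CODEC_BY_SUFFIX.put(".bz2", "bzip2");
        CODEC_BY_SUFFIX.put(".gz", "gzip");
    }

    // Returns the codec name for the file, or null when no known suffix
    // matches, in which case the file is treated as uncompressed.
    public static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODEC_BY_SUFFIX.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("output/part-00000.bz2")); // prints "bzip2"
        System.out.println(codecFor("output/part-00000"));     // prints "null"
    }
}
```

Under this scheme a file written without the .bz2 suffix has no codec associated with it, so it would be read back as plain text.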

> Handle compressed file formats -- Gz, BZip with the new proposal
> ----------------------------------------------------------------
>
>                 Key: PIG-1110
>                 URL: https://issues.apache.org/jira/browse/PIG-1110
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: PIG-1110.patch, PIG_1110_Jeff.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

