crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-91) Enable custom output file naming
Date Fri, 07 Jun 2013 08:41:20 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677892#comment-13677892 ]

Gabriel Reid commented on CRUNCH-91:
------------------------------------

[~rem120] No, I haven't looked at that in quite a while. It's not something I need right
now, so it fell off my radar. I agree that it would be a really useful feature, and I'd
love to see it added to Crunch. If I get a chance in the near future, I'll try to take a
crack at it (unless someone else wants to).
                
> Enable custom output file naming
> --------------------------------
>
>                 Key: CRUNCH-91
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-91
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-91.patch
>
>
> The current output file naming behavior in Crunch is to use the classic Hadoop-style
> file naming (e.g. part-m-00001, part-r-00002), with the numerical part of the filename
> being set based on the number of existing files in the output directory to avoid naming
> collisions.
> The intention of this issue is to allow developers to define their own output file names
> for Crunch output files.
> The original underlying motivation for this issue is having a custom partitioner in a
> job which routes records to a specific partition (and therefore reducer) based on the
> content of the record, and then needing to rename the output files so that their names
> include specific information about the partition they contain. Crunch currently discards
> the partition number of each file, making this renaming impossible. The approach proposed
> here (custom file naming within Crunch) goes one step further, giving developers a hook
> to define their own output file naming scheme.
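
The naming behavior and the proposed hook described above can be sketched in plain Java. This is only an illustration under stated assumptions: `OutputFileNamer`, `OutputNamingSketch`, `labelledNamer`, and the region labels are hypothetical names invented for this sketch and are not part of the Crunch or Hadoop APIs, and deriving the sequence number directly from the count of existing files is a simplification of whatever Crunch actually does.

```java
import java.util.Locale;

/** Hypothetical hook (not a Crunch API): maps a task type and partition to a file name. */
interface OutputFileNamer {
    String nameFor(char taskType, int partition);
}

final class OutputNamingSketch {

    /** Classic Hadoop-style name, e.g. part-r-00002 for taskType 'r' and number 2. */
    static String hadoopStyleName(char taskType, int number) {
        return String.format(Locale.ROOT, "part-%c-%05d", taskType, number);
    }

    /**
     * Assumed simplification of the behavior described in the issue: the numeric
     * part is taken from the number of files already in the output directory,
     * so the partition that produced the file is no longer visible in the name.
     */
    static String nameFromExistingCount(char taskType, int existingFileCount) {
        return hadoopStyleName(taskType, existingFileCount);
    }

    /** Hypothetical namer that keeps the partition number and prefixes a label for it. */
    static OutputFileNamer labelledNamer(String[] partitionLabels) {
        return (taskType, partition) ->
                partitionLabels[partition] + "-" + hadoopStyleName(taskType, partition);
    }

    public static void main(String[] args) {
        // With three files already present, the output of partition 0 would be
        // numbered 3 under the assumed scheme; the partition number is lost.
        System.out.println(nameFromExistingCount('r', 3));

        // With a naming hook, the name can carry information about the partition.
        String[] labels = {"emea", "apac", "amer"}; // hypothetical content-based partitions
        OutputFileNamer namer = labelledNamer(labels);
        System.out.println(namer.nameFor('r', 1));
    }
}
```

The point of the hook in this sketch is that the namer receives the partition number, which is exactly the piece of information the issue says is currently discarded.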

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
