incubator-crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-91) Enable custom output file naming
Date Sun, 07 Oct 2012 22:00:05 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-91:
-------------------------------

    Attachment: CRUNCH-91.patch

Patch introduces a FileNamingScheme interface which can be provided to Path-based Targets
to create custom file names. The current behavior is maintained by default. Any thoughts on
this, anyone?
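The idea of a pluggable naming hook with the old behavior as the default can be sketched in a few lines of plain Java. This is a minimal sketch under stated assumptions. The interface name echoes the one in the patch description, but these things are hypothetical and are not the patch's actual API:

- the method signatures;
- the `SequentialNamingScheme` class;
- the use of a plain set of existing file names in place of Hadoop's `Path`/`FileSystem`.

```java
import java.util.Set;

/** Hypothetical naming hook; these method signatures are assumptions, not the patch's API. */
interface FileNamingScheme {
    /** Returns a name for map-only output, given the names already in the output directory. */
    String mapOutputName(Set<String> existingNames);

    /** Returns a name for reducer output written by the given partition. */
    String reduceOutputName(Set<String> existingNames, int partition);
}

/** Default behavior: Hadoop-style part-m-/part-r- names numbered by existing file count. */
class SequentialNamingScheme implements FileNamingScheme {
    @Override
    public String mapOutputName(Set<String> existingNames) {
        return nextFreeName("part-m-", existingNames);
    }

    @Override
    public String reduceOutputName(Set<String> existingNames, int partition) {
        // The partition number is deliberately ignored, mirroring the behavior described in the issue.
        return nextFreeName("part-r-", existingNames);
    }

    private static String nextFreeName(String prefix, Set<String> existingNames) {
        // Start numbering at the count of files already present, skipping any name that is taken.
        int index = existingNames.size();
        String candidate;
        do {
            candidate = String.format("%s%05d", prefix, index++);
        } while (existingNames.contains(candidate));
        return candidate;
    }
}

public class NamingDemo {
    public static void main(String[] args) {
        FileNamingScheme scheme = new SequentialNamingScheme();
        Set<String> existing = Set.of("part-m-00000", "part-m-00001");
        System.out.println(scheme.mapOutputName(existing));        // part-m-00002
        System.out.println(scheme.reduceOutputName(Set.of(), 7));  // part-r-00000
    }
}
```

In this sketch a Target would accept any `FileNamingScheme` and fall back to `SequentialNamingScheme` when none is supplied. That is one way to keep "the current behavior by default" while still allowing custom names.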
                
> Enable custom output file naming
> --------------------------------
>
>                 Key: CRUNCH-91
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-91
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-91.patch
>
>
> The current output file naming behavior in Crunch is to use the classic Hadoop-style
> file naming (e.g. part-m-00001, part-r-00002), with the numerical part of the filename being
> set based on the number of existing files in the output directory to avoid naming collisions.
> The intention of this issue is to allow developers to define their own output file names
> for Crunch output files.
> The original underlying motivation for this issue is a job with a custom partitioner that
> routes records to a specific partition (and therefore reducer) based on the content of each
> record, and which then needs to rename its output files so that their names include specific
> information about the partition they contain. Crunch currently discards the partition number
> of each output file, making this renaming impossible. The approach proposed here (custom file
> naming within Crunch) goes one step further, giving developers a hook to define their own
> output file naming scheme.
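The motivating use case can be illustrated with a small sketch. Suppose a naming hook receives the partition number. Then a job whose custom partitioner routes records by content can build file names that say what each file contains. The `PartitionAwareNaming` class, its method, and the region labels are hypothetical examples and are not part of Crunch or the patch.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical naming strategy that embeds a partition label in each reducer output file name. */
public class PartitionAwareNaming {
    // Maps partition numbers (as assigned by a custom partitioner) to readable labels.
    private final Map<Integer, String> labels;

    public PartitionAwareNaming(Map<Integer, String> labels) {
        this.labels = labels;
    }

    /** Builds e.g. "part-r-00000-europe"; falls back to the bare number when no label is known. */
    public String reduceOutputName(int partition) {
        String base = String.format("part-r-%05d", partition);
        String label = labels.get(partition);
        return label == null ? base : base + "-" + label;
    }

    public static void main(String[] args) {
        PartitionAwareNaming naming = new PartitionAwareNaming(Map.of(0, "europe", 1, "asia"));
        for (int partition : List.of(0, 1, 2)) {
            System.out.println(naming.reduceOutputName(partition));
        }
        // Prints part-r-00000-europe, part-r-00001-asia, part-r-00002 (one per line)
    }
}
```

Under the current behavior, the numeric suffix comes from counting existing files rather than from the partition. A name like this one therefore cannot be reconstructed after the fact, which is why the issue asks for a hook inside Crunch.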

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
