incubator-crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Created] (CRUNCH-91) Enable custom output file naming
Date Sun, 07 Oct 2012 21:56:03 GMT
Gabriel Reid created CRUNCH-91:

             Summary: Enable custom output file naming
                 Key: CRUNCH-91
             Project: Crunch
          Issue Type: Improvement
            Reporter: Gabriel Reid

Crunch currently names output files in the classic Hadoop style (e.g. part-m-00001,
part-r-00002). The numerical part of the filename is set based on the number of existing
files in the output directory, to avoid naming collisions.
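
As a rough illustration of the behavior described above, the following Java sketch derives
the next file name from the number of files already in the output directory. The names
SequentialPartNamer and nextName are hypothetical and this is not Crunch's actual code; in
particular, using the plain file count as the index is an assumption.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Crunch's actual implementation.
final class SequentialPartNamer {
    private SequentialPartNamer() {}

    // taskType is "m" for map output or "r" for reduce output.
    static String nextName(String taskType, List<String> existingFiles) {
        // Assumption: the index is simply the count of files already present,
        // so a new file never reuses the index of an existing one.
        int index = existingFiles.size();
        return String.format(Locale.ROOT, "part-%s-%05d", taskType, index);
    }
}
```

Note that an index derived this way says nothing about which partition produced the file.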

The intention of this issue is to allow developers to define their own names for Crunch
output files.

The original motivation for this issue is a job with a custom partitioner that routes
records to a specific partition (and therefore reducer) based on the content of the record.
The output files of such a job then need to be renamed so that their names include specific
information about the partition they contain. Crunch currently discards the partition number
of each output file, which makes this renaming impossible. The approach proposed here (custom
file naming within Crunch) goes one step further, giving developers a hook to define their
own output file naming scheme.
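
To make the idea of a naming hook concrete, here is a minimal Java sketch. The issue does not
define an API, so the interface OutputFileNamer, its method fileNameFor, and the example class
RegionFileNamer are all hypothetical names chosen for illustration. The sketch assumes the
hook receives the task type and the partition number.

```java
import java.util.Locale;

// Hypothetical hook; the issue does not specify an API, so this shape is an assumption.
interface OutputFileNamer {
    String fileNameFor(String taskType, int partition);
}

// Example for a job whose custom partitioner routes records by region (hypothetical).
final class RegionFileNamer implements OutputFileNamer {
    private final String[] regions;

    RegionFileNamer(String... regions) {
        // Copy the array so later changes by the caller do not affect naming.
        this.regions = regions.clone();
    }

    @Override
    public String fileNameFor(String taskType, int partition) {
        // Use a generic label when the partition has no known region.
        String label = (partition >= 0 && partition < regions.length)
                ? regions[partition]
                : "unknown";
        return String.format(Locale.ROOT, "%s-%s-%05d", label, taskType, partition);
    }
}
```

With a hook of this kind, the partition number is available at naming time, so no renaming
step is needed after the job finishes.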

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
