crunch-dev mailing list archives

From "Jeremy Beard (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-91) Enable custom output file naming
Date Thu, 06 Jun 2013 23:26:20 GMT


Jeremy Beard commented on CRUNCH-91:

Gabriel, have you had a chance to work on the fanOut() method? I think this would be a very
useful addition to Crunch.
> Enable custom output file naming
> --------------------------------
>                 Key: CRUNCH-91
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>         Attachments: CRUNCH-91.patch
> The current output file naming behavior in Crunch is to use the classic Hadoop-style
> file naming (e.g. part-m-00001, part-r-00002), with the numerical part of the filename
> being set based on the number of existing files in the output directory to avoid naming
> collisions.
>
> The intention of this issue is to allow developers to define their own output file names
> for Crunch output files.
>
> The original underlying motivation for this issue is having a custom partitioner in a
> job which routes records to a specific partition (and therefore reducer) based on the
> content of the record, and then needing to perform file renaming operations on the output
> files so that their names include specific information about the partition they contain.
> The partition number of each file currently gets discarded by Crunch, making this renaming
> impossible. The approach proposed here (custom file naming within Crunch) goes one step
> further, giving developers a hook to define their own output file naming scheme.
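
To make the difference between the two schemes in the issue description concrete, here is a minimal sketch in plain Java. It does not use Crunch or Hadoop classes and is not the API from CRUNCH-91.patch: the FileNamingScheme interface, the method names, and the "region-..." file names are all hypothetical, and treating the file index as exactly the count of existing files is an assumption made for illustration.

```java
import java.util.Locale;

public class OutputNamingSketch {

    /** Hypothetical hook: maps a task type ("m" or "r") and a partition number to a file name. */
    public interface FileNamingScheme {
        String fileName(String taskType, int partition);
    }

    /** Classic Hadoop-style name: part-<m|r>-<five-digit index>. */
    public static String hadoopStyleName(String taskType, int index) {
        return String.format(Locale.ROOT, "part-%s-%05d", taskType, index);
    }

    /**
     * Sequential naming as described in the issue: the index is derived from the
     * number of files already in the output directory (assumed here to be the
     * count itself), so the partition number plays no part in the name.
     */
    public static String sequentialName(String taskType, int existingFileCount) {
        return hadoopStyleName(taskType, existingFileCount);
    }

    /** Hypothetical partition-aware scheme: the name carries a label for the partition. */
    public static FileNamingScheme labelled(String[] partitionLabels) {
        return (taskType, partition) ->
                "region-" + partitionLabels[partition] + "-" + taskType + ".out";
    }

    public static void main(String[] args) {
        // Reducer output for partition 1, written into a directory that already holds 7 files:
        System.out.println(sequentialName("r", 7));

        // The same reducer output named through the hypothetical hook:
        FileNamingScheme scheme = labelled(new String[] {"emea", "apac", "amer"});
        System.out.println(scheme.fileName("r", 1));
    }
}
```

With sequential naming the output of partition 1 ends up in a file whose name says nothing about partition 1, which is why renaming afterwards is not possible. A hook that receives the partition number avoids the rename step entirely.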
