incubator-crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-91) Enable custom output file naming
Date Tue, 09 Oct 2012 03:02:03 GMT


Josh Wills commented on CRUNCH-91:

Patch looks good. I don't have any objections to the idea, although it's not a use case I've
hit yet. +1.
> Enable custom output file naming
> --------------------------------
>                 Key: CRUNCH-91
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-91.patch
> The current output file naming behavior in Crunch is to use the classic
> Hadoop-style file naming (i.e. part-m-00001, part-r-00002), with the numerical
> part of the filename being set based on the number of existing files in the
> output directory to avoid naming collisions.
>
> The intention of this issue is to allow developers to define their own output
> file names for Crunch output files.
>
> The original underlying motivation for this issue is having a custom
> partitioner in a job which routes records to a specific partition (and
> therefore reducer) based on content of the record, and then needing to perform
> file renaming operations on the output files to allow their names to include
> specific information about the partition they contain. The partition number of
> files currently gets discarded by Crunch, making this renaming impossible. The
> approach proposed here (custom file naming within Crunch) goes one step
> further, giving developers a hook to actually define their own output file
> naming scheme.
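
To make the idea in the issue description concrete, here is a minimal sketch of what such a naming hook could look like. It is not taken from CRUNCH-91.patch and does not use the Crunch or Hadoop APIs: the names OutputFileNamer, HadoopStyleNamer, LabeledNamer, and NamingDemo, and the partition labels "eu" and "us", are all hypothetical and exist only for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical hook (an assumption, not the actual Crunch API): maps a
// reducer's partition number to the name of the file that reducer writes.
interface OutputFileNamer {
    String reduceOutputName(int partitionId);
}

// Classic Hadoop-style naming, e.g. part-r-00002 for partition 2.
class HadoopStyleNamer implements OutputFileNamer {
    @Override
    public String reduceOutputName(int partitionId) {
        return String.format(Locale.ROOT, "part-r-%05d", partitionId);
    }
}

// Content-aware naming: the labels (assumed here) describe what a custom
// partitioner routes to each partition, so the file name can carry that
// information without a separate renaming step.
class LabeledNamer implements OutputFileNamer {
    private final Map<Integer, String> labels;

    LabeledNamer(Map<Integer, String> labels) {
        this.labels = labels;
    }

    @Override
    public String reduceOutputName(int partitionId) {
        // Fall back to "unknown" for partitions without a label.
        String label = labels.getOrDefault(partitionId, "unknown");
        return String.format(Locale.ROOT, "%s-r-%05d", label, partitionId);
    }
}

class NamingDemo {
    public static void main(String[] args) {
        OutputFileNamer labeled = new LabeledNamer(Map.of(0, "eu", 1, "us"));
        System.out.println(new HadoopStyleNamer().reduceOutputName(2)); // part-r-00002
        System.out.println(labeled.reduceOutputName(1));                // us-r-00001
    }
}
```

The key point of the sketch is that the naming function receives the partition number, which is the piece of information the description says is currently discarded.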

