incubator-crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-59) DoFn doesn't need both configure and initialize methods
Date Sun, 09 Sep 2012 23:16:07 GMT


Josh Wills commented on CRUNCH-59:

The two methods are designed to be called at different points in a job's lifecycle.

configure() is called during the job construction process on the client side, and provides
a mechanism for a DoFn to alter the configuration of an MR job before it is submitted.

initialize() is called on the DoFn when it is executed on a Hadoop cluster, at the start of
a map or reduce task.

I can see how the naming is confusing: configure() sounds as if the DoFn itself is being
configured, when in fact the DoFn is modifying the job's Configuration object. I am
certainly open to a clearer name.
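
To make the distinction between the two hooks concrete, here is a minimal sketch of the
lifecycle described in this comment. It does not use the real Crunch or Hadoop classes:
Conf and SketchDoFn are hypothetical stand-ins for Configuration and DoFn, and the
setConf/getConf accessors and the "sketch.prefix" key are assumptions made only for
illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LifecycleSketch {
    // Hypothetical stand-in for Hadoop's Configuration: a simple string map.
    static class Conf {
        private final Map<String, String> values = new HashMap<>();
        void set(String key, String value) { values.put(key, value); }
        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    // Hypothetical stand-in for a DoFn exposing the two hooks under discussion.
    abstract static class SketchDoFn {
        private Conf conf;
        void setConf(Conf conf) { this.conf = conf; }
        Conf getConf() { return conf; }

        // Client side: lets the function modify the job configuration
        // before the job is submitted.
        void configure(Conf jobConf) { }

        // Task side: runs once at the start of a map or reduce task.
        void initialize() { }

        abstract String process(String input);
    }

    static class PrefixFn extends SketchDoFn {
        private String prefix;

        @Override
        void configure(Conf jobConf) {
            // The function changes the configuration, not the other way around.
            jobConf.set("sketch.prefix", ">> ");
        }

        @Override
        void initialize() {
            // Read back, on the task, what was set on the client.
            prefix = getConf().get("sketch.prefix", "");
        }

        @Override
        String process(String input) { return prefix + input; }
    }

    static String runSketch(String input) {
        // Client side, during job construction.
        Conf jobConf = new Conf();
        PrefixFn fn = new PrefixFn();
        fn.configure(jobConf);

        // Task side. In a real job the function and configuration would be
        // shipped to the cluster; here the same objects are simply reused.
        fn.setConf(jobConf);
        fn.initialize();
        return fn.process(input);
    }

    public static void main(String[] args) {
        System.out.println(runSketch("record"));
    }
}
```

In this sketch, folding both hooks into one initialize(Configuration) call would lose the
client-side step, since nothing would run before submission to set the value that the
task later reads.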
> DoFn doesn't need both configure and initialize methods
> -------------------------------------------------------
>                 Key: CRUNCH-59
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
> DoFn doesn't seem to need both {{public void configure(Configuration conf)}} and
> {{public void initialize()}}. We can do with a single API like {{initialize(Configuration)}}.
