crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-194) Utilities and Documentation for handling values in reduce-Iterable
Date Mon, 08 Apr 2013 22:32:17 GMT
Micah Whitacre created CRUNCH-194:
-------------------------------------

             Summary: Utilities and Documentation for handling values in reduce-Iterable
                 Key: CRUNCH-194
                 URL: https://issues.apache.org/jira/browse/CRUNCH-194
             Project: Crunch
          Issue Type: Bug
          Components: Core
            Reporter: Micah Whitacre
            Assignee: Josh Wills


Clarify the documentation and provide utilities for the appropriate use of values from
the Iterable passed to a DoFn or MapFn.

As an example, we were bitten by a case where we stored the individual items from the
Iterable so that we could process them once all the values had been read.

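{code}
    @Override
    public Foo map(final Pair<Bar, Iterable<Bat>> input) {
        List<Bat> bats = ...;
        // Iterate over the values (the second element of the Pair), not the Pair itself.
        for (Bat b : input.second()) {
            // Problem: stores the reference handed out by the Iterable without copying it.
            bats.add(b);
        }
        return new Foo(bats);
    }
{code}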

When this runs during a reduce, the list bats ends up with a single item instead of
multiple items. For this to work properly, we have to make a copy of each item in
the Iterable. Stating this behavior more clearly in the Javadoc would help consumers
write their MapFn/DoFn correctly the first time.
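
To make the pitfall concrete, here is a minimal, self-contained sketch that does not use Crunch at all. It assumes the reduce-side Iterable hands out one mutable instance and refills it on every step, which is what the need for copying suggests. ReusedValueDemo, the stand-in Bat class with its copy() method, and reusingIterable are hypothetical names made up for this illustration; they are not part of the Crunch API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedValueDemo {

    /** Hypothetical mutable value type standing in for Bat. */
    static final class Bat {
        int id;
        Bat(int id) { this.id = id; }
        Bat copy() { return new Bat(id); }
    }

    /** Assumption: simulates an Iterable that refills one shared instance per step. */
    static Iterable<Bat> reusingIterable(final int[] ids) {
        return new Iterable<Bat>() {
            @Override
            public Iterator<Bat> iterator() {
                return new Iterator<Bat>() {
                    private final Bat shared = new Bat(-1);
                    private int next = 0;

                    @Override
                    public boolean hasNext() { return next < ids.length; }

                    @Override
                    public Bat next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        shared.id = ids[next++]; // overwrite the same object in place
                        return shared;
                    }

                    @Override
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    /** Stores the references as handed out; every entry aliases the shared instance. */
    static List<Integer> collectWithoutCopy(Iterable<Bat> values) {
        List<Bat> bats = new ArrayList<Bat>();
        for (Bat b : values) {
            bats.add(b);
        }
        return idsOf(bats);
    }

    /** Copies each value before storing it, so every entry keeps its own state. */
    static List<Integer> collectWithCopy(Iterable<Bat> values) {
        List<Bat> bats = new ArrayList<Bat>();
        for (Bat b : values) {
            bats.add(b.copy());
        }
        return idsOf(bats);
    }

    private static List<Integer> idsOf(List<Bat> bats) {
        List<Integer> ids = new ArrayList<Integer>();
        for (Bat b : bats) {
            ids.add(b.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};
        System.out.println(collectWithoutCopy(reusingIterable(ids))); // prints [3, 3, 3]
        System.out.println(collectWithCopy(reusingIterable(ids)));    // prints [1, 2, 3]
    }
}
```

Under that assumption, the uncopied list holds several references to one object, so it effectively contains a single distinct value (the last one read), while the copying version keeps every value. In a real MapFn/DoFn the copy would have to be made with whatever mechanism suits the value type; copy() here is only a placeholder for that step.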

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
