avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-513) java mapreduce api should pass iterator of matching objects to reduce
Date Mon, 14 Jun 2010 18:42:15 GMT

    [ https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878688#action_12878688

Tom White commented on AVRO-513:

Generally looks good.

* Does the change to AvroKeySerialization mean that a copy is made per reduce value? This
could be improved to be a copy per reduce group, although it's more work. 
* If a subclass wants to override close() then they have to call super.close() or the reducer
may not complete. This could be the source of very subtle bugs, so it might be better to mark
close() as final, and have another close method for subclasses, closeReducer() perhaps? However,
this gets in the way of users who may want to use the non-grouping reduce() method. I don't
see a good answer here.
* If I call next() without calling hasNext(), I don't get a NoSuchElementException, which
doesn't conform to the Iterator contract. The next() method should check to see if there is
a next and throw NoSuchElementException if not.
* Rather than polling the queue, you could use the blocking take() method and interrupt the
thread from close() to signal that there are no more values.
* Starting a thread from within a subclass constructor is unsafe (because the object may not
be fully constructed, see http://www.ibm.com/developerworks/java/library/j-jtp0618.html#4).
Better to call runner.start() just after constructing the runner.

> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>         Attachments: AVRO-513.patch
> The Java mapreduce API added in AVRO-493 requires reducers implementations to explicitly
detect sequences of matching data.
> Rather the reduce method might better look something like:
>    void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message