avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-513) java mapreduce api should pass iterator of matching objects to reduce
Date Mon, 14 Jun 2010 21:33:14 GMT

     [ https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-513:

    Attachment: AVRO-513.patch

>  This could be improved to be a copy per reduce group, although it's more work.

I suppose once a value's been consumed from the queue, it could be returned to a pool used
by the deserializer.  We could limit the size of the pool to the size of the queue.
Is that what you had in mind?
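
To make the pooling idea concrete, here is a minimal sketch of a bounded pool. The class name ValuePool and the Supplier used to allocate fresh instances are hypothetical stand-ins, not part of the patch or of Avro; in practice the deserializer would decide how to create or reuse an instance.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical bounded pool of reusable value objects (not from the patch). */
class ValuePool<T> {
  private final BlockingQueue<T> free;   // instances available for reuse
  private final Supplier<T> factory;     // assumed way to allocate a fresh instance

  ValuePool(int capacity, Supplier<T> factory) {
    this.free = new ArrayBlockingQueue<>(capacity);
    this.factory = factory;
  }

  /** Returns a pooled instance if one is available, else allocates a new one. */
  T acquire() {
    T pooled = free.poll();              // non-blocking; null when the pool is empty
    return pooled != null ? pooled : factory.get();
  }

  /** Hands a consumed instance back; it is dropped if the pool is already full. */
  void release(T value) {
    free.offer(value);                   // non-blocking; returns false when full
  }

  /** Number of instances currently waiting to be reused. */
  int available() {
    return free.size();
  }
}
```

Sizing the pool to the queue's capacity, as suggested above, would mean passing the same capacity to both constructors.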

> The next() method should check to see if there is a next and throw NoSuchElementException
> if not.
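
For reference, the java.util.Iterator contract being asked for looks roughly like this. ArrayValues is a hypothetical array-backed iterator used only for illustration; it is not the iterator from the patch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical array-backed iterator illustrating the next() contract. */
class ArrayValues<T> implements Iterator<T> {
  private final T[] values;
  private int position = 0;

  ArrayValues(T[] values) {
    this.values = values;
  }

  @Override
  public boolean hasNext() {
    return position < values.length;
  }

  @Override
  public T next() {
    // Check first, so an exhausted iterator fails loudly instead of
    // returning null or running off the end of the array.
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return values[position++];
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }
}
```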


> Rather than polling the queue, you could use the blocking take() method and interrupt
> the thread from close() to signal that there are no more values.

Here's a version that does this.  I worry a bit that something else could interrupt the thread
or intercept the InterruptedException, e.g., in the user's reducer.  Is that a well-founded
worry?  A better approach might be to put in a sentinel value.  Unfortunately this has to
be of type T, and we don't know how to construct a T.
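
A rough sketch of the take()-plus-interrupt approach, for discussion. QueuedValues and QueuedValuesDemo are hypothetical names and this is not the code in the attached patch. The sketch assumes close() is called only after the last put, and that nothing else interrupts the consuming thread or swallows the interrupt, which is exactly the assumption questioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical queue-backed iterator; end of input is signalled by interrupt. */
class QueuedValues<T> implements Iterator<T> {
  private final BlockingQueue<T> queue;
  private final Thread consumer;  // the thread that calls hasNext()/next()
  private T next;                 // value fetched by hasNext(), not yet returned
  private boolean closed;         // true once the consumer has seen the interrupt

  QueuedValues(int capacity, Thread consumer) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    this.consumer = consumer;
  }

  /** Producer side: blocks while the queue is full. */
  void put(T value) throws InterruptedException {
    queue.put(value);
  }

  /** Producer side: call after the last put() to signal that input is done. */
  void close() {
    consumer.interrupt();
  }

  @Override
  public boolean hasNext() {
    if (next != null) {
      return true;
    }
    if (!closed) {
      try {
        next = queue.take();      // blocks until a value arrives or close() interrupts
        return true;
      } catch (InterruptedException e) {
        closed = true;            // no more puts; fall through and drain what is left
      }
    }
    next = queue.poll();          // non-blocking; null once the queue is empty
    return next != null;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T value = next;
    next = null;
    return value;
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }
}

/** Usage sketch: a producer thread feeds five values to the calling thread. */
class QueuedValuesDemo {
  static List<Integer> run() {
    QueuedValues<Integer> values = new QueuedValues<>(2, Thread.currentThread());
    Thread producer = new Thread(() -> {
      try {
        for (int i = 1; i <= 5; i++) {
          values.put(i);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        values.close();
      }
    });
    producer.start();
    List<Integer> seen = new ArrayList<>();
    while (values.hasNext()) {
      seen.add(values.next());
    }
    return seen;
  }
}
```

Note that the interrupt can arrive while values are still queued, so the sketch drains the queue with poll() after catching InterruptedException. If user code between calls cleared the interrupt flag, hasNext() would block forever, which is the weakness of this approach.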

> Starting a thread from within a subclass constructor is unsafe.
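
One common way to avoid starting a thread in a constructor is a static factory that finishes construction first and only then starts the thread. A minimal sketch follows; ReduceRunner and its methods are hypothetical names, not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical runner whose thread is started only after construction completes. */
class ReduceRunner implements Runnable {
  private final String label;
  private final AtomicReference<String> result = new AtomicReference<>();
  private Thread thread;

  // Private constructor: it only initializes fields and never starts a thread,
  // so no other thread can observe a partially constructed object.
  private ReduceRunner(String label) {
    this.label = label;
  }

  /** Factory: construct fully, then start the thread. */
  static ReduceRunner createAndStart(String label) {
    ReduceRunner runner = new ReduceRunner(label);
    runner.thread = new Thread(runner);
    runner.thread.start();
    return runner;
  }

  @Override
  public void run() {
    result.set("ran " + label);
  }

  /** Waits for the thread to finish and returns what it produced. */
  String awaitResult() {
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return result.get();
  }
}
```

With a subclass, the same idea applies: the factory (or an explicit start() method) runs after every constructor in the hierarchy has returned.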


> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>         Attachments: AVRO-513.patch, AVRO-513.patch
> The Java mapreduce API added in AVRO-493 requires reducer implementations to explicitly
> detect sequences of matching data.
> Rather the reduce method might better look something like:
>    void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.
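
As an illustration of the proposed signature, here is a minimal sketch in which a driver walks sorted input and calls reduce once per run of equal values. The Collector and Reducer interfaces and the GroupingDriver class are simplified stand-ins written for this example; they are not Avro's actual classes, and the sketch groups an in-memory list rather than a stream.

```java
import java.util.Iterator;
import java.util.List;

/** Stand-in for the output collector named in the proposal. */
interface Collector<OUT> {
  void collect(OUT datum);
}

/** Stand-in for a reducer with the proposed signature. */
interface Reducer<IN, OUT> {
  void reduce(Iterator<IN> values, Collector<OUT> out);
}

/** Hypothetical driver: one reduce() call per run of equal values. */
class GroupingDriver {
  static <IN, OUT> void run(List<IN> sorted, Reducer<IN, OUT> reducer, Collector<OUT> out) {
    int start = 0;
    while (start < sorted.size()) {
      // Extend the group while the following values equal the first one.
      int end = start + 1;
      while (end < sorted.size() && sorted.get(end).equals(sorted.get(start))) {
        end++;
      }
      reducer.reduce(sorted.subList(start, end).iterator(), out);
      start = end;
    }
  }
}
```

With this shape, a counting reducer only has to iterate over the values it is handed; it no longer needs to compare each datum with the previous one to detect group boundaries.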

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
