hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Mystery, A Tale of Two Reducers
Date Fri, 17 Jun 2011 20:30:53 GMT
All,

I have come across a situation that I don't understand.

*First Reducer:*

Behold the first of two reducers.  A fragment of its output follows.
Simple, no?  It does nothing but write each value straight through.  I've
highlighted two records from the output.  Keep them in mind.  Now let's look
at the second reducer.

protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
throws IOException, InterruptedException {
    for (Text visitor : visitors) {
        ctx.write(key, visitor);
    }
}

2005-09-16=33614    42340108    *more==>*
2005-09-16=33614    42340106    *more==>*
*2005-09-16=33614    42340113    more==>*
2005-09-16=44135    42324490    *more==>*
2005-09-16=44135    42339700    *more==>*
...
*2005-09-16=44135    42324489    more==>*


*Second Reducer:*

This is a variation on the reducer above.  A fragment of its output
follows.  The difference is that I add all visitors to a list, then iterate
through the list to produce my output.  Remember the two highlighted records
from above?  They now show up in the output as duplicates, and the
other records appear to be missing.  Why?  I have never seen an ArrayList
behave like this.  It must have something to do with Hadoop.

I have reasons for using the list.  One is that I must have a full count of
all visitors before I can write my output, but I'll spare you the rest.

To my mind, this second reducer should output the same as the first.

protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
throws IOException, InterruptedException {
    List<Text> list = new ArrayList<Text>();
    for (Text visitor : visitors) {
        list.add(visitor);
    }
    for (Text visitor : list) {
        ctx.write(key, visitor);
    }
}

2005-09-16=33614    42340113    *more==>*
2005-09-16=33614    42340113    *more==>*
2005-09-16=33614    42340113    *more==>*
2005-09-16=44135    42324489    *more==>*
2005-09-16=44135    42324489    *more==>*
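
For reference, here is a plain-Java sketch of one mechanism that would produce exactly this pattern.  It assumes the framework's iterator hands back the same mutable object on every step and only overwrites its contents; that assumption is not confirmed anywhere in this message.  Holder, reusing and collect are hypothetical names made up for the illustration, with Holder standing in for a mutable value type such as Text.  No Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable value type such as Text.
    static final class Holder {
        private String value = "";
        Holder() {}
        Holder(Holder other) { this.value = other.value; } // copy constructor
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Iterable that refills and returns ONE shared Holder on every next() call.
    static Iterable<Holder> reusing(final String... items) {
        final Holder shared = new Holder();
        return new Iterable<Holder>() {
            public Iterator<Holder> iterator() {
                return new Iterator<Holder>() {
                    private int i = 0;
                    public boolean hasNext() { return i < items.length; }
                    public Holder next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        shared.set(items[i++]); // overwrite in place
                        return shared;          // same reference each time
                    }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Buffers the values in a list, then reads them back, like the second reducer.
    static List<String> collect(Iterable<Holder> values, boolean copy) {
        List<Holder> list = new ArrayList<Holder>();
        for (Holder h : values) {
            list.add(copy ? new Holder(h) : h); // copy == false stores the shared reference
        }
        List<String> out = new ArrayList<String>();
        for (Holder h : list) {
            out.add(h.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] ids = {"42340108", "42340106", "42340113"};
        System.out.println(collect(reusing(ids), false)); // last value repeated
        System.out.println(collect(reusing(ids), true));  // all three values
    }
}
```

Without the copy, every list slot points at the same object, so only the last value written into it survives.  If that is what is happening here, the analogous change in the second reducer would be list.add(new Text(visitor)), which stores a copy of each value instead of the reference.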

Thanks in advance

-- 
Geoffry Roberts
