hadoop-mapreduce-user mailing list archives

From Thomas FOURNIER <thomasfournier...@gmail.com>
Subject Second Sort - Part of key changes when iterating through values when using composite key
Date Thu, 16 Mar 2017 11:07:02 GMT
Hello,

I came across this post on Stack Overflow:

http://stackoverflow.com/questions/30079822/part-of-key-changes-when-iterating-through-values-when-using-composite-key-had

My question is about some Hadoop internals.

Suppose we have a list of (YEAR, TEMPERATURE) pairs and want to compute the
equivalent of:
SELECT MAX(TEMPERATURE), MIN(TEMPERATURE) GROUP BY YEAR

A secondary sort on the composite key <Year,Temperature> does the
trick.
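To make the setup concrete, here is a Hadoop-free sketch of the two orderings a secondary sort relies on. The class `SecondarySortKeys` and its fields are hypothetical stand-ins: in a real job the key would implement `WritableComparable`, the comparators would be registered with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`, and a partitioner on the year alone (`Job.setPartitionerClass`) would send every key of one year to the same reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java stand-in for a composite <Year,Temperature> key (no Hadoop types).
public class SecondarySortKeys {

    static final class CompositeKey {
        int year;
        int temperature;

        CompositeKey(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }

        @Override
        public String toString() {
            return "<" + year + "," + temperature + ">";
        }
    }

    // Sort comparator: orders by year, then by temperature within a year.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparingInt((CompositeKey k) -> k.year)
                      .thenComparingInt(k -> k.temperature);

    // Grouping comparator: looks at the year only, so all keys of one year
    // compare as equal and are handed to the same reduce() call.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparingInt((CompositeKey k) -> k.year);

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>(List.of(
                new CompositeKey(1950, 22), new CompositeKey(1949, 111),
                new CompositeKey(1950, 0), new CompositeKey(1949, 78)));
        keys.sort(SORT);
        System.out.println(keys); // [<1949,78>, <1949,111>, <1950,0>, <1950,22>]
        System.out.println(GROUP.compare(keys.get(0), keys.get(1))); // 0: same group
    }
}
```

The sort comparator sees the full key, while the grouping comparator deliberately ignores the temperature; that asymmetry is what lets one reduce call span several distinct keys.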

The key point is that the "Temperature" part of the key changes while
iterating over the values (which are NullWritable), because we group on
the "Year" part of the key.
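Why that changing temperature is useful: because the keys reaching a reducer are sorted by year and then by temperature (ascending, assumed here), each year's records are contiguous, the first one carries the minimum and the last one the maximum. A minimal sketch of that property on plain arrays (the class name `FirstAndLastOfGroup` is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: once <year, temperature> pairs are sorted by year and then by
// temperature, the first record of each year holds the minimum temperature
// and the last record holds the maximum.
public class FirstAndLastOfGroup {

    static List<String> minMax(int[][] keys) {
        int[][] sorted = keys.clone(); // shallow copy so the caller's array order is untouched
        Arrays.sort(sorted, Comparator.<int[]>comparingInt(k -> k[0])
                                      .thenComparingInt(k -> k[1]));
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < sorted.length) {
            int end = start;
            // Extend the group while the year (the grouping part of the key) is unchanged.
            while (end + 1 < sorted.length && sorted[end + 1][0] == sorted[start][0]) {
                end++;
            }
            out.add(sorted[start][0] + " min=" + sorted[start][1] + " max=" + sorted[end][1]);
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] keys = {{1950, 22}, {1949, 111}, {1950, 0}, {1949, 78}, {1950, -11}};
        minMax(keys).forEach(System.out::println);
        // 1949 min=78 max=111
        // 1950 min=-11 max=22
    }
}
```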

@Override
protected void reduce(CompositeKey key, Iterable<NullWritable> values,
        Context context) throws IOException, InterruptedException {
    for (NullWritable value : values) {
        // !!! The "Temperature" part of the key changes here
        // while iterating over the values !!!
    }
}


I don't fully understand the underlying mechanism. Do you have a
pointer to the code that explains why:

- the GroupingComparator class results in a single "reduce" call per YEAR

- the value of the key (seen through the same key reference) changes while
iterating over the values
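As far as I can tell from the Hadoop 2.x source, the relevant code is `Reducer.run()` and `org.apache.hadoop.mapreduce.task.ReduceContextImpl`: `run()` loops `while (context.nextKey())` and calls `reduce()` once per group; every record is deserialized into one reused key instance; and the values iterator's `hasNext()` is true only while the grouping comparator says the next key belongs to the current group, with each `next()` loading the next record (and thus overwriting the key). The sketch below models that loop in plain Java. It is a simplification under those assumptions: `ReduceLoopSketch` and `MutableKey` are hypothetical names, and the real code compares serialized bytes and handles backup stores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified, Hadoop-free model of the reduce-side loop (hypothetical names).
public class ReduceLoopSketch {

    // Mutable key, reused for every record, like a Writable that is deserialized into.
    static final class MutableKey {
        int year;
        int temperature;
    }

    // Grouping comparator on the already-sorted records: year only.
    static final Comparator<int[]> GROUPING = Comparator.comparingInt(r -> r[0]);

    private final List<int[]> sorted;                // records sorted by <year, temperature>
    private int pos = -1;                            // index of the record loaded into key
    private final MutableKey key = new MutableKey(); // the single reused key instance

    ReduceLoopSketch(List<int[]> sorted) {
        this.sorted = sorted;
    }

    // Loads the next record into the SAME key object.
    private void nextKeyValue() {
        pos++;
        key.year = sorted.get(pos)[0];
        key.temperature = sorted.get(pos)[1];
    }

    // True if the following record is in the current group per the grouping comparator.
    private boolean nextKeyIsSame() {
        return pos + 1 < sorted.size()
                && GROUPING.compare(sorted.get(pos), sorted.get(pos + 1)) == 0;
    }

    // Values for one group: every next() after the first also rewrites the key.
    private Iterable<Void> values() {
        return () -> new Iterator<Void>() {
            private boolean first = true; // first record was loaded before reduce() started
            public boolean hasNext() { return first || nextKeyIsSame(); }
            public Void next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (first) first = false; else nextKeyValue();
                return null; // stands in for NullWritable
            }
        };
    }

    // Same shape as Reducer.run(): one reduce() call per group.
    List<String> run() {
        List<String> out = new ArrayList<>();
        while (pos + 1 < sorted.size()) {
            nextKeyValue();                         // load the first record of the group
            reduce(key, values(), out);
            while (nextKeyIsSame()) nextKeyValue(); // skip values reduce() did not consume
        }
        return out;
    }

    // The user's reduce(): min from the key on the first value, max on the last.
    static void reduce(MutableKey k, Iterable<Void> values, List<String> out) {
        Integer min = null;
        int max = 0;
        for (Void ignored : values) {
            if (min == null) min = k.temperature; // k reflects the record just read
            max = k.temperature;
        }
        out.add(k.year + " min=" + min + " max=" + max);
    }

    public static void main(String[] args) {
        List<int[]> sorted = List.of(
                new int[] {1949, 78}, new int[] {1949, 111},
                new int[] {1950, -11}, new int[] {1950, 0}, new int[] {1950, 22});
        new ReduceLoopSketch(sorted).run().forEach(System.out::println);
    }
}
```

In this model `reduce()` never receives a new key object; the temperature it reads changes only because the iterator overwrites the shared instance, which is the behaviour described in the Stack Overflow post.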

Thanks

Thomas
