hadoop-mapreduce-user mailing list archives

From "Johannes.Lichtenberger" <Johannes.Lichtenber...@uni-konstanz.de>
Subject Sorting/Grouping
Date Wed, 20 Oct 2010 22:11:26 GMT
Hi,

Is it possible to reverse the sort order? It seems the reducers are
receiving the map output in descending order. Since I want to sort the
revisions of Wikipedia articles, ascending order would be preferable.
The current output for a small test file (sorted/grouped automatically
by Map/Reduce) is:

<page><id>774932</id><title>foo</title><revision><id>89865</id><timestamp>2003-11-21T02:12:21Z</timestamp><text>dshajkl</text></revision></page>
<page><id>774932</id><title>foo</title><revision><id>233192</id><timestamp>2002-12-20T02:12:21Z</timestamp><text>blaaaaa</text></revision></page>
<page><id>732819</id><title>blubb</title><revision><id>233192</id><timestamp>2001-02-21T02:12:21Z</timestamp><text>tztztz</text></revision></page>
<page><id>372819</id><title>bla</title><revision><id>233192</id><timestamp>2001-01-21T02:12:21Z</timestamp><text>blaaaaa</text></revision><revision><id>e7777</id><timestamp>2001-01-21T02:12:21Z</timestamp><text>blubb</text></revision></page>
<page><id>732819</id><title>blubb</title><revision><id>233192</id><timestamp>2000-01-20T02:12:21Z</timestamp><text>blaaaaa</text></revision></page>

The reverse order would be great.
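
To make the order I am after concrete, here is a minimal plain-Java
sketch. It is not Hadoop code: the class name RevisionOrder and its
methods are made up for illustration, and it only relies on the fact
that timestamps in this fixed UTC format compare chronologically as
plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class RevisionOrder {

    // Oldest revision first: the order I would like to get.
    public static List<String> ascending(List<String> timestamps) {
        List<String> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Newest revision first: the order I currently see.
    public static List<String> descending(List<String> timestamps) {
        List<String> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Timestamps taken from the sample output above.
        List<String> seen = Arrays.asList(
                "2003-11-21T02:12:21Z",
                "2002-12-20T02:12:21Z",
                "2001-02-21T02:12:21Z",
                "2001-01-21T02:12:21Z",
                "2000-01-20T02:12:21Z");
        System.out.println(ascending(seen));
    }
}
```

What I do not know is how to get the framework itself to apply the
ascending variant to the keys before they reach the reducers.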

BTW: I would also like to add a root element, since well-formed XML
requires a single root node. I assume it's not possible to determine
whether the first or the last reduce task is currently running?

regards,
Johannes
