hadoop-common-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Read MapFileOutputFormat output in ascending key order
Date Wed, 13 Feb 2008 18:05:28 GMT

Any suggestions how to do that? Let's say I have several part-NNNN 
MapFile-s created by MapFileOutputFormat using a specified Comparator 
and Partitioner. How can I traverse the data in strictly ascending 
global key order (i.e. across all parts)?

The best that comes to my mind is the following pseudo-code:

get the readers;
read the first key from each reader, and put each (key, reader) pair on a sorted list;
while (the list is not empty) {
	remove the smallest key, and retrieve its value from its reader;
	read the next key from the same reader;
	if that reader had another key, add it to the sorted list;
}
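
The merge described above can be sketched in Java as a k-way merge over a priority queue. This is only a minimal sketch under stated assumptions: each part is stood in for by a plain Iterator over key/value entries that is already sorted by the same Comparator, and the names GlobalOrderMerge, Head and merge are hypothetical, not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper (not a Hadoop class): merges several key-sorted parts.
final class GlobalOrderMerge {

    // One heap element: the current entry of a part and the iterator it came from.
    private static final class Head<K, V> {
        final Map.Entry<K, V> entry;
        final Iterator<Map.Entry<K, V>> source;

        Head(Map.Entry<K, V> entry, Iterator<Map.Entry<K, V>> source) {
            this.entry = entry;
            this.source = source;
        }
    }

    // Returns every entry from every part in ascending key order under cmp.
    // Assumption: each part is already sorted by the same cmp.
    static <K, V> List<Map.Entry<K, V>> merge(
            List<Iterator<Map.Entry<K, V>>> parts, Comparator<? super K> cmp) {
        Comparator<Head<K, V>> byKey =
                (a, b) -> cmp.compare(a.entry.getKey(), b.entry.getKey());
        PriorityQueue<Head<K, V>> heap = new PriorityQueue<>(byKey);

        // Seed the heap with the first entry of each non-empty part.
        for (Iterator<Map.Entry<K, V>> part : parts) {
            if (part.hasNext()) {
                heap.add(new Head<>(part.next(), part));
            }
        }

        List<Map.Entry<K, V>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the globally smallest key, then refill from the same part.
            Head<K, V> smallest = heap.poll();
            out.add(smallest.entry);
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Map.Entry<String, Integer>>> parts = new ArrayList<>();
        parts.add(Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleEntry<String, Integer>("b", 2),
                new SimpleEntry<String, Integer>("d", 4)).iterator());
        parts.add(Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleEntry<String, Integer>("a", 1),
                new SimpleEntry<String, Integer>("c", 3)).iterator());

        for (Map.Entry<String, Integer> e : merge(parts, Comparator.<String>naturalOrder())) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The heap never holds more than one entry per part, so memory use grows with the number of parts rather than with the data size. In real code each iterator would presumably wrap one part's reader, and the result would be consumed as a stream instead of being collected into a list.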

Any other suggestions?

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
