hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Read MapFileOutputFormat output in ascending key order
Date Wed, 13 Feb 2008 18:42:39 GMT
Would one of the SequenceFile.Sorter#merge() methods suffice?

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.Sorter.html#merge(java.util.List,%20org.apache.hadoop.fs.Path)

Doug

Andrzej Bialecki wrote:
> Hi,
> 
> Any suggestions how to do that? Let's say I have several part-NNNN 
> MapFile-s created by MapFileOutputFormat using a specified Comparator 
> and Partitioner. How can I traverse the data in strictly ascending 
> global key order (i.e. across all parts)?
> 
> The best that comes to my mind is the following pseudo-code:
> 
> get the readers;
> get the first keys from all readers, and put them on a sorted list;
> do {
>     remove the smallest key, and retrieve value from its reader;
>     add next key from the same reader:
>         if it's smaller than other keys, continue;
>     if the list is empty, read next values from all readers;
> } while (more keys from any reader);
> 
> Any other suggestions?
> 
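The pseudo-code above is essentially a k-way merge, which could be sketched with a priority queue in place of the sorted list. This is only an illustration under stated assumptions: KWayMerge, Head, and merge are hypothetical names, and each part is modelled as a plain java.util.Iterator over key/value entries that is already sorted by the same Comparator. In Hadoop, each iterator would have to wrap a MapFile.Reader for one part-NNNN directory; that wrapper is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: merges several individually sorted parts into one sorted list. */
final class KWayMerge {

    /** The current entry of one part, together with the iterator it was read from. */
    private static final class Head<K, V> {
        final Map.Entry<K, V> entry;
        final Iterator<Map.Entry<K, V>> source;

        Head(Map.Entry<K, V> entry, Iterator<Map.Entry<K, V>> source) {
            this.entry = entry;
            this.source = source;
        }
    }

    /**
     * Returns all entries in ascending key order. Each iterator in parts is
     * assumed to be sorted by cmp already (as each MapFile part would be).
     */
    static <K, V> List<Map.Entry<K, V>> merge(
            List<Iterator<Map.Entry<K, V>>> parts, Comparator<? super K> cmp) {
        Comparator<Head<K, V>> byKey =
                (a, b) -> cmp.compare(a.entry.getKey(), b.entry.getKey());
        PriorityQueue<Head<K, V>> heap = new PriorityQueue<>(byKey);

        // Seed the heap with the first entry of every non-empty part.
        for (Iterator<Map.Entry<K, V>> part : parts) {
            if (part.hasNext()) {
                heap.add(new Head<>(part.next(), part));
            }
        }

        List<Map.Entry<K, V>> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the globally smallest key, then refill from the same part.
            Head<K, V> smallest = heap.poll();
            merged.add(smallest.entry);
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }
}
```

The heap never holds more than one entry per part, so memory use grows with the number of parts rather than the number of records. A streaming variant would hand each entry to a callback instead of collecting them in a list.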

