hadoop-common-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject Sorting on several columns using KeyFieldSeparator and Partitioner
Date Sun, 18 Jan 2009 01:06:35 GMT
Hello,
I have a file with n columns, some of which are text and some numeric.
Given a sequence of column indices, I would like to sort on those columns
in order, i.e. first on Index1, then within ties on Index2, and so on.
In the example code below I have 3 space-separated columns: numeric,
text, numeric.
The sort should be on column 2 (reverse), then column 1 (reverse, numeric),
and lastly column 3 (numeric).
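
To make the intended ordering concrete, here is a minimal plain-Java sketch that involves no Hadoop at all. It assumes column 2 compares as text and columns 1 and 3 parse as numbers. The class name SortOrderSketch and the sample rows are hypothetical, made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {

    // Orders space-separated rows by column 2 (text, descending),
    // then column 1 (numeric, descending), then column 3 (numeric, ascending).
    static final Comparator<String> INTENDED_ORDER = (a, b) -> {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");
        int c = y[1].compareTo(x[1]);                       // column 2, reversed
        if (c != 0) return c;
        c = Double.compare(Double.parseDouble(y[0]),
                           Double.parseDouble(x[0]));       // column 1, reversed numeric
        if (c != 0) return c;
        return Double.compare(Double.parseDouble(x[2]),
                              Double.parseDouble(y[2]));    // column 3, numeric
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sortRows(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(INTENDED_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: numeric, text, numeric.
        List<String> rows = Arrays.asList(
            "1 apple 10", "2 pear 9", "10 apple 2", "2 pear 3", "10 apple 1");
        for (String row : sortRows(rows)) {
            System.out.println(row);
        }
    }
}
```

With these sample rows the "pear" rows come first, then the "apple" rows with column 1 equal to 10 ahead of the row with column 1 equal to 1.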

My code runs in local mode but gives wrong results: column 2 is sorted in
reverse, but within that the rows are ordered by column 3 (treated as
text) and then by column 1. When run distributed, I get a merge error.
My guess is that fixing the latter fixes the former.
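
The difference between a column being compared as text and as a number can be seen in a small plain-Java sketch. This only illustrates the symptom and is not a diagnosis of the Hadoop behaviour; the class name TextVsNumericSketch and the sample values are hypothetical.

```java
import java.util.Arrays;

public class TextVsNumericSketch {

    // Sorts a copy of the values by character order, as a text comparison would.
    static String[] sortAsText(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy of the values by their numeric value.
    static String[] sortAsNumbers(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy, (a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b)));
        return copy;
    }

    public static void main(String[] args) {
        String[] values = {"9", "10", "100", "2"};
        System.out.println(Arrays.toString(sortAsText(values)));     // [10, 100, 2, 9]
        System.out.println(Arrays.toString(sortAsNumbers(values)));  // [2, 9, 10, 100]
    }
}
```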

This is the error:
java.io.IOException: Final merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2093)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:457)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 562
        at org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:128)
        at org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compareByteSequence(KeyFieldBasedComparator.java:109)
        at org.apache.hadoop.mapred.lib.KeyFieldBasedComparator.compare(KeyFieldBasedComparator.java:85)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:308)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:270)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:285)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:108)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2087)
        ... 3 more


Thanks for your time

And the code (not too big) is
==CODE==

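import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;
import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;
import org.apache.hadoop.util.Tool;

public class RMRSort extends Configured implements Tool {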

  static class RMRSortMap extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    // Emit the whole input line as both key and value, so the sort runs on the line itself.
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
	output.collect(value, value);
    }
  }

    static class RMRSortReduce extends MapReduceBase implements Reducer<Text, Text, NullWritable, Text> {

	// Drop the key and write the values out in the order they arrive.
	public void reduce(Text key, Iterator<Text> values, OutputCollector<NullWritable, Text> output, Reporter reporter) throws IOException {
	    NullWritable n = NullWritable.get();
	    while (values.hasNext()) {
		output.collect(n, values.next());
	    }
	}
    }


    static JobConf createConf(String rserveport, String uid, String infolder, String outfolder) {
	Configuration defaults = new Configuration();
	JobConf jobConf = new JobConf(defaults, RMRSort.class);
	jobConf.setJobName("Sorter: "+uid);
	jobConf.addResource(new Path(System.getenv("HADOOP_CONF_DIR") + "/hadoop-site.xml"));
// 	jobConf.set("mapred.job.tracker", "local");
	jobConf.setMapperClass(RMRSortMap.class);
	jobConf.setReducerClass(RMRSortReduce.class);
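	// fsep is not defined in this excerpt; it holds the field separator (the input is space separated).
	jobConf.set("map.output.key.field.separator", fsep);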
	jobConf.setPartitionerClass(KeyFieldBasedPartitioner.class);
	jobConf.set("mapred.text.key.partitioner.options","-k2,2 -k1,1 -k3,3");
	jobConf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
	jobConf.set("mapred.text.key.comparator.options","-k2r,2r -k1rn,1rn -k3n,3n");
//infolder, outfolder information removed
	jobConf.setMapOutputKeyClass(Text.class);
	jobConf.setMapOutputValueClass(Text.class);
	jobConf.setOutputKeyClass(NullWritable.class);
	return(jobConf);
    }
    public int run(String[] args) throws Exception {
	return(1);
    }

}




-- 
Saptarshi Guha - saptarshi.guha@gmail.com
