hadoop-mapreduce-user mailing list archives

From exception <except...@taomee.com>
Subject hadoop input sampler
Date Thu, 18 Nov 2010 12:08:20 GMT
Hi all,

I am trying to sample the key distribution before running a total sort, but the program fails
with an exception.
This is the stack trace:

Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
        at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
        at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
        at Sorter.run(Sorter.java:100)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
        at Sorter.main(Sorter.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

I checked the code in LineRecordReader.java and found that the exception is thrown by this line:
newSize = in.readLine(value, maxLineLength, Math.max(maxBytesToConsume(pos), maxLineLength));

"in" is null at that point. I specified the input format as "TextInputFormat". It looks like
TextInputFormat fails to read the data. Any ideas on how to fix this? Thanks.
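
For context on why "in" can be null: in the new-API LineRecordReader, the "in" field is assigned inside initialize(), so a reader whose nextKeyValue() is called before initialize() would fail in exactly this way. The sketch below only illustrates that lifecycle. SketchLineReader and its methods are hypothetical stand-ins, not Hadoop classes, and it is an assumption (not something confirmed here) that InputSampler ends up in this state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReaderLifecycleSketch {

    /** Hypothetical stand-in for a record reader; not a Hadoop class. */
    static class SketchLineReader {
        private BufferedReader in;   // stays null until initialize() runs
        private String value;

        void initialize(String data) {
            in = new BufferedReader(new StringReader(data));
        }

        boolean nextKeyValue() throws IOException {
            value = in.readLine();   // NullPointerException if initialize() was skipped
            return value != null;
        }

        String getCurrentValue() {
            return value;
        }
    }

    /** Returns true if reading without initialize() throws a NullPointerException. */
    static boolean failsWithoutInitialize() throws IOException {
        SketchLineReader reader = new SketchLineReader();
        try {
            reader.nextKeyValue();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Counts the lines read once initialize() has been called first. */
    static int countLines(String data) throws IOException {
        SketchLineReader reader = new SketchLineReader();
        reader.initialize(data);
        int n = 0;
        while (reader.nextKeyValue()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("NPE without initialize: " + failsWithoutInitialize());
        System.out.println("lines after initialize: " + countLines("a\nb\nc"));
    }
}
```

If this is what is happening, the thing to check would be whether the sampler calls initialize() on the reader it gets from the input format before reading from it.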


I am running Hadoop 0.21.0, and my job setup is:
......
job.setInputFormatClass(TextInputFormat.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
// Random sampler: sampling frequency 0.1, at most 10000 samples, at most 10 splits.
InputSampler.Sampler<LongWritable, Text> sampler =
    new InputSampler.RandomSampler<LongWritable, Text>(0.1, 10000, 10);

// Store the partition file under the (qualified) first input path.
Path input = FileInputFormat.getInputPaths(job)[0];
input = input.makeQualified(input.getFileSystem(conf));
Path partitionFile = new Path(input, "_partitions");
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);

// Sample the input and write the partition boundaries; this is the call that fails.
InputSampler.writePartitionFile(job, sampler);

// Ship the partition file to the tasks through the distributed cache.
URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
DistributedCache.addCacheFile(partitionUri, conf);
DistributedCache.createSymlink(conf);
......


