hadoop-mapreduce-dev mailing list archives

From "YangLai (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1269) Failed on write sequence files in mapper.
Date Sat, 05 Dec 2009 13:07:20 GMT
Failed on write sequence files in mapper.
-----------------------------------------

                 Key: MAPREDUCE-1269
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1269
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.1
         Environment: Hadoop 0.20.1
Compiled by oom on Tue Sep  1 20:55:56 UTC 2009

Linux version 2.6.18-128.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704
(Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009

            Reporter: YangLai
            Priority: Critical


Because my job does not need the sort phase, I want to write only the values into sequence
files, one file per key. So I keep a HashMap in the mapper:

	private HashMap<String, Writer> hm;

and I look up the org.apache.hadoop.io.SequenceFile.Writer for the current key in that HashMap, creating it if it does not exist yet:

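		// Reuse the writer already opened for this key, if any.
		Writer seqWriter = hm.get(skey);
		if (seqWriter == null) {
			try {
				// First record for this key: open a new sequence file under pPathOut.
				seqWriter = new SequenceFile.Writer(new JobClient(job).getFs(),
						job, new Path(pPathOut, skey), VLongWritable.class, ByteWritable.class);
			} catch (IOException e) {
				e.printStackTrace();
			}
			if (seqWriter != null) {
				hm.put(skey, seqWriter);
			} else {
				// The writer could not be created, so this record is skipped.
				return;
			}
		}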
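
The snippet above does not show where the cached writers are closed, and a failed creation only prints the exception and drops the record. The following is a minimal sketch of the same one-writer-per-key pattern with both points made explicit. It is not the reporter's code and not the Hadoop API: the class name KeyedWriterCache and its methods are hypothetical, and a plain java.io BufferedWriter stands in for SequenceFile.Writer so that the sketch runs without a cluster.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: one lazily opened writer per key, all closed together. */
class KeyedWriterCache implements Closeable {
    private final Path outputDir;  // stands in for pPathOut
    private final String taskId;   // stands in for job.get("mapred.task.id")
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    KeyedWriterCache(Path outputDir, String taskId) {
        this.outputDir = outputDir;
        this.taskId = taskId;
    }

    /** Appends one value to the file for this key, opening the file on first use. */
    void write(String key, String value) throws IOException {
        BufferedWriter w = writers.get(key);
        if (w == null) {
            // Propagate the IOException instead of printing it and dropping the record.
            // The task id suffix keeps file names distinct across tasks (an assumption
            // about how the names in the report are built).
            w = Files.newBufferedWriter(outputDir.resolve(key + "-" + taskId),
                    StandardCharsets.UTF_8);
            writers.put(key, w);
        }
        w.write(value);
        w.newLine();
    }

    /** Number of writers currently held open. */
    int openWriterCount() {
        return writers.size();
    }

    /** Closes every cached writer; rethrows the first failure after trying them all. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (BufferedWriter w : writers.values()) {
            try {
                w.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        writers.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

In a mapper written against the old org.apache.hadoop.mapred API, the natural place to call close() on such a cache would be the mapper's own close() method, so that every file is flushed before the task ends.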

The file names are derived from job.get("mapred.task.id"), which ensures that no duplicate file names exist.
The system always outputs:

java.io.IOException: Could not obtain block: blk_-5398274085876111743_1021 file=/YangLai/ranNum1GB/part-00015
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1787)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1615)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1742)
	at java.io.DataInputStream.readFully(Unknown Source)
	at java.io.DataInputStream.readFully(Unknown Source)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

In fact, each mapper writes only 16 sequence files, which should not overload the Hadoop
system.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

