From: "YangLai (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Date: Sat, 5 Dec 2009 13:07:20 +0000 (UTC)
Subject: [jira] Created: (MAPREDUCE-1269) Failed on write sequence files in mapper.

Failed on write sequence files in mapper.
-----------------------------------------

                 Key: MAPREDUCE-1269
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1269
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.1
         Environment: Hadoop 0.20.1, compiled by oom on Tue Sep 1 20:55:56 UTC 2009;
                      Linux version 2.6.18-128.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009
            Reporter: YangLai
            Priority: Critical


Because the sort phase is not necessary for my job, I want to write only the values into sequence files, partitioned by key. So I keep a HashMap in the mapper:

    private HashMap<String, SequenceFile.Writer> hm;

and for each key I look up a suitable org.apache.hadoop.io.SequenceFile.Writer in the HashMap:

    SequenceFile.Writer seqWriter = hm.get(skey);
    if (seqWriter == null) {
        try {
            seqWriter = new SequenceFile.Writer(new JobClient(job).getFs(), job,
                    new Path(pPathOut, skey), VLongWritable.class, ByteWritable.class);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (seqWriter != null) {
            hm.put(skey, seqWriter);
        } else {
            return;
        }
    }

The file names are derived from job.get("mapred.task.id"), which ensures that no duplicate file names are created.
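For completeness, here is a minimal self-contained sketch of the pattern as a whole. The class name, the output path, the skey derivation, and the input key/value types are illustrative assumptions, not my exact code; the one point the fragments above omit is that every writer must be closed in close(), otherwise buffered data may never reach HDFS:

    import java.io.IOException;
    import java.util.HashMap;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.ByteWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.VLongWritable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative mapper: input assumed to be VLongWritable/ByteWritable pairs.
    public class PartitionMapper extends MapReduceBase
            implements Mapper<VLongWritable, ByteWritable, VLongWritable, ByteWritable> {

        // One lazily created writer per partition key.
        private final HashMap<String, SequenceFile.Writer> hm =
                new HashMap<String, SequenceFile.Writer>();
        private JobConf job;
        private Path pPathOut;

        @Override
        public void configure(JobConf job) {
            this.job = job;
            // mapred.task.id is unique per task attempt, so names cannot collide.
            // "/YangLai/out" is a placeholder output directory.
            this.pPathOut = new Path("/YangLai/out", job.get("mapred.task.id"));
        }

        public void map(VLongWritable key, ByteWritable value,
                OutputCollector<VLongWritable, ByteWritable> output, Reporter reporter)
                throws IOException {
            // Illustrative partitioning of records into 16 files per mapper.
            String skey = String.valueOf(key.get() & 0xF);
            SequenceFile.Writer seqWriter = hm.get(skey);
            if (seqWriter == null) {
                // FileSystem.get(job) is equivalent to new JobClient(job).getFs() here.
                seqWriter = new SequenceFile.Writer(FileSystem.get(job), job,
                        new Path(pPathOut, skey), VLongWritable.class, ByteWritable.class);
                hm.put(skey, seqWriter);
            }
            seqWriter.append(key, value);
        }

        @Override
        public void close() throws IOException {
            // Flush and close every writer; otherwise the files may be left
            // incomplete on HDFS when the task finishes.
            for (SequenceFile.Writer w : hm.values()) {
                w.close();
            }
        }
    }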
The system always outputs:

    java.io.IOException: Could not obtain block: blk_-5398274085876111743_1021 file=/YangLai/ranNum1GB/part-00015
            at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1787)
            at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1615)
            at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1742)
            at java.io.DataInputStream.readFully(Unknown Source)
            at java.io.DataInputStream.readFully(Unknown Source)
            at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
            at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
            at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
            at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
            at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
            at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
            at org.apache.hadoop.mapred.Child.main(Child.java:170)

In fact, each mapper writes only 16 sequence files, which should not overload the Hadoop system.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.