hama-dev mailing list archives

From Yuesheng Hu <yueshen...@gmail.com>
Subject Filesystem closed exception
Date Tue, 16 Oct 2012 06:59:20 GMT
Hi, Thomas

     When I tested K-means with caching enabled, a "Filesystem closed"
exception was raised once the input size grew to about 6 GB. Our cluster is:
     10 nodes (1 master, 9 slaves), 5 tasks/node, 1000 MB RAM per task. I
think the cluster is powerful enough to handle this input size.
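For reference, the capacity estimate can be written out as a minimal Java sketch. It assumes (this is not stated in the message) that only the 9 slave nodes run BSP tasks, that each task gets the full 1000 MB, and that the ~6 GB input is split evenly across tasks. `ClusterCapacity` and its methods are hypothetical helpers for illustration only. The sketch compares on-disk input size; with caching enabled, the in-memory size of the cached vectors may differ.

```java
// Rough capacity arithmetic for the K-means run described in the message.
// Assumptions (hypothetical, not from the message): only the 9 slave nodes
// run BSP tasks, each task has the full 1000 MB heap, and the ~6 GB input
// is divided evenly across all tasks.
final class ClusterCapacity {

    /** Total number of BSP task slots across the worker nodes. */
    static int taskSlots(int workerNodes, int tasksPerNode) {
        return workerNodes * tasksPerNode;
    }

    /** Aggregate task heap in MB across all slots. */
    static long totalHeapMb(int slots, int heapPerTaskMb) {
        return (long) slots * heapPerTaskMb;
    }

    /** Average on-disk input per task in MB, assuming an even split. */
    static double inputPerTaskMb(double inputGb, int slots) {
        return inputGb * 1024.0 / slots;
    }

    public static void main(String[] args) {
        int slots = taskSlots(9, 5);
        long heap = totalHeapMb(slots, 1000);
        double share = inputPerTaskMb(6.0, slots);
        System.out.printf("task slots: %d%n", slots);
        System.out.printf("aggregate task heap: %d MB%n", heap);
        System.out.printf("input per task: %.1f MB of a 1000 MB heap%n", share);
    }
}
```

Under these assumptions there are 45 task slots (the same number as the 45 input paths reported in the log), 45,000 MB of aggregate task heap, and roughly 136.5 MB of on-disk input per task.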
     But it failed. The log is:
12/10/11 10:05:17 INFO bsp.FileInputFormat: Total input paths to process : 45
12/10/11 10:05:18 INFO bsp.BSPJobClient: Running job: job_201210111001_0003
12/10/11 10:05:21 INFO bsp.BSPJobClient: Current supersteps number: 0
12/10/11 12:01:47 INFO bsp.BSPJobClient: Current supersteps number: 1
12/10/11 13:48:33 INFO bsp.BSPJobClient: Current supersteps number: 2
12/10/11 15:26:48 INFO bsp.BSPJobClient: Current supersteps number: 3
12/10/11 17:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
12/10/11 18:45:12 INFO bsp.BSPJobClient: Current supersteps number: 5
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO bsp.BSPPeerImpl: Moving to local cache files: INITIALLY IT WAS: null
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZKSyncClient: Initializing ZK Sync Client
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At datanode09/192.168.1.219:61001
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 ERROR sync.ZooKeeperSyncClientImpl: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201210111001_0003/peers
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: Starting SocketReader
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server Responder: starting
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO message.HadoopMessageManagerImpl: BSPPeer address:datanode09 port:61001
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server listener on 61001: starting
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server handler 0 on 61001: starting
attempt_201210111001_0003_000004_0: 12/10/11 18:45:47 INFO ml.KMeansBSP: Finished! Writing the assignments...
attempt_201210111001_0003_000004_0: 12/10/11 18:46:29 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
attempt_201210111001_0003_000004_0: java.io.IOException: Filesystem closed
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2152)
attempt_201210111001_0003_000004_0: at java.io.DataInputStream.readInt(DataInputStream.java:370)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1953)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1983)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2120)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:85)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:63)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:49)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:630)
attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.recalculateAssignmentsAndWrite(KMeansBSP.java:269)
attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.bsp(KMeansBSP.java:142)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
12/10/11 18:45:54 INFO bsp.BSPJobClient: Job failed.
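The BSPJobClient lines in the log give a timestamp for each reported superstep, so the rough time spent per superstep can be computed from them. A minimal Java sketch follows; `SuperstepTiming` is a hypothetical helper, the gaps only approximate superstep durations (they are the times at which the client reported progress), and the code assumes all timestamps fall on the same day, which holds for 12/10/11.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Computes the gaps between the "Current supersteps number: N" lines
// (N = 0..5) copied from the job-client log. Assumes all timestamps are
// on the same day, so LocalTime arithmetic never wraps past midnight.
final class SuperstepTiming {

    static final String[] STAMPS = {
        "10:05:21", "12:01:47", "13:48:33", "15:26:48", "17:05:12", "18:45:12"
    };

    /** Elapsed time between each pair of consecutive timestamps. */
    static List<Duration> gaps(String[] stamps) {
        List<Duration> out = new ArrayList<>();
        for (int i = 1; i < stamps.length; i++) {
            out.add(Duration.between(LocalTime.parse(stamps[i - 1]),
                                     LocalTime.parse(stamps[i])));
        }
        return out;
    }

    /** Elapsed time from the first to the last timestamp. */
    static Duration total(String[] stamps) {
        return Duration.between(LocalTime.parse(stamps[0]),
                                LocalTime.parse(stamps[stamps.length - 1]));
    }

    /** Formats a duration as H:MM:SS. */
    static String hms(Duration d) {
        long s = d.getSeconds();
        return String.format("%d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }

    public static void main(String[] args) {
        List<Duration> g = gaps(STAMPS);
        for (int i = 0; i < g.size(); i++) {
            System.out.printf("superstep %d -> %d: %s%n", i, i + 1, hms(g.get(i)));
        }
        System.out.println("total: " + hms(total(STAMPS)));
    }
}
```

This gives gaps of 1:56:26, 1:46:46, 1:38:15, 1:38:24 and 1:40:00, or 8:39:51 in total, so the job had been running for more than eight and a half hours when the task failed while writing the assignments.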

What happened?
