hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Filesystem closed exception
Date Tue, 16 Oct 2012 07:07:59 GMT
Your DataNode is overloaded. Try profiling it, and check the heap size of
your NameNode and your DataNodes.
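
As a rough illustration of what checking the heap size could look like, here
is a minimal sketch that reports the heap usage of the JVM it runs in, using
the standard java.lang.management API. The class name HeapCheck and its
helper methods are made up for this example and are not part of Hama or
Hadoop. It only sees its own JVM, so the NameNode and DataNode daemons would
still need to be inspected through their own processes (for example with the
JDK's jstat tool).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only; not part of Hama or Hadoop.
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    // Fraction of the maximum heap in use, or -1.0 when no maximum is defined
    // (MemoryUsage.getMax() returns -1 in that case).
    public static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return (double) usedBytes / (double) maxBytes;
    }

    // One-line summary of the current JVM's heap usage.
    public static String describeHeap() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long used = heap.getUsed();
        long max = heap.getMax();
        double fraction = usedFraction(used, max);
        String maxText = fraction < 0 ? "undefined" : (max / MB) + "MB";
        String pctText = fraction < 0 ? "n/a" : Math.round(fraction * 100) + "%";
        return "heap used=" + (used / MB) + "MB"
                + " committed=" + (heap.getCommitted() / MB) + "MB"
                + " max=" + maxText
                + " (" + pctText + " of max)";
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

A used fraction that stays close to 1.0 while the job runs would be one hint
that the process is short on heap.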

2012/10/16 Yuesheng Hu <yueshenghu@gmail.com>

> Hi, Thomas
>
>      When I test K-Means with caching enabled, a "Filesystem closed" exception
> is raised once the input size reaches about 6 GB. Our cluster has
>      10 nodes (1 master, 9 slaves), 5 tasks/node, and 1000 MB RAM per task. I
> think the cluster is powerful enough to handle this input size,
>      but the job failed. The log is:
> 12/10/11 10:05:17 INFO bsp.FileInputFormat: Total input paths to process :
> 45
> 12/10/11 10:05:18 INFO bsp.BSPJobClient: Running job: job_201210111001_0003
> 12/10/11 10:05:21 INFO bsp.BSPJobClient: Current supersteps number: 0
> 12/10/11 12:01:47 INFO bsp.BSPJobClient: Current supersteps number: 1
> 12/10/11 13:48:33 INFO bsp.BSPJobClient: Current supersteps number: 2
> 12/10/11 15:26:48 INFO bsp.BSPJobClient: Current supersteps number: 3
> 12/10/11 17:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
> 12/10/11 18:45:12 INFO bsp.BSPJobClient: Current supersteps number: 5
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO bsp.BSPPeerImpl:
> Moving to local cache files: INITIALLY IT WAS: null
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO
> sync.ZKSyncClient: Initializing ZK Sync Client
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO
> sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At datanode09/
> 192.168.1.219:61001
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 ERROR
> sync.ZooKeeperSyncClientImpl:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for /bsp/job_201210111001_0003/peers
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server:
> Starting SocketReader
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC
> Server Responder: starting
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO
> message.HadoopMessageManagerImpl: BSPPeer address:datanode09 port:61001
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC
> Server listener on 61001: starting
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC
> Server handler 0 on 61001: starting
> attempt_201210111001_0003_000004_0: 12/10/11 18:45:47 INFO ml.KMeansBSP:
> Finished! Writing the assignments...
> attempt_201210111001_0003_000004_0: 12/10/11 18:46:29 ERROR bsp.BSPTask:
> Error running bsp setup and bsp function.
> attempt_201210111001_0003_000004_0: java.io.IOException: Filesystem closed
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2152)
> attempt_201210111001_0003_000004_0: at
> java.io.DataInputStream.readInt(DataInputStream.java:370)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1953)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1983)
> attempt_201210111001_0003_000004_0: at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2120)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:85)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:63)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:49)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:630)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.ml.KMeansBSP.recalculateAssignmentsAndWrite(KMeansBSP.java:269)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.ml.KMeansBSP.bsp(KMeansBSP.java:142)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
> attempt_201210111001_0003_000004_0: at
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
> 12/10/11 18:45:54 INFO bsp.BSPJobClient: Job failed.
>
> What happened?
>
