tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TAJO-257) Unit tests occasionally fail.
Date Wed, 16 Oct 2013 11:21:41 GMT

     [ https://issues.apache.org/jira/browse/TAJO-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyunsik Choi updated TAJO-257:
------------------------------

    Attachment: TAJO-257.patch

This issue has several causes:
 - TajoTestingClient tried to shut down HDFS before the workers had stopped completely.
 - The Jetty server in each worker occasionally hung.
 - Some query executions didn't close their query results, which caused a resource leak.

This patch solves the above problems. I ran the unit tests ten times in a row, and all of
them passed without any failures or hangs.
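
As a rough illustration of the first and third causes (this is not the content of
TAJO-257.patch), the Java sketch below stops and joins every worker before it shuts down
storage, and closes a query result with try-with-resources. The names ShutdownOrderSketch,
Worker, QueryResult, and runAndShutdown are hypothetical and are not Tajo APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShutdownOrderSketch {
    // Records the order in which components stop (shared across threads).
    static final List<String> EVENTS =
        Collections.synchronizedList(new ArrayList<String>());

    /** Hypothetical stand-in for a worker that runs on its own thread. */
    static class Worker extends Thread {
        private volatile boolean running = true;

        Worker(String name) { super(name); }

        @Override public void run() {
            while (running) {
                try { Thread.sleep(10); } catch (InterruptedException e) { break; }
            }
            EVENTS.add(getName() + " stopped");
        }

        void requestStop() { running = false; interrupt(); }
    }

    /** Hypothetical stand-in for a query result that holds resources. */
    static class QueryResult implements AutoCloseable {
        @Override public void close() { EVENTS.add("result closed"); }
    }

    static List<String> runAndShutdown() {
        EVENTS.clear();
        List<Worker> workers =
            Arrays.asList(new Worker("worker-1"), new Worker("worker-2"));
        for (Worker w : workers) w.start();

        // try-with-resources closes the result even if reading it throws.
        try (QueryResult result = new QueryResult()) {
            // a real test would read rows from the result here
        }

        // Ask every worker to stop, then wait until each one has exited.
        for (Worker w : workers) w.requestStop();
        try {
            for (Worker w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // Only now is it safe to stop the storage layer the workers depend on.
        EVENTS.add("storage stopped");
        return new ArrayList<String>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(runAndShutdown());
    }
}
```

The two workers may stop in either order, but because of the join() calls the
"storage stopped" event is always recorded last.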

> Unit tests occasionally fail.
> -----------------------------
>
>                 Key: TAJO-257
>                 URL: https://issues.apache.org/jira/browse/TAJO-257
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyunsik Choi
>            Priority: Blocker
>             Fix For: 0.2-incubating
>
>         Attachments: TAJO-257.patch
>
>
> {noformat}
> 2013-10-16 14:55:43,110 INFO  tajo.TajoTestingCluster (TajoTestingCluster.java:startMiniTajoCluster(292))
- Mini Tajo cluster is up
> 2013-10-16 14:55:43,110 INFO  rm.TajoWorkerResourceManager (TajoWorkerResourceManager.java:workerHeartbeat(395))
- TajoWorker:host:local05.gruter.com, port=44238,44237,47304, slots=2048:4:8, used=0:0:0 added
in live TajoWorker list
> 2013-10-16 14:55:43,149 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2174))
- BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:60398 is added to blk_-2382303275850279660_1004{blockUCState=COMMITTED,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[127.0.0.1:60398|RBW]]} size 117894
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster (TajoTestingCluster.java:shutdownMiniCluster(447))
- ========================================
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster (TajoTestingCluster.java:shutdownMiniCluster(448))
- Shutdown minicluster
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster (TajoTestingCluster.java:shutdownMiniCluster(449))
- ========================================
> 2013-10-16 14:55:43,612 INFO  rpc.NettyServerBase (NettyServerBase.java:shutdown(120))
- Rpc (TajoMasterProtocol) listened on 127.0.0.1:44236) shutdown
> 2013-10-16 14:55:43,613 INFO  rpc.NettyServerBase (NettyServerBase.java:shutdown(120))
- Rpc (TajoMasterClientProtocol) listened on 127.0.0.1:44235) shutdown
> 2013-10-16 14:55:43,614 INFO  rpc.NettyServerBase (NettyServerBase.java:shutdown(120))
- Rpc (CatalogProtocol) listened on 127.0.0.1:44234) shutdown
> 2013-10-16 14:55:43,614 INFO  catalog.CatalogServer (CatalogServer.java:stop(161)) -
Catalog Server (127.0.0.1:44234) shutdown
> 2013-10-16 14:55:43,614 INFO  master.TajoMaster (TajoMaster.java:stop(436)) - Tajo Master
main thread exiting
> 2013-10-16 14:55:43,615 INFO  worker.TajoWorker (TajoWorker.java:run(451)) - Worker Resource
Heartbeat Thread stopped.
> 2013-10-16 14:55:43,616 INFO  rpc.NettyClientBase (NettyClientBase.java:close(87)) -
Proxy is disconnected from 127.0.0.1:44234
> 2013-10-16 14:55:43,618 INFO  rpc.NettyClientBase (NettyClientBase.java:close(87)) -
Proxy is disconnected from 127.0.0.1:44236
> 2013-10-16 14:55:43,619 INFO  rpc.NettyServerBase (NettyServerBase.java:shutdown(120))
- Rpc (TajoWorkerProtocol) listened on 0:0:0:0:0:0:0:0:44238) shutdown
> 2013-10-16 14:55:43,619 INFO  worker.TajoWorkerManagerService (TajoWorkerManagerService.java:stop(109))
- TajoWorkerManagerService stopped
> 2013-10-16 14:55:43,619 INFO  querymaster.QueryMaster (QueryMaster.java:run(346)) - QueryMaster
heartbeat thread stopped
> 2013-10-16 14:55:43,620 INFO  master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:stop(122))
- AsyncDispatcher stopped:querymaster_1381902943098
> 2013-10-16 14:55:43,620 INFO  querymaster.QueryMaster (QueryMaster.java:stop(151)) -
QueryMaster stop
> 2013-10-16 14:55:43,620 INFO  worker.TajoWorkerClientService (TajoWorkerClientService.java:stop(103))
- TajoWorkerClientService stopping
> 2013-10-16 14:55:43,621 INFO  rpc.NettyServerBase (NettyServerBase.java:shutdown(120))
- Rpc (QueryMasterClientProtocol) listened on 0:0:0:0:0:0:0:0:44237) shutdown
> 2013-10-16 14:55:43,621 INFO  worker.TajoWorkerClientService (TajoWorkerClientService.java:stop(107))
- TajoWorkerClientService stopped
> 2013-10-16 14:55:43,628 INFO  worker.TajoWorker (TajoWorker.java:stop(211)) - TajoWorker
main thread exiting
> 2013-10-16 14:55:43,628 WARN  datanode.DirectoryScanner (DirectoryScanner.java:shutdown(289))
- DirectoryScanner: shutdown has been called
> 2013-10-16 14:55:43,745 WARN  datanode.DataNode (BPServiceActor.java:offerService(575))
- BPOfferService for Block pool BP-1571979412-192.168.0.205-1381902941942 (storage id DS-977793291-192.168.0.205-60398-1381902942381)
service to localhost/127.0.0.1:56994 interrupted
> 2013-10-16 14:55:43,745 WARN  datanode.DataNode (BPServiceActor.java:run(685)) - Ending
block pool service for: Block pool BP-1571979412-192.168.0.205-1381902941942 (storage id DS-977793291-192.168.0.205-60398-1381902942381)
service to localhost/127.0.0.1:56994
> 2013-10-16 14:55:43,747 WARN  datanode.DirectoryScanner (DirectoryScanner.java:shutdown(289))
- DirectoryScanner: shutdown has been called
> 2013-10-16 14:55:45,043 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1308))
- BLOCK* ask 127.0.0.1:41909 to replicate blk_8623857380696071962_1002 to datanode(s) 127.0.0.1:60398
127.0.0.1:51428
> 2013-10-16 14:55:45,279 WARN  datanode.DataNode (DataNode.java:run(1485)) - DatanodeRegistration(127.0.0.1,
storageID=DS-1668021296-192.168.0.205-41909-1381902942201, infoPort=60807, ipcPort=52563,
storageInfo=lv=-40;cid=testClusterID;nsid=1161821716;c=0):Failed to transfer BP-1571979412-192.168.0.205-1381902941942:blk_8623857380696071962_1002
to 127.0.0.1:60398 got 
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:526)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:490)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1423)
> 	at java.lang.Thread.run(Thread.java:724)
> 2013-10-16 14:55:45,361 WARN  datanode.DataNode (BPServiceActor.java:run(685)) - Ending
block pool service for: Block pool BP-1571979412-192.168.0.205-1381902941942 (storage id DS-617348020-192.168.0.205-51428-1381902942289)
service to localhost/127.0.0.1:56994
> 2013-10-16 14:55:51,045 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1308))
- BLOCK* ask 127.0.0.1:41909 to replicate blk_8623857380696071962_1002 to datanode(s) 127.0.0.1:51428
127.0.0.1:60398
> 2013-10-16 14:55:51,282 WARN  datanode.DataNode (BlockReceiver.java:<init>(233))
- IOException in BlockReceiver constructor. Cause is 
> 2013-10-16 14:55:51,282 ERROR datanode.DataNode (DataXceiver.java:run(223)) - 127.0.0.1:51428:DataXceiver
error processing WRITE_BLOCK operation  src: /127.0.0.1:46346 dest: /127.0.0.1:51428
> java.io.IOException: block pool BP-1571979412-192.168.0.205-1381902941942 is not found
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getBlockPoolSlice(FsVolumeImpl.java:122)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:150)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:818)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:91)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:155)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> 	at java.lang.Thread.run(Thread.java:724)
> ^C2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(459)) - ============================================
> 2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(460)) - TajoWorker
received SIGINT Signal
> 2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(461)) - ============================================
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
