asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Jenkins integration tests
Date Mon, 15 Aug 2016 01:02:09 GMT
I did some more investigation and the last few commits (especially those 
on
Friday) [1] look pretty harmless. Also the time of the hangs seems quite
diverse. I’ve looked at the console output of the last 12 hangs and 
copied a
few lines out that preceded the capturing of the thread dumps. I don’t
really see any patterns. In one case there seems to have a lack of space
reported by HDFS, but I’m not sure if that’s a temporary problem or 
if
that’s part of the bigger problem.

I’ve attached the links to the console output and the lines I copied 
and I’m
happy to take suggestions what to do next …

Thanks,
Till

[1] https://git-wip-us.apache.org/repos/asf?p=asterixdb.git

--

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/295/console

00:56:41.801 INFO: Starting [TEST]: recovery_ddl/temp_dataset_recovery 
...
02:03:50.080 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/296/console

00:38:29.735 INFO: Starting [TEST]: 
recover_after_abort/primary_plus_rtree_index_insert_and_delete ...
02:03:44.009 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/298/console

00:24:40.118 	at java.lang.Thread.run(Thread.java:745)
02:03:55.458 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/300/console

02:03:14.569 2016-08-12 22:08:35,303 WARN  
[org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeResourceMonitor@3be8821f] 
namenode.NameNodeResourceChecker 
(NameNodeResourceChecker.java:isResourceAvailable(89)) - Space available 
on volume 'null' is 17981440, which is below the configured reserved 
amount 104857600
02:03:14.569 2016-08-12 22:08:35,303 WARN  
[org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeResourceMonitor@3be8821f] 
namenode.FSNamesystem (FSNamesystem.java:run(4018)) - NameNode low on 
available disk space. Already in safe mode.
02:03:15.833 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/302/console

00:40:08.703 SLF4J: Class path contains multiple SLF4J bindings.
00:40:08.704 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:40:08.704 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-jcl/1.6.3/slf4j-jcl-1.6.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:40:08.704 SLF4J: See 
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
02:03:00.955 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/305/console

00:26:08.302 Aug 13, 2016 3:41:28 AM 
org.apache.asterix.test.aql.TestExecutor executeTest
00:26:08.302 INFO: Starting [TEST]: 
query_after_restart/dataset-with-meta-record ...
02:02:34.039 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/306/console

02:02:51.189 Waiting for new AsterixDB Instance to start   .
02:02:52.029 2016-08-13 05:42:00,461 INFO  [Node Status Updater] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatusAndUpdateContainersInContext(314)) 
- Sending out status for container: container_id { app_attempt_id { 
application_id { id: 1 cluster_timestamp: 1471063321301 } attemptId: 1 } 
id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
02:02:52.030 2016-08-13 05:42:00,461 INFO  [Node Status Updater] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatusAndUpdateContainersInContext(314)) 
- Sending out status for container: container_id { app_attempt_id { 
application_id { id: 1 cluster_timestamp: 1471063321301 } attemptId: 1 } 
id: 3 } state: C_RUNNING diagnostics: "" exit_status: -1000
02:02:52.179 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/310/console

00:54:47.092 SLF4J: Class path contains multiple SLF4J bindings.
00:54:47.093 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:54:47.093 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-jcl/1.6.3/slf4j-jcl-1.6.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:54:47.093 SLF4J: See 
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
02:02:28.914 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/312/console

00:20:29.319 Caused by: 
org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /Asterix/asterix/state
00:20:29.319 	at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
00:20:29.319 	at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
00:20:29.320 	at 
org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
00:20:29.320 	at 
org.apache.asterix.event.service.ZooKeeperService.removeAsterixInstance(ZooKeeperService.java:205)
00:20:29.320 	at 
org.apache.asterix.event.service.ZooKeeperService.updateAsterixInstance(ZooKeeperService.java:233)
00:20:29.320 	at 
org.apache.asterix.installer.command.StartCommand.execCommand(StartCommand.java:68)
00:20:29.320 	at 
org.apache.asterix.installer.command.AbstractCommand.execute(AbstractCommand.java:38)
00:20:29.320 	at 
org.apache.asterix.installer.command.CommandHandler.processCommand(CommandHandler.java:86)
00:20:29.320 	at 
org.apache.asterix.installer.test.AsterixInstallerIntegrationUtil.executeCommand(AsterixInstallerIntegrationUtil.java:220)
00:20:29.320 	... 38 more
00:20:29.320 ...Unexpected!
00:20:29.338 Aug 13, 2016 6:53:29 PM 
org.apache.asterix.installer.test.AsterixLifecycleIT 
test_1_StopActiveInstance
00:20:29.338 INFO: Starting test: test_1_StopActiveInstance
00:20:43.996 Aug 13, 2016 6:53:44 PM 
org.apache.asterix.installer.test.AsterixLifecycleIT 
test_2_StartActiveInstance
00:20:43.996 INFO: Starting test: test_2_StartActiveInstance
00:20:57.766 Aug 13, 2016 6:53:57 PM 
org.apache.asterix.installer.test.AsterixLifecycleIT 
test_3_DeleteActiveInstance
00:20:57.767 INFO: Starting test: test_3_DeleteActiveInstance
00:20:58.093 Aug 13, 2016 6:53:58 PM 
org.apache.asterix.installer.test.AsterixLifecycleIT 
test_3_DeleteActiveInstance
00:20:58.093 INFO: PASSED: test_3_DeleteActiveInstance
02:02:29.101 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/314/console

01:00:35.717 Running 
org.apache.asterix.installer.test.AsterixExternalLibraryIT
01:00:36.335 log4j:WARN No appenders could be found for logger 
(org.apache.asterix.installer.command.AbstractCommand).
01:00:36.335 log4j:WARN Please initialize the log4j system properly.
01:00:40.562 SLF4J: Class path contains multiple SLF4J bindings.
01:00:40.562 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
01:00:40.562 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-jcl/1.6.3/slf4j-jcl-1.6.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
01:00:40.563 SLF4J: See 
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
02:02:46.954 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/315/console

00:30:25.774 Caused by: java.lang.Exception: INFO: Stopped Asterix 
instance: nc1
00:30:25.775 INFO: Name:nc1
00:30:25.775 Created:Sun Aug 14 15:07:33 UTC 2016
00:30:25.775 Web-Url:http://127.0.0.1:19001
00:30:25.775 State:ACTIVE (Sun Aug 14 15:07:57 UTC 2016)
00:30:25.775
00:30:25.775 ERROR: KeeperErrorCode = ConnectionLoss for /Asterix/nc1
00:30:25.775
00:30:25.775 	at 
org.apache.asterix.test.aql.TestExecutor.executeTest(TestExecutor.java:617)
00:30:25.775 	... 42 more
00:30:25.775 ...Unexpected!
00:30:25.796 Aug 14, 2016 3:08:25 PM 
org.apache.asterix.test.aql.TestExecutor executeTest
00:30:25.796 INFO: Starting [TEST]: query_after_restart/big_object_20M 
...
02:02:39.073 executing script 'Capture Thread Dumps'


https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/316/console

00:42:58.474 Running 
org.apache.asterix.installer.test.AsterixExternalLibraryIT
00:42:59.099 log4j:WARN No appenders could be found for logger 
(org.apache.asterix.installer.command.AbstractCommand).
00:42:59.099 log4j:WARN Please initialize the log4j system properly.
00:43:03.318 SLF4J: Class path contains multiple SLF4J bindings.
00:43:03.318 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:43:03.319 SLF4J: Found binding in 
[jar:file:/home/jenkins/.m2/repository/org/slf4j/slf4j-jcl/1.6.3/slf4j-jcl-1.6.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
00:43:03.320 SLF4J: See 
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
02:02:43.819 executing script 'Capture Thread Dumps'



On 14 Aug 2016, at 12:05, Till Westmann wrote:

> Out of the last 30 integration tests [1] 15 timed out (and 5 had test 
> failures).
> That seems like a high number and I’m wondering if there could be a 
> common cause.
>
> If anybody has some insight into one of the hangs, please share here, 
> so that we
> can hopefully identify commonalities (if they exist ..).
>
> Cheers,
> Till
>
> [1] 
> https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/

Mime
View raw message