airavata-issues mailing list archives

From "Eroma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRAVATA-2833) Several experiments failed at various stages of job submission due to connection lost
Date Fri, 15 Jun 2018 15:18:00 GMT
Eroma created AIRAVATA-2833:
-------------------------------

             Summary: Several experiments failed at various stages of job submission due to connection lost
                 Key: AIRAVATA-2833
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2833
             Project: Airavata
          Issue Type: Bug
          Components: helix implementation
    Affects Versions: 0.18
         Environment: https://staging.seagrid.org/
            Reporter: Eroma
            Assignee: Dimuthu Upeksha
             Fix For: 0.18


While submitting a batch of jobs, several failed on a single cluster because the SSH connection was lost.

The experiments failed at different stages: uploading the input file, transferring the output, and creating archive.tar. The error from the log is in [1]. Is there anything we could do here, such as retrying or resubmitting the task?

 

 

Experiment IDs:

SLM001-QEspresso-JS:2_d01e50dd-74fe-434a-87b3-e4668b827da5

SLM001-QEspresso-JS:1_b29c6476-8944-4f6d-8946-b2e9f20b2acf

SLM001-QEspresso-JS:0_cd3d980d-017e-4ebe-91f7-85d1157feb94

 

[1]

org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : cc1c8295-e5ec-44bf-b705-eceddfca3b1a, Task TASK_b6ea333e-7468-4221-8b87-09050d7d053c failed due to Failed uploading the input file to /N/SEAGrid_scratch/PROCESS_1694a674-3dd7-4693-868e-b74444fd2b8d/ from local path /tmp/PROCESS_1694a674-3dd7-4693-868e-b74444fd2b8d/temp_inputs/Al.sample1.in, net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
    at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:102)
    at org.apache.airavata.helix.impl.task.staging.InputDataStagingTask.onRun(InputDataStagingTask.java:137)
    at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:311)
    at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:90)
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.airavata.agents.api.AgentException: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
    at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.copyFileTo(SSHJAgentAdaptor.java:155)
    at org.apache.airavata.helix.impl.task.staging.InputDataStagingTask.onRun(InputDataStagingTask.java:119)
    ... 10 more
Caused by: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
    at net.schmizz.keepalive.KeepAliveRunner.checkMaxReached(KeepAliveRunner.java:64)
    at net.schmizz.keepalive.KeepAliveRunner.doKeepAlive(KeepAliveRunner.java:56)
    at net.schmizz.keepalive.KeepAlive.run(KeepAlive.java:63)
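
On the "try again?" question, one option might be to retry the failed step a few times with a growing delay, but only when the cause chain contains the [CONNECTION_LOST] marker seen in [1]. The following is a minimal sketch under that assumption. It is not Airavata's or Helix's API: the class RetrySketch, the methods withRetry and isConnectionLost, and the attempt and delay values are all hypothetical and chosen for illustration only. The operation being retried is passed in as a plain Callable so the sketch does not depend on the SSH library.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names exist in Airavata.
class RetrySketch {

    // Runs op and retries it when it fails with a lost-connection error.
    // maxAttempts and initialDelayMs are assumed tuning values.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Give up on the final attempt or on any unrelated failure.
                if (attempt == maxAttempts || !isConnectionLost(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // double the wait before the next attempt
            }
        }
        throw last; // not reached; the loop always returns or throws
    }

    // Walks the cause chain looking for the marker text from the log in [1].
    static boolean isConnectionLost(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String message = c.getMessage();
            if (message != null && message.contains("CONNECTION_LOST")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upload that loses the connection twice, then succeeds.
        String result = withRetry(() -> {
            if (calls[0]++ < 2) {
                throw new java.io.IOException("[CONNECTION_LOST] simulated");
            }
            return "uploaded";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Matching on the message text is a shortcut to keep the sketch self-contained. Inside the task it may be cleaner to check the exception type in the cause chain, and a retried upload would probably need a fresh SSH session rather than the one that timed out.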



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
