airavata-dev mailing list archives

From "Suresh Marru (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRAVATA-756) Ensure Airavata can renew proxy for long running jobs.
Date Thu, 31 Jan 2013 22:39:12 GMT

     [ https://issues.apache.org/jira/browse/AIRAVATA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Marru updated AIRAVATA-756:
----------------------------------

    Affects Version/s: 0.6
        Fix Version/s:     (was: 0.6)
                       0.7
              Summary: Ensure Airavata can renew proxy for long running jobs.  (was: Error message on Airavata Server and Xbaya, even though job ran successfully after 17 hours.)

Thanks, Pedro, for reporting this issue. Airavata should be able to renew the grid proxy
and automatically reconnect to the job. I changed the JIRA summary to reflect this.
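A minimal sketch of the behaviour requested above, written against hypothetical `Credential`, `CredentialSource` and `JobHandle` interfaces (none of these are Airavata, GFac or Globus APIs): poll the job, renew the proxy shortly before it expires, and re-attach to the job with a few retries, since the trace below shows the job manager refusing the connection right after a renewal. The ten-minute margin and three retries are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of "renew the proxy, then reconnect to the job".
 * Credential, CredentialSource and JobHandle are illustrative interfaces,
 * not Airavata, GFac or Globus APIs.
 */
class ProxyRenewalSketch {

    /** A credential (e.g. a grid proxy) that stops being valid at expiresAt(). */
    interface Credential {
        Instant expiresAt();
    }

    /** Issues a fresh credential with the requested lifetime (e.g. backed by a MyProxy server). */
    interface CredentialSource {
        Credential renew(Duration lifetime) throws Exception;
    }

    /** A submitted job whose status can be polled and which can be re-bound to a new credential. */
    interface JobHandle {
        boolean isFinished() throws Exception;
        void reattach(Credential credential) throws Exception;
    }

    /** Renew once less than this much lifetime remains, instead of waiting for expiry (assumed value). */
    static final Duration RENEW_MARGIN = Duration.ofMinutes(10);

    static boolean needsRenewal(Credential credential, Instant now) {
        // True when now + margin has reached or passed the expiry time.
        return !now.plus(RENEW_MARGIN).isBefore(credential.expiresAt());
    }

    /** Re-attach to the job, retrying because the job manager may refuse the first connection. */
    static void reattachWithRetry(JobHandle job, Credential credential, int attempts) throws Exception {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                job.reattach(credential);
                return;
            } catch (Exception e) {
                last = e; // e.g. "Connection refused"; a real client would back off here
            }
        }
        throw last;
    }

    /**
     * Polls until the job finishes, renewing the credential shortly before it expires.
     * Returns the number of renewals performed.
     */
    static int monitor(JobHandle job, CredentialSource source, Credential initial,
                       Duration lifetime, Supplier<Instant> clock, Runnable pause) throws Exception {
        Credential current = initial;
        int renewals = 0;
        while (!job.isFinished()) {
            if (needsRenewal(current, clock.get())) {
                current = source.renew(lifetime);
                reattachWithRetry(job, current, 3);
                renewals++;
            }
            pause.run(); // e.g. sleep between status polls
        }
        return renewals;
    }
}
```

The clock and pause are injected so the loop can be driven by simulated time. The design choice is to renew proactively: the server log in this report shows the proxy being renewed only after "Job proxy expired", after which the reconnection failed.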
 
                
> Ensure Airavata can renew proxy for long running jobs.
> ------------------------------------------------------
>
>                 Key: AIRAVATA-756
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-756
>             Project: Airavata
>          Issue Type: Bug
>          Components: Distribution, GFac, XBaya
>    Affects Versions: 0.6
>         Environment: Mac OS 10.5.8
> Processor: 2 x 2.8GHz Quad-Core Intel Xeon
> Memory: 8 GB 800 MHz DDR2
> Java 1.6.0_26
>            Reporter: Pedro da Silveira
>            Priority: Minor
>             Fix For: 0.7
>
>
> After fixing my local firewall as Raminderjeet Singh suggested (opening ports 40,000 to 40,100 to the Airavata Server), I no longer receive the "Status 0" message.
> However, I still get an error message on the Airavata Server and a red alert in XBaya reporting that my job failed, even though the job actually ran successfully.
> This task took 17 hours to finish with 3 inputs in one application service.
> According to Raminderjeet Singh, if I change myproxy.life=3600 to myproxy.life=172800 in airavata-server.properties, I won't get this error message anymore.
> ==========================
> Error message on Airavata-Server:
> ==========================
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/ have same status: ACTIVE
> [INFO] Job proxy expired. Trying to renew proxy
> org.globus.gsi.gssapi.GlobusGSSCredentialImpl@453931d9
> [INFO] Proxy file renewed to /tmp/x509up_uogcebb9f81ba-8f59-4fec-b776-331c3f21bb62 for the user ogce with 3600 lifetime.
> [ERROR] Context passed was NULL.
> java.lang.RuntimeException: Context passed was NULL.
> 	at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:496)
> 	at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:485)
> 	at org.apache.airavata.core.gfac.notification.impl.WorkflowTrackingNotification.executionFail(WorkflowTrackingNotification.java:108)
> 	at org.apache.airavata.core.gfac.notification.impl.DefaultNotifier.executionFail(DefaultNotifier.java:135)
> 	at org.apache.airavata.core.gfac.exception.JobSubmissionFault.sendFaultNotification(JobSubmissionFault.java:52)
> 	at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:231)
> 	at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> 	at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> 	at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> 	at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> 	at java.lang.Thread.run(Thread.java:680)
> [INFO] 	-----DATA-----
> [INFO]         lonestar4.tacc.teragrid.org,&( queue = "normal" )
>         ( stdout = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/lonestar_application.stdout" )
>         ( count = "72" )
>         ( executable = "/scratch/01437/ogce/Vlab/Phonon/executePhonon.sh" )
>         ( stderr = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/lonestar_application.stderr" )
>         ( maxwalltime = "1440" )
>         ( hostCount = "6" )
>         ( minmemory = "10240" )
>         ( project = "TG-STA110014S" )
>         ( jobtype = "mpi" )
>         ( environment = ( "inputData" "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData" )
>                         ( "outputData" "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/outputData" ) )
>         ( proxy_timeout = "1" )
>         ( arguments = "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Pwscf_Input"
>                       "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Cd_PON_sp_LDA.vdb"
>                       "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Te_PON_LDA.vdb"
>                       "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Phonon_Input" )
>         ( directory = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3" )
>         ( maxmemory = "15360" )
> [INFO] 	-----END DATA-----
> [ERROR] The connection to the server failed (check host and port) [Caused by: Connection refused]
> org.apache.airavata.core.gfac.exception.JobSubmissionFault: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:229)
> 	at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> 	at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> 	at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> 	at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> 	at java.lang.Thread.run(Thread.java:680)
> Caused by: org.globus.gram.GramException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.globus.gram.Gram.renew(Gram.java:595)
> 	at org.globus.gram.GramJob.renew(GramJob.java:329)
> 	at org.globus.gram.GramJob.renew(GramJob.java:315)
> 	at org.apache.airavata.core.gfac.provider.utils.JobSubmissionListener.waitFor(JobSubmissionListener.java:72)
> 	at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:206)
> 	... 11 more
> Exception in thread "Thread-98" org.apache.airavata.workflow.model.exceptions.WorkflowRuntimeException: org.apache.airavata.workflow.model.exceptions.WorkflowException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:392)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> 	at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.airavata.workflow.model.exceptions.WorkflowException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:321)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> 	at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> 	... 3 more
> Caused by: org.apache.airavata.core.gfac.exception.JobSubmissionFault: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:229)
> 	at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> 	at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> 	at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> 	at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> 	... 7 more
> Caused by: org.globus.gram.GramException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> 	at org.globus.gram.Gram.renew(Gram.java:595)
> 	at org.globus.gram.GramJob.renew(GramJob.java:329)
> 	at org.globus.gram.GramJob.renew(GramJob.java:315)
> 	at org.apache.airavata.core.gfac.provider.utils.JobSubmissionListener.waitFor(JobSubmissionListener.java:72)
> 	at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:206)
> 	... 11 more
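The workaround suggested in the report raises `myproxy.life` from 3600 to 172800. Assuming the value is in seconds (the server log reports the renewed proxy "with 3600 lifetime") and that the RSL `maxwalltime` is in minutes, the small check below shows why 3600 (1 hour) cannot cover the 17-hour run, while 172800 (48 hours) covers both the run and the 24-hour walltime requested in the RSL. `ProxyLifetimeCheck` and its methods are hypothetical illustrations, not part of Airavata.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.Duration;
import java.util.Properties;

/** Hypothetical helper: does a configured proxy lifetime cover a job's duration? */
class ProxyLifetimeCheck {

    /** Reads myproxy.life (assumed to be in seconds) from properties-file text. */
    static Duration proxyLifetime(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String value = props.getProperty("myproxy.life");
        if (value == null) {
            throw new IllegalArgumentException("myproxy.life is not set");
        }
        return Duration.ofSeconds(Long.parseLong(value.trim()));
    }

    /** Converts an RSL maxwalltime value (assumed to be in minutes) to a Duration. */
    static Duration maxWallTime(String minutes) {
        return Duration.ofMinutes(Long.parseLong(minutes.trim()));
    }

    /** True if a proxy issued at job start would still be valid when the job ends. */
    static boolean covers(Duration proxyLifetime, Duration jobDuration) {
        return proxyLifetime.compareTo(jobDuration) >= 0;
    }

    public static void main(String[] args) throws IOException {
        Duration observedRun = Duration.ofHours(17);  // runtime reported in this issue
        Duration wallLimit = maxWallTime("1440");     // maxwalltime from the RSL above
        for (String config : new String[] {"myproxy.life=3600", "myproxy.life=172800"}) {
            Duration life = proxyLifetime(config);
            System.out.printf("%s -> %d h; covers 17 h run: %b; covers walltime limit (%d h): %b%n",
                    config, life.toHours(), covers(life, observedRun),
                    wallLimit.toHours(), covers(life, wallLimit));
        }
    }
}
```

Running `main` prints:

    myproxy.life=3600 -> 1 h; covers 17 h run: false; covers walltime limit (24 h): false
    myproxy.life=172800 -> 48 h; covers 17 h run: true; covers walltime limit (24 h): true

A longer lifetime only postpones the failure for any job that outlives it, which is why the issue was retargeted at renewing the proxy and reconnecting to the job.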

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
