airavata-dev mailing list archives

From Pedro da Silveira <pedro...@gmail.com>
Subject Re: [jira] [Commented] (AIRAVATA-756) Ensure Airavata can renew proxy for long running jobs.
Date Tue, 05 Feb 2013 16:49:00 GMT
Hi Suresh,

I agree with you that the right approach is to renew the proxy every 3600
seconds instead of creating a proxy with a very long lifetime.
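
To make "renew every 3600 seconds" concrete: with a 3600-second proxy, the
renewal has to be attempted somewhat before the hour is up, otherwise there is
a window in which the proxy has already expired. Below is a minimal sketch,
assuming a fixed safety margin (5 minutes here, an arbitrary choice).
ProxyRenewalPolicy and its methods are hypothetical names, not part of
Airavata or the Globus libraries. The sketch uses java.time, so it needs
Java 8 or later rather than the Java 1.6 listed in the issue's environment.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper (not part of Airavata or JGlobus): decides when a
 * short-lived proxy should be renewed, given its lifetime and a safety margin.
 */
final class ProxyRenewalPolicy {
    private final Duration lifetime; // e.g. myproxy.life = 3600 seconds
    private final Duration margin;   // how long before expiry to renew

    ProxyRenewalPolicy(Duration lifetime, Duration margin) {
        if (margin.isNegative() || margin.compareTo(lifetime) >= 0) {
            throw new IllegalArgumentException("margin must be in [0, lifetime)");
        }
        this.lifetime = lifetime;
        this.margin = margin;
    }

    /** When a proxy issued at {@code issuedAt} stops being valid. */
    Instant expiry(Instant issuedAt) {
        return issuedAt.plus(lifetime);
    }

    /** When renewal should be attempted: the expiry time minus the margin. */
    Instant renewAt(Instant issuedAt) {
        return expiry(issuedAt).minus(margin);
    }

    /** True once the renewal point has been reached or passed. */
    boolean shouldRenew(Instant issuedAt, Instant now) {
        return !now.isBefore(renewAt(issuedAt));
    }

    public static void main(String[] args) {
        // 3600 s lifetime, as in the log; renew 5 minutes (300 s) before expiry.
        ProxyRenewalPolicy policy =
                new ProxyRenewalPolicy(Duration.ofSeconds(3600), Duration.ofMinutes(5));
        Instant issued = Instant.parse("2013-01-30T20:00:00Z");
        System.out.println(policy.renewAt(issued));
        System.out.println(policy.shouldRenew(issued, issued.plusSeconds(3000))); // false
        System.out.println(policy.shouldRenew(issued, issued.plusSeconds(3300))); // true
    }
}
```

Checking shouldRenew on every status poll, rather than sleeping a fixed 3600
seconds, keeps the renewal tied to when the current proxy was actually issued.
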
I think the fact that Airavata-Server is not reading the right value of
myproxy.life is not the main problem, since, as I checked in the
Airavata-Server log, Airavata-Server apparently does renew the proxy every
3600 seconds.
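
For the separate bug Suresh mentions (not honoring the properties file), here
is a minimal sketch of reading myproxy.life from airavata-server.properties
content with java.util.Properties, falling back to 3600 seconds only when the
key is missing or is not a positive integer. The class name and the fallback
rule are assumptions for illustration; this is not how Airavata-Server
actually loads its configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical helper (not Airavata's actual code): reads the proxy lifetime
 * from airavata-server.properties-style content.
 */
final class ProxyLifetimeConfig {
    static final String KEY = "myproxy.life";
    static final int DEFAULT_LIFETIME_SECONDS = 3600;

    /** Returns the configured lifetime, or the default if it is absent or invalid. */
    static int readLifetimeSeconds(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source); // standard key=value parsing
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_LIFETIME_SECONDS; // key missing
        }
        try {
            int value = Integer.parseInt(raw.trim()); // trailing spaces are kept by load()
            return value > 0 ? value : DEFAULT_LIFETIME_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_LIFETIME_SECONDS; // not a number
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLifetimeSeconds(new StringReader("myproxy.life=172800\n"))); // 172800
        System.out.println(readLifetimeSeconds(new StringReader("other.key=1\n")));        // 3600
    }
}
```
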
I believe the real problem occurred when the job had finished its execution
cycle (after 15 hours): an error message appeared, as if GRAM had tried to
read the job's output but could no longer establish a connection to
Lonestar, perhaps because the proxy was outdated. The full error message is
at the beginning of this JIRA thread. It was the last thing printed in the
Airavata-Server log; just before it, the log showed that the job was active
and that the proxy had been renewed after 3600 seconds.
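
The stack traces quoted below show the exception coming from GramJob.renew
inside JobSubmissionListener.waitFor, after which it is reported as a job
failure. Below is a minimal sketch of a wait loop that keeps "could not renew"
apart from "job failed". RemoteJob, JobState, Outcome and JobWaiter are
hypothetical names (this is neither Airavata's JobSubmissionListener nor the
Globus GramJob API), and reporting a failed renewal as NEEDS_VERIFICATION, so
that some other check (for example, looking for the output files) decides the
final status, is an assumed design, not the fix adopted for this issue.
Sleeping between polls is omitted to keep it short.

```java
import java.net.ConnectException;

/** Hypothetical job states; illustrative only. */
enum JobState { ACTIVE, DONE, FAILED }

/** Hypothetical handle to a remote job. NOT the Globus GramJob API. */
interface RemoteJob {
    JobState poll() throws Exception;        // ask the job manager for the current state
    void renewCredential() throws Exception; // hand a fresh proxy to the job manager
}

/** What the wait loop reports to the workflow layer. */
enum Outcome { SUCCEEDED, FAILED, NEEDS_VERIFICATION }

final class JobWaiter {
    /**
     * Polls until the job reaches a terminal state, renewing the credential
     * every {@code renewEvery} polls. A failed renewal is reported as
     * NEEDS_VERIFICATION instead of FAILED, because the job manager may have
     * gone away simply because the job already finished.
     */
    static Outcome waitFor(RemoteJob job, int renewEvery) throws Exception {
        if (renewEvery <= 0) {
            throw new IllegalArgumentException("renewEvery must be positive");
        }
        for (int polls = 1; ; polls++) {
            JobState state = job.poll(); // polling errors still propagate to the caller
            if (state == JobState.DONE) {
                return Outcome.SUCCEEDED;
            }
            if (state == JobState.FAILED) {
                return Outcome.FAILED;
            }
            if (polls % renewEvery == 0) {
                try {
                    job.renewCredential();
                } catch (Exception renewalError) {
                    // Do not equate "could not renew" with "job failed".
                    return Outcome.NEEDS_VERIFICATION;
                }
            }
            // A real loop would sleep here between polls.
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake job: ACTIVE on every poll; the job manager refuses the renewal
        // attempted on the third poll.
        RemoteJob fake = new RemoteJob() {
            private int calls = 0;

            @Override
            public JobState poll() {
                calls++;
                return JobState.ACTIVE;
            }

            @Override
            public void renewCredential() throws Exception {
                if (calls >= 3) {
                    throw new ConnectException("Connection refused");
                }
            }
        };
        System.out.println(waitFor(fake, 3)); // NEEDS_VERIFICATION
    }
}
```

The separate outcome exists because a "Connection refused" on renewal does
not, by itself, tell the caller whether the job succeeded.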

I will try to provide more information in any JIRA issues I create in the
future.


On Tue, Feb 5, 2013 at 9:45 AM, Suresh Marru (JIRA) <jira@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/AIRAVATA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571395#comment-13571395 ]
>
> Suresh Marru commented on AIRAVATA-756:
> ---------------------------------------
>
> The bug of not honoring the properties file should be fixed. But I would
> argue against increasing the proxy lifetime; that defeats the purpose of
> short-lived proxy certificates and the philosophy behind GSI. In the short
> term this probably needs a workaround, but a better long-term fix is to
> handle proxy delegation and renewal for long-running jobs.
>
> > Ensure Airavata can renew proxy for long running jobs.
> > ------------------------------------------------------
> >
> >                 Key: AIRAVATA-756
> >                 URL: https://issues.apache.org/jira/browse/AIRAVATA-756
> >             Project: Airavata
> >          Issue Type: Bug
> >          Components: Distribution, GFac, XBaya
> >    Affects Versions: 0.6
> >         Environment: Mac OS 10.5.8
> > Processor: 2 x 2.8 GHz Quad-Core Intel Xeon
> > Memory: 8 GB 800 MHz DDR2
> > Java 1.6.0_26
> >            Reporter: Pedro da Silveira
> >            Priority: Minor
> >             Fix For: 0.7
> >
> >
> > After I fixed the problem with my local firewall, following Raminderjeet
> > Singh's suggestion to open ports 40,000 to 40,100 to Airavata Server, I no
> > longer receive the "Status 0" message.
> > However, I am still getting an error message on my Airavata-Server and a
> > red alert in XBaya informing me that my job failed, when in reality my job
> > ran successfully.
> > This task took 17 hours to finish with 3 inputs in one application
> > service.
> > According to Raminderjeet Singh, if I change myproxy.life=3600 in
> > airavata-server.properties to myproxy.life=172800, I won't get this error
> > message anymore.
> > ==========================
> > Error message on Airavata-Server:
> > ==========================
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] job https://gridftp1.ls4.tacc.utexas.edu:50393/16289883153825569046/8943296923859945130/have same status: ACTIVE
> > [INFO] Job proxy expired. Trying to renew proxy
> > org.globus.gsi.gssapi.GlobusGSSCredentialImpl@453931d9
> > [INFO] Proxy file renewed to /tmp/x509up_uogcebb9f81ba-8f59-4fec-b776-331c3f21bb62 for the user ogce with 3600 lifetime.
> > [ERROR] Context passed was NULL.
> > java.lang.RuntimeException: Context passed was NULL.
> >       at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:496)
> >       at org.apache.airavata.workflow.tracking.impl.ProvenanceNotifierImpl.sendingFault(ProvenanceNotifierImpl.java:485)
> >       at org.apache.airavata.core.gfac.notification.impl.WorkflowTrackingNotification.executionFail(WorkflowTrackingNotification.java:108)
> >       at org.apache.airavata.core.gfac.notification.impl.DefaultNotifier.executionFail(DefaultNotifier.java:135)
> >       at org.apache.airavata.core.gfac.exception.JobSubmissionFault.sendFaultNotification(JobSubmissionFault.java:52)
> >       at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:231)
> >       at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> >       at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> >       at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> >       at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> >       at java.lang.Thread.run(Thread.java:680)
> > [INFO]        -----DATA-----
> > [INFO]                lonestar4.tacc.teragrid.org,&( queue = "normal" )
> >     ( stdout = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/lonestar_application.stdout" )
> >     ( count = "72" )
> >     ( executable = "/scratch/01437/ogce/Vlab/Phonon/executePhonon.sh" )
> >     ( stderr = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/lonestar_application.stderr" )
> >     ( maxwalltime = "1440" )
> >     ( hostCount = "6" )
> >     ( minmemory = "10240" )
> >     ( project = "TG-STA110014S" )
> >     ( jobtype = "mpi" )
> >     ( environment = ( "inputData" "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData" )
> >                     ( "outputData" "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/outputData" ) )
> >     ( proxy_timeout = "1" )
> >     ( arguments = "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Pwscf_Input"
> >                   "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Cd_PON_sp_LDA.vdb"
> >                   "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Te_PON_LDA.vdb"
> >                   "///scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3/inputData/Phonon_Input" )
> >     ( directory = "/scratch/01437/ogce/Vlab/Phonon/__p3_14/AppPhononSingle_Wed_Jan_30_20_00_56_CST_2013_78f5e160-e1df-4008-b02b-53edfa6edbd3" )
> >     ( maxmemory = "15360" )
> > [INFO]        -----END DATA-----
> > [ERROR] The connection to the server failed (check host and port) [Caused by: Connection refused]
> > org.apache.airavata.core.gfac.exception.JobSubmissionFault: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:229)
> >       at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> >       at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> >       at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> >       at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> >       at java.lang.Thread.run(Thread.java:680)
> > Caused by: org.globus.gram.GramException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.globus.gram.Gram.renew(Gram.java:595)
> >       at org.globus.gram.GramJob.renew(GramJob.java:329)
> >       at org.globus.gram.GramJob.renew(GramJob.java:315)
> >       at org.apache.airavata.core.gfac.provider.utils.JobSubmissionListener.waitFor(JobSubmissionListener.java:72)
> >       at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:206)
> >       ... 11 more
> > Exception in thread "Thread-98" org.apache.airavata.workflow.model.exceptions.WorkflowRuntimeException: org.apache.airavata.workflow.model.exceptions.WorkflowException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:392)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.access$400(WorkflowInterpretorSkeleton.java:87)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton$2.run(WorkflowInterpretorSkeleton.java:382)
> >       at java.lang.Thread.run(Thread.java:680)
> > Caused by: org.apache.airavata.workflow.model.exceptions.WorkflowException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:321)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.handleWSComponent(WorkflowInterpreter.java:749)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.executeDynamically(WorkflowInterpreter.java:533)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpreter.scheduleDynamically(WorkflowInterpreter.java:218)
> >       at org.apache.airavata.xbaya.interpretor.WorkflowInterpretorSkeleton.executeWorkflow(WorkflowInterpretorSkeleton.java:389)
> >       ... 3 more
> > Caused by: org.apache.airavata.core.gfac.exception.JobSubmissionFault: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:229)
> >       at org.apache.airavata.core.gfac.provider.AbstractProvider.execute(AbstractProvider.java:69)
> >       at org.apache.airavata.core.gfac.services.impl.AbstractSimpleService.execute(AbstractSimpleService.java:118)
> >       at org.apache.airavata.core.gfac.GfacAPI.gridJobSubmit(GfacAPI.java:140)
> >       at org.apache.airavata.xbaya.invoker.EmbeddedGFacInvoker.invoke(EmbeddedGFacInvoker.java:256)
> >       ... 7 more
> > Caused by: org.globus.gram.GramException: The connection to the server failed (check host and port) [Caused by: Connection refused]
> >       at org.globus.gram.Gram.renew(Gram.java:595)
> >       at org.globus.gram.GramJob.renew(GramJob.java:329)
> >       at org.globus.gram.GramJob.renew(GramJob.java:315)
> >       at org.apache.airavata.core.gfac.provider.utils.JobSubmissionListener.waitFor(JobSubmissionListener.java:72)
> >       at org.apache.airavata.core.gfac.provider.impl.GramProvider.executeApplication(GramProvider.java:206)
> >       ... 11 more
