airavata-issues mailing list archives

From "Eroma (Jira)" <j...@apache.org>
Subject [jira] [Updated] (AIRAVATA-2941) Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
Date Thu, 26 Mar 2020 18:45:00 GMT

     [ https://issues.apache.org/jira/browse/AIRAVATA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eroma updated AIRAVATA-2941:
----------------------------
    Labels: gsoc2020  (was: )

> Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2941
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
>             Project: Airavata
>          Issue Type: Bug
>          Components: GFac, helix implementation
>    Affects Versions: 0.18
>         Environment: https://staging.ultrascan.scigap.org & https://ultrascan.scigap.org/
>            Reporter: Eroma
>            Assignee: Shameera
>            Priority: Major
>              Labels: gsoc2020
>             Fix For: 0.18
>
>
> Currently, experiments fail in the following situation:
>  # The HPC queue reaches the maximum number of jobs it allows per user.
>  # Job submission fails, the HPC system returns a job submission response such as [1], and Airavata tags the experiment as FAILED.
>  # The only option for the gateway user is to submit the experiment again.
> The required fix is for Airavata to have internal queues, or another way to hold such experiments until the HPC queue can accept jobs again, instead of failing the experiment.
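One possible shape for such an internal queue, as a minimal Java sketch. This is hypothetical and not existing Airavata or GFac code: the class `DeferredSubmissionQueue` and its methods are invented for illustration, and it assumes the gateway knows the per-user limit for the cluster account (50 for us3 in the Stampede2 output) and is notified when a cluster job finishes. Experiments beyond the limit are held in FIFO order instead of being marked FAILED.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch (not an Airavata API): hold experiments locally
 * instead of failing them when the per-user job limit would be exceeded.
 */
final class DeferredSubmissionQueue {
    private final int maxJobsPerUser;                          // assumed known, e.g. 50 for us3
    private int activeJobs = 0;                                // jobs this gateway has on the cluster
    private final Deque<String> waiting = new ArrayDeque<>();  // held experiment IDs, FIFO

    DeferredSubmissionQueue(int maxJobsPerUser) {
        this.maxJobsPerUser = maxJobsPerUser;
    }

    /** Returns true if the experiment may be submitted now; otherwise holds it. */
    public synchronized boolean trySubmit(String experimentId) {
        // Only bypass the queue when nothing is already waiting, to keep FIFO order.
        if (activeJobs < maxJobsPerUser && waiting.isEmpty()) {
            activeJobs++;
            return true;
        }
        waiting.addLast(experimentId);
        return false;
    }

    /** Called when a cluster job finishes; returns held experiments that can now be submitted. */
    public synchronized List<String> onJobCompleted() {
        if (activeJobs > 0) {
            activeJobs--;
        }
        List<String> released = new ArrayList<>();
        while (activeJobs < maxJobsPerUser && !waiting.isEmpty()) {
            released.add(waiting.pollFirst());
            activeJobs++;
        }
        return released;
    }

    public synchronized int waitingCount() { return waiting.size(); }

    public synchronized int activeCount() { return activeJobs; }

    public static void main(String[] args) {
        DeferredSubmissionQueue q = new DeferredSubmissionQueue(2);
        System.out.println(q.trySubmit("exp-1")); // true
        System.out.println(q.trySubmit("exp-2")); // true
        System.out.println(q.trySubmit("exp-3")); // false: held, not FAILED
        System.out.println(q.onJobCompleted());   // [exp-3]
    }
}
```

Counting locally only approximates the scheduler's view (jobs submitted outside the gateway under the same account also count toward the limit), so a real implementation would likely still need to re-hold an experiment when the cluster rejects it with a response like [1].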
>  
> [1]
> This example is from Stampede2:
> -----------------------------------------------------------------
> Welcome to the Stampede2 Supercomputer
> -----------------------------------------------------------------
> No reservation for this job
> --> Verifying valid submit host (login3)...OK
> --> Verifying valid jobname...OK
> --> Enforcing max jobs per user...FAILED
> [*] Too many simultaneous jobs in queue.
> --> Max job limits for us3 = 50 jobs
>  
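A response like [1] could be told apart from a genuine submission failure by inspecting its text. A minimal sketch, assuming the marker strings in the Stampede2 output are stable (other clusters and schedulers word this differently) and that the caller already knows whether a job ID came back; `SubmissionResponseClassifier`, `classify`, and `reportedLimit` are hypothetical names, not part of Airavata or GFac.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: classify a job-submission response so that a
 * per-user queue-limit rejection is retryable rather than FAILED.
 * Marker strings are copied from the Stampede2 output; they are assumptions.
 */
final class SubmissionResponseClassifier {
    enum Outcome { SUBMITTED, QUEUE_LIMIT_REACHED, FAILED }

    // Matches e.g. "Max job limits for us3 = 50 jobs"; group 2 is the limit.
    private static final Pattern LIMIT =
            Pattern.compile("Max job limits for (\\S+) = (\\d+) jobs");

    static Outcome classify(String response, boolean jobIdReturned) {
        if (jobIdReturned) {
            return Outcome.SUBMITTED;
        }
        if (response.contains("Enforcing max jobs per user...FAILED")
                || response.contains("Too many simultaneous jobs in queue")) {
            return Outcome.QUEUE_LIMIT_REACHED; // hold and retry later instead of failing
        }
        return Outcome.FAILED;
    }

    /** Extracts the per-user limit reported by the cluster, if present. */
    static Optional<Integer> reportedLimit(String response) {
        Matcher m = LIMIT.matcher(response);
        return m.find() ? Optional.of(Integer.parseInt(m.group(2))) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "--> Enforcing max jobs per user...FAILED "
                + "[*] Too many simultaneous jobs in queue. --> Max job limits for us3 = 50 jobs";
        System.out.println(classify(sample, false));
        System.out.println(reportedLimit(sample));
    }
}
```

A QUEUE_LIMIT_REACHED outcome would then leave the experiment in a waiting state with a later retry, rather than tagging it FAILED; the parsed limit (50 for us3 here) could inform how many held experiments are released at once.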



--
This message was sent by Atlassian Jira (v8.3.4#803005)
