uima-user mailing list archives

From Lou DeGenaro <lou.degen...@gmail.com>
Subject Re: DUCC stuck at WaitingForResources on an Amazon Linux
Date Fri, 14 Nov 2014 15:38:03 GMT
Simon,

Congratulations!  You found a bug in DUCC's Web Server.  It was incorrectly
rounding up when reporting the number of shares for a machine.  This issue
is addressed by Jira UIMA-4104 <https://issues.apache.org/jira/browse/UIMA-4104>.
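To illustrate the difference, here is a minimal Python sketch of the two rounding modes. It assumes a share quantum of 1 GB (1024 MB), which is inferred from Jim's reading of the RM log below (the node reporting 3955 MB is shown with order 3, i.e. 3G usable). The function names are hypothetical and this is not DUCC's actual code:

```python
# Assumption: one share quantum = 1 GB (1024 MB), inferred from the RM log,
# not taken from the DUCC source.
QUANTUM_MB = 1024


def shares_truncated(memory_mb: int, quantum_mb: int = QUANTUM_MB) -> int:
    """Whole shares that actually fit in the reported memory (round down)."""
    return memory_mb // quantum_mb


def shares_rounded_up(memory_mb: int, quantum_mb: int = QUANTUM_MB) -> int:
    """Rounding up, as the bug did: a partial share is counted as a whole one."""
    return -(-memory_mb // quantum_mb)  # integer ceiling division


reported_mb = 3955  # "Memory (MB)" reported for the node in the RM log
print(shares_truncated(reported_mb))   # 3
print(shares_rounded_up(reported_mb))  # 4
```

Rounding down is the safe choice because a share the machine cannot fully back would overcommit it.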

Lou.

On Fri, Nov 14, 2014 at 7:49 AM, Jim Challenger <challngr@gmail.com> wrote:

> Simon,
>     It looks like the problem is the amount of RAM on your machine.  It's
> going to be hard to get any meaningful work running on less than about 8G.
>
>     Here's what to do to get the test job to run on your 4G machine:
>     1.  In the resources folder, edit ducc.properties and change this:
>               ducc.jd.host.memory.size=2GB
>          to this:
>               ducc.jd.host.memory.size=1GB
>
>          This is the amount of RAM that DUCC reserves for itself to manage
> its "head" processes.
>
>     2.  In the examples/simple folder, edit 1.job and change this:
>              process_memory_size            2
>          to this:
>              process_memory_size            1
>
>          This is the amount of memory in GB that the sample 1.job is
> requesting.
>
>      3.  Stop DUCC and restart it so that the DUCC processes pick up the
> new ducc.jd.host.memory.size value from ducc.properties.
>
>      4.  Rerun 1.job and all should be well.
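The edits in steps 1 and 2 can also be scripted. A minimal Python sketch, assuming both files hold simple one-line entries like the snippets quoted above (`key=value` or `key   value`); `set_setting` and `edit_file` are hypothetical helpers, not part of DUCC, and DUCC still has to be restarted afterwards (step 3):

```python
import re
from pathlib import Path


def set_setting(text: str, key: str, value: str) -> str:
    """Return `text` with the value of `key` replaced by `value`.

    Hypothetical helper: handles one-line entries such as "key=value",
    "key: value" or "key   value", keeps the original separator, and
    leaves commented-out lines untouched.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=: \t][ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


def edit_file(path: Path, key: str, value: str) -> None:
    """Rewrite one setting in place in the file at `path`."""
    path.write_text(set_setting(path.read_text(), key, value))


# Demonstration on in-memory copies of the two settings from steps 1 and 2.
props = "ducc.jd.host.memory.size=2GB\n"
job = "process_memory_size            2\n"
print(set_setting(props, "ducc.jd.host.memory.size", "1GB"), end="")
print(set_setting(job, "process_memory_size", "1"), end="")
```

Run from the DUCC install directory (assumed layout), the real files would be edited with `edit_file(Path("resources/ducc.properties"), "ducc.jd.host.memory.size", "1GB")` and `edit_file(Path("examples/simple/1.job"), "process_memory_size", "1")`.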
>
>       Here are the gory details from the RM (Resource Manager) log, if
> you're interested.  It contains these lines:
>
> 13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
>                        Name  Order  Active Shares  Unused Shares  Memory (MB)  Jobs
> ---------------------------  -----  -------------  -------------  -----------  ------ ...
> .us-west-2.compute.internal      3              2              1         3955  7 [1]
>
> This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by
> the reservation/job "7", and that you have 1GB free.  The reason you have
> only 3GB **usable** is that the hardware/OS usually reserves a small part
> of the installed RAM for itself, so the reported RAM (3955 MB here) is a
> tad smaller than the installed 4G.  To avoid overcommitting the system, we
> use the reported value, not the installed value.  Most or all of the jobs
> here would easily overwhelm even the largest machines if we didn't do this.
>
> Next, these lines show the actual schedule the RM is trying to build.
> Dormant:
> ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
> J_________8  Test_job_1  ducc    normal     0       2      0        2    2       15      15        true      8
>
> Reserved:
> ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
> R_________7  Job_Driver  System  JobDriver  1       2      2        0    2       0       0         0         1
>
> This confirms that the DUCC reservation "7" occupies 2G, and that job "8"
> is requesting 2G but is "dormant", i.e. waiting for resources.  Since only
> 3G is usable on this machine and reservation 7 already holds 2G, just 1G is
> free, so job 8 will wait.
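The fit check described above can be sketched as a simplified model. This is a minimal Python sketch, assuming 1G share quanta and that a request is rounded up to whole quanta; `Node` and `fits` are hypothetical names, and the real RM also weighs classes, priorities and more:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one machine, measured in share quanta (assumed 1G)."""
    total_quanta: int
    used_quanta: int

    @property
    def free_quanta(self) -> int:
        return self.total_quanta - self.used_quanta


def fits(node: Node, request_gb: int, quantum_gb: int = 1) -> bool:
    """True if a process needing `request_gb` can be placed on `node`."""
    needed = -(-request_gb // quantum_gb)  # round the request up to whole quanta
    return needed <= node.free_quanta


# As logged: 3 usable quanta, the 2G JD reservation holds 2, job 8 asks for 2.
print(fits(Node(total_quanta=3, used_quanta=2), request_gb=2))  # False
# After steps 1-3, assuming the JD reservation shrinks to 1G and 1.job asks for 1G.
print(fits(Node(total_quanta=3, used_quanta=1), request_gb=1))  # True
```

Note the asymmetry: machine capacity is rounded down to whole quanta, while requests are rounded up, so neither side can overcommit the node.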
>
> Best,
> Jim
