uima-user mailing list archives

From Jim Challenger <chall...@gmail.com>
Subject Re: DUCC stuck at WaitingForResources on an Amazon Linux
Date Fri, 14 Nov 2014 12:49:46 GMT
Simon,
     It looks like the problem is the amount of RAM on your machine. 
It's going to be hard to get any meaningful work running on less than 
about 8G.

     Here's what to do to get the test job to run on your 4G machine:
     1.  In the resources folder, edit ducc.properties and change this:
               ducc.jd.host.memory.size=2GB
          to this:
               ducc.jd.host.memory.size=1GB

          This is the amount of RAM that DUCC reserves for itself to 
manage its "head" processes.

     2.  In the examples/simple folder, edit 1.job and change this:
              process_memory_size            2
          to this:
              process_memory_size            1

          This is the amount of memory in GB that the sample 1.job is 
requesting.

      3.  Stop ducc and restart it so the ducc processes pick up the new 
ducc.jd.host.memory.size from the edited ducc.properties.

      4.  Rerun 1.job and all should be well.
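
Steps 1 and 2 are plain text substitutions, so they can be scripted if you prefer. Here is a minimal Python sketch, not part of DUCC: the helper name `replace_setting` is made up for this example, and it works on the file contents as a string so you can inspect the result before writing anything back.

```python
import re

def replace_setting(text, key, new_value):
    """Return `text` with the value of `key` replaced by `new_value`.

    Hypothetical helper, not part of DUCC. It handles both
    "key=value" lines (ducc.properties) and "key   value" lines (1.job).
    """
    # Capture everything up to the value so the original spacing is preserved.
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[= \t][ \t]*)\S+",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, text)
    if count == 0:
        raise KeyError(key)  # fail loudly if the setting is not in the file
    return new_text

props = "ducc.jd.host.memory.size=2GB\n"
job = "process_memory_size            2\n"
new_props = replace_setting(props, "ducc.jd.host.memory.size", "1GB")
new_job = replace_setting(job, "process_memory_size", "1")
```

To apply it for real, read ducc.properties from the resources folder and 1.job from examples/simple, pass the contents through the helper, and write them back before restarting ducc in step 3.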

       Here are the gory details from the RM log, if you're interested.  
In the RM log, I see these lines.

13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
Name                        Order Active Shares Unused Shares Memory (MB) Jobs
--------------------        ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal     3             2             1        3955 7 [1]

This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by 
the reservation/job "7", and that you have 1G free.  The reason you 
have only 3G **usable** is that the hardware and operating system usually 
reserve a small part of the installed RAM for themselves, so the reported 
RAM (3955 MB here) is a tad smaller than the installed 4G, and only three 
whole 1G shares fit in it.  To avoid overcommitting the system, we use 
the reported value, not the installed value.  Most or all of the jobs 
here will easily overwhelm even the largest machines if we don't do this.
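
The arithmetic behind those numbers can be sketched in a few lines. This is illustrative Python, not RM code: it assumes a 1G share size, which is what an Order of 3 for 3955 MB in the log suggests, and the function name is made up for the example.

```python
def whole_shares(reported_mb, share_mb=1024):
    """Whole shares that fit in the reported memory (rounds down)."""
    return reported_mb // share_mb

total = whole_shares(3955)   # memory the node reports, in MB
active = 2                   # shares held by reservation 7
unused = total - active

print(total, active, unused)  # 3 2 1
```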

Next,  these lines show the actual schedule the RM is trying to build.
Dormant:
         ID    JobName   User     Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
J_________8 Test_job_1   ducc    normal      0     2       0   2      2     15       15     true         8

Reserved:
         ID    JobName   User     Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
R_________7 Job_Driver System JobDriver      1     2       2   0      2      0        0        0         1

This confirms that the DUCC reservation "7" occupies 2G, and that job 
"8" is requesting 2G but is "dormant", i.e. waiting for resources.  
Since only 1G of the 3G usable on this machine is still free, job 8 
will wait.
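
To see why steps 1 and 2 clear the jam, here is a deliberately simplified Python model of the check. It is only a sketch under the assumption of one node and 1G shares, not the RM's actual scheduling algorithm, and `job_fits` is a hypothetical name.

```python
def job_fits(node_gb, jd_reservation_gb, job_request_gb):
    """True if the job's request fits in what the reservation leaves free."""
    free_gb = node_gb - jd_reservation_gb
    return job_request_gb <= free_gb

# Original settings: 3G usable, 2G reserved, job asks for 2G.
print(job_fits(3, 2, 2))  # False

# After the edits: 1G reserved, job asks for 1G.
print(job_fits(3, 1, 1))  # True
```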

Best,
Jim