uima-user mailing list archives

From Simon Hafner <reactorm...@gmail.com>
Subject Re: DUCC stuck at WaitingForResources on an Amazon Linux
Date Thu, 13 Nov 2014 22:07:22 GMT
I can't find anything at first glance. Maybe it's the memory?

13 Nov 2014 22:04:14,907  INFO RM.ResourceManagerComponent -
onJobManagerStateUpdate     N/A -------> OR state arrives
13 Nov 2014 22:04:14,908  INFO RM.JobManagerConverter - jobUpdate
 8 tot: 15 WaitingForResources -> WaitingForResources compl: 0 err: 0
rem: 15 mean: NaN
13 Nov 2014 22:04:14,908  INFO RM.ResourceManagerComponent -
runScheduler     N/A -------- 30 ------- Entering scheduling loop
--------------------
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - nodeArrives     N/A
Total arrivals: 9
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - schedule     N/A
Scheduling 0  new jobs.  Existing jobs: 2
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - schedule     N/A Run
scheduler 0 with top-level nodepool --default--
13 Nov 2014 22:04:14,908  INFO RM.RmJob - getPrjCap       8 ducc
Cannot predict cap: init_wait true || time_per_item NaN
13 Nov 2014 22:04:14,908  INFO RM.RmJob - initJobCap       8 ducc O 2
Base cap: 8 Expected future cap: 2147483647 potential cap 8 actual cap
1
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - schedule     N/A
Machine occupancy before schedule
13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
================================== Query Machines Nodepool:
--default-- =========================
13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
                 Name Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal     3             2             1        3955 7 [1]

13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
================================== End Query Machines Nodepool:
--default-- ======================
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A No jobs to schedule in class  fixed
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A No jobs to schedule in class  debug
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A Scheduling jobs in class: JobDriver  7
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare       7 [stable] requested 1 assigned 1 processes,
2 QS
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class urgent
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  urgent
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class high
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  high
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class standalone
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  standalone
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class weekly
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  weekly
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class normal
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
      8 Scheduling job in class  normal : J_________8
   Test_job_1       ducc     normal      0     2       0   2      2
 15       15     true         8
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class low
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  low
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class background
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  background
13 Nov 2014 22:04:14,924 DEBUG RM.NodepoolScheduler - countClassShares
    N/A Counting for nodepool --default--
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares RmCounter Start
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares maxorder =  3
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares entity_names =  normal
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares weights      =  100
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares wantedby.normal =    1   0
 1   0
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares vmachines =   0   1   0   0
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares RmCounter End
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares Final apportionment:
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares       normal gbo  0   0   0
  0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares vshares   0   1   0   0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares nshares   0   1   0   0
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class high
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  high
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class normal
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
     8 Scheduling job in class  normal : 0 shares given, order 2
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler -
traverseNodepoolsForExpansion     N/A --- stop_here_dx 8
13 Nov 2014 22:04:14,926  INFO RM.NodePool - doExpansion     N/A NP:
--default-- Expansions in this order: 8:notfound
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 8  --default--       0       0      0     2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 7  --default--       1       1      0     2
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler - doEvictions
N/A --default-- NeededByOrder before any eviction: [0, 0, 0, 0]
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation     N/A vMachines:   0   1   0   0
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation     N/A Nodepools:--default--
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation     N/A     Nodepool       User PureFS  NSh
Counted Needed  O Class: normal
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation       8  --default--       ducc      0    0
0      0  2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation     N/A     Nodepool       User PureFS  NSh
Counted Needed  O Class: JobDriver
13 Nov 2014 22:04:14,928  INFO RM.NodepoolScheduler -
insureFullEviction     N/A No needy jobs, defragmentation bypassed.
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
--------------- Scheduler returns ---------------
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
 Expanded:
<none>

Shrunken:
   <none>

Stable:
   <none>

Dormant:
            ID                        JobName       User      Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   J_________8                     Test_job_1       ducc     normal
  0     2       0   2      2     15       15     true         8

Reserved:
            ID                        JobName       User      Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   R_________7                     Job_Driver     System  JobDriver
  1     2       2   0      2      0        0        0         1


13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
------------------------------------------------
13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState
 N/A Schedule sent to Orchestrator
13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState     N/A
Reservation 7
Existing[1]: .us-west-2.compute.internal.1^0
Additions[0]:
Removals[0]:
Job 8
Existing[0]:
Additions[0]:
Removals[0]:

13 Nov 2014 22:04:14,946  INFO RM.ResourceManagerComponent -
runScheduler     N/A -------- 30 ------- Scheduling loop returns
--------------------
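
One way to test the memory guess against the log above is to compare the node's free quantum shares with the order of the waiting job. The sketch below is only one reading of the log, not a confirmed diagnosis. It copies three figures from it: the queryMachines row (Order 3, Active Shares 2) and the Order column for job 8 in the Dormant table (2). The helper names `free_qshares` and `fits` are hypothetical and are not part of DUCC.

```shell
#!/bin/bash
# Figures copied from the log excerpt above.
machine_order=3   # quantum shares the node offers (queryMachines, Order)
active_shares=2   # quantum shares in use, held by reservation 7 (JobDriver)
job_order=2       # quantum shares one process of job 8 needs (Dormant, Order)

# Hypothetical helper: quantum shares still free on a node.
free_qshares() {
    echo $(( $1 - $2 ))
}

# Hypothetical helper: succeeds when a process of the given order
# fits into the given number of free quantum shares.
fits() {
    local free=$1 order=$2
    [ "$order" -le "$free" ]
}

free=$(free_qshares "$machine_order" "$active_shares")
if fits "$free" "$job_order"; then
    echo "job fits: $free free, $job_order needed"
else
    echo "job does not fit: $free free, $job_order needed"
fi
```

With these numbers the script reports one free quantum share against two needed, which would be consistent with the "0 shares given, order 2" line for job 8 and the single Unused Share in the queryMachines row.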

2014-11-13 12:12 GMT-06:00 Eddie Epstein <eaepstein@gmail.com>:
> Simon,
>
> The DUCC resource manager logs into rm.log. Did you look there for reasons
> the resources are not being allocated?
>
> Eddie
>
> On Wed, Nov 12, 2014 at 4:07 PM, Simon Hafner <reactormonk@gmail.com> wrote:
>
>> 4 shares total, 2 in use.
>>
>> 2014-11-12 5:06 GMT-06:00 Lou DeGenaro <lou.degenaro@gmail.com>:
>> > Try looking at your DUCC's web server.  On the System -> Machines page
>> > do you see any shares not inuse?
>> >
>> > Lou.
>> >
>> > On Wed, Nov 12, 2014 at 5:51 AM, Simon Hafner <reactormonk@gmail.com>
>> wrote:
>> >> I've set up DUCC according to
>> >> https://cwiki.apache.org/confluence/display/UIMA/DUCC
>> >>
>> >>     ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job
>> >>
>> >> the job is stuck at WaitingForResources.
>> >>
>> >> 12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor -
>> >> process     N/A ... Agent Collecting User Processes
>> >> 12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent -
>> >> copyAllUserReservations     N/A +++++++++++ Copying User Reservations
>> >> - List Size:0
>> >> 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
>> >>    N/A ********** User Process Map Size After
>> >> copyAllUserReservations:0
>> >> 12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call
>> >>    N/A ********** User Process Map Size After
>> >> copyAllUserRougeProcesses:0
>> >> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
>>    N/A
>> >> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call
>> >>    N/A
>> ******************************************************************************
>> >> 12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor -
>> >> process     N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
>> >> Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
>> >> Low Swap Threshold Defined in ducc.properties:0
>> >> 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
>> >> reportIncomingStateForThisNode     N/A Received OR Sequence:699 Thread
>> >> ID:13
>> >> 12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener -
>> >> reportIncomingStateForThisNode     N/A
>> >> JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource
>> >> State:Allocated isDeallocated:false
>> >> 12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations
>> >> N/A +++++++++++ Copied User Reservations - List Size:0
>> >> 12 Nov 2014 10:37:33,405  INFO
>> >> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
>> >>  N/A PID:8168 Swap Usage:0
>> >> 12 Nov 2014 10:37:33,913  INFO
>> >> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
>> >> collectProcessCurrentCPU     N/A 0.0 == CPUTIME:0.0
>> >> 12 Nov 2014 10:37:33,913  INFO
>> >> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process     N/A
>> >> ----------- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
>> >> Usage Allowed:-108574720 Time to Collect Swap Usage:0
>> >>
>> >> I'm using a t2.medium instance (2 CPUs, ~4 GB RAM) and the stock Amazon
>> >> Linux (it looks CentOS-based).
>> >>
>> >> To install Maven (not in the repos):
>> >>
>> >> #!/bin/bash
>> >>
>> >> # Scratch directory for the download; removed at the end.
>> >> TEMPORARY_DIRECTORY="$(mktemp -d)"
>> >> DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"
>> >>
>> >> echo 'Downloading Maven to: ' "$DOWNLOAD_TO"
>> >>
>> >> wget -O "$DOWNLOAD_TO" \
>> >>   http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
>> >>
>> >> echo 'Extracting Maven'
>> >> tar xzf "$DOWNLOAD_TO" -C "$TEMPORARY_DIRECTORY"
>> >> rm "$DOWNLOAD_TO"
>> >>
>> >> echo 'Configuring Environment'
>> >>
>> >> # Needs root: writes to /usr/local and /etc/profile.d.
>> >> mv "$TEMPORARY_DIRECTORY"/apache-maven-* /usr/local/maven
>> >> echo -e 'export M2_HOME=/usr/local/maven\nexport PATH=${M2_HOME}/bin:${PATH}' > /etc/profile.d/maven.sh
>> >> source /etc/profile.d/maven.sh
>> >>
>> >> echo 'The maven version: ' $(mvn -version) ' has been installed.'
>> >> echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
>> >> echo 'Removing the temporary directory...'
>> >> rm -r "$TEMPORARY_DIRECTORY"
>> >> echo 'Your Maven Installation is Complete.'
>>
