manifoldcf-user mailing list archives

From Anthony Leonard <anthony.leon...@york.ac.uk>
Subject Job hanging on "Starting up" with never ending external query.
Date Mon, 21 Jan 2013 12:57:49 GMT
Hi there,

We have recently started running a nightly job at 2AM in ManifoldCF to
extract data from an Oracle repository and populate a Solr index. Most
nights this works fine, but occasionally the job hangs at the "Starting up"
phase. We have occasionally observed this on our test setup too. A restart
of ManifoldCF usually solves this.

Using the simple history report today, I looked up all records, sorted
them by the "Time" column, largest first, and found the following:

"Start Time","Activity","Identifier","Result Code","Bytes","Time","Result Description"
"11-12-2012 05:00:05.941","external query","... SQL QUERY ...","ERROR","0","1926607529","Interrupted: null"
"01-21-2013 02:00:11.843","external query","... SQL QUERY ...","ERROR","0","31644956","Interrupted: null"
"01-17-2013 02:00:03.600","external query","... SQL QUERY ...","ERROR","0","31637594","Interrupted: null"
"12-04-2012 12:12:19.860","external query","... SQL QUERY ...","OK","0","17511",""
... etc ...

If the Time column is in millis that means the first query was hanging for
22 days! (This was in the period before we went live when our live server
was sitting idle for a while.) On the other two occasions it hung for
nearly 9 hours until we arrived to restart the job in the morning. I have
confirmed that the Oracle database we are connecting to was available
throughout these periods. These times are also too long for any network or
database timeouts, which makes me suspect that it's a problem with the
application.
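
As a sanity check on the millisecond assumption, here is a minimal Python
sketch that converts the "Time" values from the report into durations and
end times. It assumes the column really is in milliseconds and that the
start times are in MM-DD-YYYY order; the helper name describe is made up
for illustration.

```python
from datetime import datetime, timedelta

# (start time, "Time" value) pairs copied from the history report.
# Assumption: "Time" is elapsed milliseconds; dates are MM-DD-YYYY.
rows = [
    ("11-12-2012 05:00:05.941", 1926607529),
    ("01-21-2013 02:00:11.843", 31644956),
    ("01-17-2013 02:00:03.600", 31637594),
    ("12-04-2012 12:12:19.860", 17511),
]


def describe(start_text, elapsed_ms):
    """Return (start, elapsed, end) for one history record."""
    start = datetime.strptime(start_text, "%m-%d-%Y %H:%M:%S.%f")
    elapsed = timedelta(milliseconds=elapsed_ms)
    return start, elapsed, start + elapsed


for start_text, elapsed_ms in rows:
    start, elapsed, end = describe(start_text, elapsed_ms)
    print(f"{start}  ran for {elapsed}  ended {end}")
```

Under those assumptions the first record works out to a little over 22 days,
and the 01-21-2013 record ends at about 10:47:36, which lines up with the
time ManifoldCF was restarted that morning.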

We have the following logging config in properties.xml:

  <property name="org.apache.manifoldcf.jobs" value="ALL"/>
  <property name="org.apache.manifoldcf.connectors" value="ALL"/>
  <property name="org.apache.manifoldcf.agents" value="ALL"/>
  <property name="org.apache.manifoldcf.misc" value="ALL"/>

The job failed again last night and when I checked at 10:40 AM this morning
the last few lines of manifoldcf.log were:

DEBUG 2013-01-21 01:59:45,654 (Job start thread) - Checking if job
1352455005553 needs to be started; it was last checked at 1358733575454,
and now it is 1358733585635
DEBUG 2013-01-21 01:59:45,654 (Job start thread) -  No time match found
within interval 1358733575454 to 1358733585635
DEBUG 2013-01-21 01:59:55,805 (Job start thread) - Checking if job
1352455005553 needs to be started; it was last checked at 1358733585636,
and now it is 1358733595662
DEBUG 2013-01-21 01:59:55,805 (Job start thread) -  No time match found
within interval 1358733585636 to 1358733595662
DEBUG 2013-01-21 02:00:05,821 (Job start thread) - Checking if job
1352455005553 needs to be started; it was last checked at 1358733595663,
and now it is 1358733605813
DEBUG 2013-01-21 02:00:05,821 (Job start thread) -  Time match FOUND within
interval 1358733595663 to 1358733605813
DEBUG 2013-01-21 02:00:05,821 (Job start thread) - Job '1352455005553' is
within run window at 1358733605813 ms. (which starts at 1358733600000 ms.)
DEBUG 2013-01-21 02:00:05,830 (Job start thread) - Signalled for job start
for job 1352455005553
DEBUG 2013-01-21 02:00:11,674 (Startup thread) - Marked job 1352455005553
for startup
DEBUG 2013-01-21 02:00:11,843 (Thread-951922) - JDBC: The connect string is
'jdbc:oracle:thin:@//oradwhlive.york.ac.uk:1521/dwhlive.csrv.york'

After that - nothing. Once I restarted manifoldcf this morning the job
magically restarted itself and the following log messages were added where
it had left off:

DEBUG 2013-01-21 10:47:36,852 (Startup thread) - Setting job 1352455005553
back to 'ReadyForStartup' state
DEBUG 2013-01-21 10:48:57,538 (Thread-98) - Resetting due to restart
DEBUG 2013-01-21 10:48:57,958 (Thread-98) - Reset complete
DEBUG 2013-01-21 10:48:58,200 (Startup thread) - Marked job 1352455005553
for startup
DEBUG 2013-01-21 10:48:58,367 (Thread-184) - JDBC: The connect string is
'jdbc:oracle:thin:@//oradwhlive.york.ac.uk:1521/dwhlive.csrv.york'
DEBUG 2013-01-21 10:49:02,071 (Startup thread) - Job 1352455005553 is now
started
... etc ...

So it appears that the application is running fine, with the Job start
thread logging nicely every ten seconds, until it tries to start this job
and hangs entirely until the system is shut down, even though it performed
the same task perfectly the night before and later the same day.
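
To find this kind of silence in manifoldcf.log without reading it by eye,
a rough Python sketch along these lines may help. It assumes every log
entry starts with a level and a "YYYY-MM-DD HH:MM:SS,mmm" timestamp as in
the excerpts above; the function name largest_gap and the sample list are
illustrative only.

```python
import re
from datetime import datetime

# Matches the start of a log entry, e.g.
# "DEBUG 2013-01-21 02:00:11,843 (Thread-951922) - ..."
ENTRY = re.compile(r"^[A-Z]+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def largest_gap(lines):
    """Return (gap, before, after) for the longest silence between entries."""
    stamps = []
    for line in lines:
        match = ENTRY.match(line)
        if match:  # wrapped continuation lines do not match and are skipped
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    if len(stamps) < 2:
        return None
    return max((b - a, a, b) for a, b in zip(stamps, stamps[1:]))


# A few entries taken from the log excerpts above.
sample = [
    "DEBUG 2013-01-21 02:00:11,674 (Startup thread) - Marked job 1352455005553",
    "for startup",
    "DEBUG 2013-01-21 02:00:11,843 (Thread-951922) - JDBC: The connect string is",
    "DEBUG 2013-01-21 10:47:36,852 (Startup thread) - Setting job 1352455005553",
    "back to 'ReadyForStartup' state",
    "DEBUG 2013-01-21 10:48:57,538 (Thread-98) - Resetting due to restart",
]

gap, before, after = largest_gap(sample)
print(f"Longest silence: {gap} (from {before} to {after})")
```

On the sample entries this reports the gap between the JDBC connect line at
02:00:11 and the first line after the restart at 10:47:36.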

Can anyone advise on what might be happening here? We are running
ManifoldCF version 1.0.1 on Ubuntu 10.04.

Best wishes,
Anthony.

-- 
Dr Anthony Leonard
System Integrator, Information Directorate
University of York, Heslington, York, UK, YO10 5DD
Tel: +44 (0)1904 434350 http://twitter.com/apbleonard
Times Higher Education University of the Year 2010
