manifoldcf-dev mailing list archives

From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: Scaling in MCF
Date Thu, 20 Nov 2014 14:24:44 GMT
Hi Karl,

A couple of initial observations from a fresh install (7 jobs, 4 nodes in
a single cluster, fewer than 1M jobqueue records):

1. When a job is started or stopped, a particular SQL query, which I hadn't
noticed in previous versions, shows up again and again and seems to take a
few minutes each time (judging by the query_start column in the
pg_stat_activity view):

SELECT COUNT(t2.x) AS doccount FROM (SELECT 'x' AS x FROM jobqueue WHERE
jobid=$1 AND  (status=$2 OR status=$3 OR status=$4 OR status=$5 OR
status=$6 OR status=$7) LIMIT 500001) t2

The query continues to be re-executed after the job is marked as inactive.

The closest match to this query that I could find in the code is the one fired
by JobManager#getRunningJobs, but the number of terms in the WHERE clause
is different.
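
For readers unfamiliar with this query shape, here is a minimal sketch of what it computes: a row count for one job that stops counting once the inner LIMIT is reached. The sketch runs against an in-memory SQLite table rather than the PostgreSQL instance described above. The toy jobqueue schema, the single-letter status values and the helper name capped_count are assumptions made for illustration only; the real query has six status terms and a cap of 500001.

```python
import sqlite3


def capped_count(conn, job_id, statuses, cap):
    """Count rows for job_id whose status is one of statuses, up to cap.

    statuses must be non-empty. The inner LIMIT means at most cap rows
    are handed to COUNT, so the result saturates at cap.
    """
    status_clause = " OR ".join("status=?" for _ in statuses)
    sql = (
        "SELECT COUNT(t2.x) AS doccount FROM ("
        "SELECT 'x' AS x FROM jobqueue "
        f"WHERE jobid=? AND ({status_clause}) LIMIT ?) t2"
    )
    return conn.execute(sql, [job_id, *statuses, cap]).fetchone()[0]


# Toy table standing in for the real jobqueue (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobqueue (jobid INTEGER, status TEXT)")
rows = [(1, "P")] * 10 + [(1, "C")] * 3 + [(2, "P")] * 5
conn.executemany("INSERT INTO jobqueue VALUES (?, ?)", rows)

uncapped = capped_count(conn, 1, ["P", "C"], 500001)  # cap never reached
capped = capped_count(conn, 1, ["P", "C"], 5)         # cap reached
print(uncapped, capped)
```

Even with the cap in place, the database still has to locate up to that many matching rows, which is consistent with the query being slow on a large jobqueue table if no suitable index is used. That is an inference, not something confirmed by a query plan here.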


2. As I was stopping and restarting a bunch of jobs concurrently, SQL
deadlocks ensued and were reported on 3 of the 4 MCF nodes in the cluster.
All of the exceptions reference the method JobQueue#clearDocPriorities.
Here are snippets of the log files from the 4 nodes:

**NODE #1**

 INFO 2014-11-19 17:04:25,851 (Job notification thread) - Found job
1416410450171 in need of notification
 INFO 2014-11-19 17:06:08,438 (qtp720239731-20) - Manually aborting job
1416411618209
 INFO 2014-11-19 17:06:08,447 (qtp720239731-20) - Job 1416411618209 abort
signal successfully sent
 INFO 2014-11-19 17:06:11,335 (qtp720239731-18) - Manually aborting job
1416411742909
 INFO 2014-11-19 17:06:11,351 (qtp720239731-18) - Job 1416411742909 abort
signal successfully sent
 INFO 2014-11-19 17:06:13,689 (qtp720239731-17) - Manually aborting job
1416411915906
 INFO 2014-11-19 17:06:13,704 (qtp720239731-17) - Job 1416411915906 abort
signal successfully sent
 INFO 2014-11-19 17:06:15,860 (qtp720239731-16) - Manually aborting job
1416412103264
 INFO 2014-11-19 17:06:15,886 (qtp720239731-16) - Job 1416412103264 abort
signal successfully sent
 INFO 2014-11-19 17:06:18,076 (qtp720239731-19) - Manually aborting job
1416411677979
 INFO 2014-11-19 17:06:18,118 (qtp720239731-19) - Job 1416411677979 abort
signal successfully sent
ERROR 2014-11-19 17:06:24,765 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 17695 waits for ShareLock on transaction 572361982;
blocked by process 16640.
Process 16640 waits for ShareLock on transaction 572361975; blocked by
process 17695.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 17695 waits for ShareLock on transaction 572361982;
blocked by process 16640.
Process 16640 waits for ShareLock on transaction 572361975; blocked by
process 17695.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 17695 waits for ShareLock on transaction 572361982;
blocked by process 16640.
Process 16640 waits for ShareLock on transaction 572361975; blocked by
process 17695.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
ERROR 2014-11-19 17:06:43,054 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 16640 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362011; blocked by
process 16640.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 16640 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362011; blocked by
process 16640.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 16640 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362011; blocked by
process 16640.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)



**NODE #2**

 INFO 2014-11-19 17:04:22,524 (Job reset thread) - Stopped job 1416410450171


**NODE #3**

INFO 2014-11-19 17:06:21,994 (Job notification thread) - Found job
1416411618209 in need of notification
 INFO 2014-11-19 17:06:37,105 (Job reset thread) - Stopped job 1416411742909
 INFO 2014-11-19 17:06:38,234 (Job reset thread) - Stopped job 1416411915906
ERROR 2014-11-19 17:06:39,826 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 16086 waits for ShareLock on transaction 572361994;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362003; blocked by
process 16086.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 16086 waits for ShareLock on transaction 572361994;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362003; blocked by
process 16086.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 16086 waits for ShareLock on transaction 572361994;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362003; blocked by
process 16086.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
 INFO 2014-11-19 17:06:42,043 (Job notification thread) - Found job
1416411742909 in need of notification
 INFO 2014-11-19 17:06:42,044 (Job notification thread) - Found job
1416411915906 in need of notification
ERROR 2014-11-19 17:06:44,690 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 16086 waits for ShareLock on transaction 572362024;
blocked by process 17903.
Process 17903 waits for ShareLock on transaction 572362026; blocked by
process 16086.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 16086 waits for ShareLock on transaction 572362024;
blocked by process 17903.
Process 17903 waits for ShareLock on transaction 572362026; blocked by
process 16086.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 16086 waits for ShareLock on transaction 572362024;
blocked by process 17903.
Process 17903 waits for ShareLock on transaction 572362026; blocked by
process 16086.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
 INFO 2014-11-19 17:06:52,274 (Job notification thread) - Found job
1416412103264 in need of notification
 INFO 2014-11-19 17:07:02,336 (Job notification thread) - Found job
1416411677979 in need of notification



**NODE #4**

 INFO 2014-11-19 17:06:20,875 (Job reset thread) - Stopped job 1416411618209
ERROR 2014-11-19 17:06:22,873 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 17903 waits for ShareLock on transaction 572361703;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572361970; blocked by
process 17903.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 17903 waits for ShareLock on transaction 572361703;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572361970; blocked by
process 17903.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 17903 waits for ShareLock on transaction 572361703;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572361970; blocked by
process 17903.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
ERROR 2014-11-19 17:06:41,599 (Job reset thread) - Exception tossed: ERROR:
deadlock detected
  Detail: Process 17903 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362004; blocked by
process 17903.
  Hint: See server log for query details.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: deadlock
detected
  Detail: Process 17903 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362004; blocked by
process 17903.
  Hint: See server log for query details.
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:628)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:660)
        at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performUpdate(DBInterfacePostgreSQL.java:254)
        at
org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
        at
org.apache.manifoldcf.crawler.jobs.JobQueue.clearDocPriorities(JobQueue.java:1046)
        at
org.apache.manifoldcf.crawler.jobs.JobManager.finishJobStops(JobManager.java:8170)
        at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:69)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 17903 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362004; blocked by
process 17903.
  Hint: See server log for query details.
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at
org.apache.manifoldcf.core.database.Database.execute(Database.java:894)
        at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
 INFO 2014-11-19 17:06:50,652 (Job reset thread) - Stopped job 1416412103264
 INFO 2014-11-19 17:06:56,152 (Job reset thread) - Stopped job 1416411677979
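
Since each deadlock is logged three times (the message, the ManifoldCFException and the PSQLException cause), it can help to reduce the excerpts to distinct wait edges. The following is a minimal sketch that extracts "process A waits on transaction T, blocked by process B" triples from log text and counts how often each PostgreSQL backend PID appears. The function names are hypothetical, and the sample text is copied from the first two NODE #1 deadlocks above.

```python
import re
from collections import Counter

# Matches the PostgreSQL deadlock detail lines, tolerating the line wraps
# introduced by the mail client.
WAIT_RE = re.compile(
    r"Process (\d+) waits for ShareLock on transaction (\d+);\s+"
    r"blocked\s+by\s+process (\d+)"
)


def wait_edges(log_text):
    """Return the distinct (waiter_pid, transaction_id, blocker_pid) triples."""
    return {(int(w), int(t), int(b)) for w, t, b in WAIT_RE.findall(log_text)}


def pid_involvement(edges):
    """Count, per backend PID, how many distinct wait edges mention it."""
    counts = Counter()
    for waiter, _txn, blocker in edges:
        counts[waiter] += 1
        counts[blocker] += 1
    return counts


SAMPLE = """\
  Detail: Process 17695 waits for ShareLock on transaction 572361982;
blocked by process 16640.
Process 16640 waits for ShareLock on transaction 572361975; blocked by
process 17695.
  Detail: Process 16640 waits for ShareLock on transaction 572362009;
blocked by process 17172.
Process 17172 waits for ShareLock on transaction 572362011; blocked by
process 16640.
"""

edges = wait_edges(SAMPLE)
counts = pid_involvement(edges)
for pid, n in counts.most_common():
    print(pid, n)
```

Mapping the PIDs back to MCF nodes would still require the PostgreSQL server log or pg_stat_activity, as the hint in the error message suggests.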
