manifoldcf-dev mailing list archives

From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: Scaling in MCF
Date Mon, 15 Dec 2014 00:40:38 GMT
Hi Karl,

I have some analysis to share wrt job starting performance...

After starting a handful of new jobs, my 3-node mcf cluster (dev_1x and
already populated with ~10M jobqueue records) appeared to have stalled. The
stuffer threads on two nodes were waiting on the stuffer lock. The stuffer
thread on the third node was blocked on the execution of the sql query in
JobQueue#updateActiveRecord.

My guess is that as the stuffer thread picks up items and updates their
status and process id, it can get blocked (indirectly?) by the
reprioritisation query issued when a job is started. This causes the stuffer
thread to stall and, because the stuffer threads on the other nodes are
waiting on the stuffer lock, no documents are processed by any node in the
cluster.

Below is a dump of the pg_stat_activity table. Note the values of the
'waiting' column (t/f). The second row corresponds to the query invoked by
JobQueue#updateActiveRecord.

 datid  |   datname   |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | waiting | state  | query
--------+-------------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------
 400109 | crawlerperf | 16001 |    16384 | slurp    |                  | 10.250.0.23 |                 |       52790 | 2014-12-14 23:19:25.909125+00 | 2014-12-14 23:51:29.440873+00 | 2014-12-14 23:51:29.44226+00  | 2014-12-14 23:51:29.44226+00  | t       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
 400109 | crawlerperf | 17020 |    16384 | slurp    |                  | 10.250.0.23 |                 |       52813 | 2014-12-14 23:44:31.286417+00 | 2014-12-15 00:09:02.179227+00 | 2014-12-15 00:09:15.161376+00 | 2014-12-15 00:09:15.161376+00 | t       | active | UPDATE jobqueue SET status=$1,processid=$2 WHERE id=$3
 400109 | crawlerperf | 16744 |    16384 | slurp    |                  | 10.250.0.23 |                 |       52807 | 2014-12-14 23:37:22.181826+00 |                               | 2014-12-15 00:16:51.936374+00 | 2014-12-15 00:16:51.936374+00 | f       | idle   | SELECT id FROM jobs WHERE status=$1
 400109 | crawlerperf | 17022 |    16384 | slurp    |                  | 10.250.0.23 |                 |       52815 | 2014-12-14 23:44:31.402114+00 |                               | 2014-12-15 00:16:52.176358+00 | 2014-12-15 00:16:52.176402+00 | f       | idle   | SELECT * FROM agents
 400109 | crawlerperf | 14258 |    16384 | slurp    |                  | 10.250.0.33 |                 |       55885 | 2014-12-14 22:43:56.316824+00 | 2014-12-15 00:03:43.584173+00 | 2014-12-15 00:03:43.593023+00 | 2014-12-15 00:03:43.593023+00 | t       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
 400109 | crawlerperf | 13832 |    16384 | slurp    |                  | 10.250.0.33 |                 |       55882 | 2014-12-14 22:30:47.752513+00 |                               | 2014-12-15 00:16:48.157249+00 | 2014-12-15 00:16:48.157249+00 | f       | idle   | SELECT * FROM agents
 400109 | crawlerperf | 17745 |    16384 | slurp    |                  | 10.250.0.33 |                 |       55901 | 2014-12-14 23:56:01.490378+00 | 2014-12-15 00:16:54.550487+00 | 2014-12-15 00:16:54.550776+00 | 2014-12-15 00:16:54.550777+00 | f       | active | SELECT id,dochash,docid,jobid FROM jobqueue WHERE needpriority=$1 LIMIT 1000
 400109 | crawlerperf | 16992 |    16384 | slurp    |                  | 10.250.0.43 |                 |       51521 | 2014-12-14 23:43:53.198025+00 | 2014-12-15 00:09:02.097583+00 | 2014-12-15 00:09:02.107445+00 | 2014-12-15 00:09:02.107445+00 | f       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
 400109 | crawlerperf | 14907 |    16384 | slurp    |                  | 10.250.0.43 |                 |       51462 | 2014-12-14 22:56:20.025004+00 | 2014-12-15 00:09:03.795408+00 | 2014-12-15 00:09:03.870265+00 | 2014-12-15 00:09:03.870266+00 | t       | active | SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
 400109 | crawlerperf | 18028 |    16384 | slurp    |                  | 10.250.0.43 |                 |       51535 | 2014-12-15 00:03:37.741002+00 | 2014-12-15 00:13:12.2401+00   | 2014-12-15 00:13:12.254646+00 | 2014-12-15 00:13:12.254647+00 | t       | active | SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
 400109 | crawlerperf | 15976 |    16384 | slurp    |                  | 10.250.0.43 |                 |       51490 | 2014-12-14 23:18:51.369753+00 |                               | 2014-12-15 00:16:55.490487+00 | 2014-12-15 00:16:55.490511+00 | f       | idle   | SELECT id FROM jobs WHERE status=$1
 400109 | crawlerperf | 18175 |       10 | postgres |                  |             |                 |             | 2014-12-15 00:07:55.204579+00 | 2014-12-15 00:07:55.403813+00 | 2014-12-15 00:07:55.403813+00 | 2014-12-15 00:07:55.403813+00 | f       | active | autovacuum: VACUUM public.jobqueue
 400109 | crawlerperf | 17632 |       10 | postgres | psql             |             |                 |          -1 | 2014-12-14 23:52:55.506248+00 | 2014-12-15 00:16:56.690037+00 | 2014-12-15 00:16:56.690037+00 | 2014-12-15 00:16:56.690037+00 | f       | active | SELECT * FROM pg_stat_activity WHERE datname = 'crawlerperf' AND query <> 'COMMIT' ORDER BY client_addr, query_start;
(13 rows)
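
To make the dump above easier to read, here is a minimal Python sketch that lists the waiting sessions and how long each query has been outstanding. It assumes the active jobqueue rows are transcribed by hand into tuples, and it takes the query_start of the psql session (pid 17632) as the snapshot time. The names ROWS, SNAPSHOT and waiting_sessions are illustrative only and are not part of ManifoldCF or PostgreSQL.

```python
from datetime import datetime

# Query texts as they appear in the pg_stat_activity dump.
REPRIORITISE = "UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3"
UPDATE_ACTIVE = "UPDATE jobqueue SET status=$1,processid=$2 WHERE id=$3"
SELECT_FOR_UPDATE = ("SELECT id,status,checktime FROM jobqueue "
                     "WHERE dochash=$1 AND jobid=$2 FOR UPDATE")
SELECT_NEEDPRIORITY = ("SELECT id,dochash,docid,jobid FROM jobqueue "
                       "WHERE needpriority=$1 LIMIT 1000")

# (pid, client_addr, query_start, waiting, query) for the active client
# sessions touching jobqueue, hand-copied from the dump (assumption:
# idle sessions and the autovacuum/psql rows are left out).
ROWS = [
    (16001, "10.250.0.23", "2014-12-14 23:51:29.44226", True, REPRIORITISE),
    (17020, "10.250.0.23", "2014-12-15 00:09:15.161376", True, UPDATE_ACTIVE),
    (14258, "10.250.0.33", "2014-12-15 00:03:43.593023", True, REPRIORITISE),
    (17745, "10.250.0.33", "2014-12-15 00:16:54.550776", False, SELECT_NEEDPRIORITY),
    (16992, "10.250.0.43", "2014-12-15 00:09:02.107445", False, REPRIORITISE),
    (14907, "10.250.0.43", "2014-12-15 00:09:03.870265", True, SELECT_FOR_UPDATE),
    (18028, "10.250.0.43", "2014-12-15 00:13:12.254646", True, SELECT_FOR_UPDATE),
]

# Assumed snapshot time: query_start of the psql session that took the dump.
SNAPSHOT = datetime(2014, 12, 15, 0, 16, 56, 690037)


def parse_ts(text):
    """Parse a timestamp from the dump (UTC offset already stripped)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def waiting_sessions(rows, snapshot):
    """Return (pid, client_addr, seconds_since_query_start, query) for
    every row with waiting=True, longest-outstanding first."""
    result = [
        (pid, addr, (snapshot - parse_ts(start)).total_seconds(), query)
        for pid, addr, start, waiting, query in rows
        if waiting
    ]
    return sorted(result, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for pid, addr, seconds, query in waiting_sessions(ROWS, SNAPSHOT):
        print(f"{pid:>6} {addr:<12} {seconds / 60:6.1f} min  {query}")
```

On this dump the sketch reports five waiting sessions. The oldest is the reprioritisation UPDATE from 10.250.0.23 (pid 16001), outstanding for roughly 25 minutes. Time since query_start is only an upper bound on the time actually spent waiting for a lock.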


Cheers,
Aeham
