manifoldcf-dev mailing list archives

From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: Scaling in MCF
Date Mon, 15 Dec 2014 10:29:23 GMT
Below is a cut-down version of the pg_stat_activity dump...

Could the scope of the docpriority update be limited somehow (based on
needpriority?) to only those rows that need it? If a bunch of jobs are
started back to back (which I would say is a reasonably common use case,
especially for continuous crawls), there will be a huge number of repeated,
and therefore redundant, docpriority updates. Granted, the choice of
field for limiting the update scope may require an additional SQL index.
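
To make the idea concrete, here is a rough sketch of what narrowing the update could look like. It is not taken from the MCF code: it runs against an in-memory SQLite table standing in for jobqueue, the bind values and the rows are invented, and the extra `needpriority <> ?` predicate is only one possible choice. It also assumes that a row already carrying the target needpriority flag needs no further change.

```python
import sqlite3

# Assumed bind values: the dump only shows $1, $2 and $3, not their contents.
NEW_PRIORITY, FLAG, THRESHOLD = 5.0, "T", 100.0

# Statement as it appears in the dump.
BROAD_UPDATE = (
    "UPDATE jobqueue SET docpriority = ?, needpriority = ? "
    "WHERE docpriority < ?"
)
# Hypothetical narrowed form: skip rows already carrying the target flag.
NARROW_UPDATE = BROAD_UPDATE + " AND needpriority <> ?"


def make_demo_queue(n_rows=1000):
    """Build an in-memory stand-in for jobqueue with invented data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobqueue ("
        "id INTEGER PRIMARY KEY, docpriority REAL, needpriority TEXT)"
    )
    rows = []
    for i in range(n_rows):
        if i % 4 == 0:
            # Already reset by an earlier job start (assumed state).
            rows.append((i, NEW_PRIORITY, FLAG))
        else:
            rows.append((i, float(i % 10), "F"))
    conn.executemany("INSERT INTO jobqueue VALUES (?, ?, ?)", rows)
    return conn


def rows_touched(conn, sql, params):
    """Run an UPDATE and return how many rows it modified."""
    return conn.execute(sql, params).rowcount


broad = rows_touched(make_demo_queue(), BROAD_UPDATE,
                     (NEW_PRIORITY, FLAG, THRESHOLD))
narrow_conn = make_demo_queue()
narrow = rows_touched(narrow_conn, NARROW_UPDATE,
                      (NEW_PRIORITY, FLAG, THRESHOLD, FLAG))
# A second job start straight afterwards finds nothing left to rewrite.
repeat = rows_touched(narrow_conn, NARROW_UPDATE,
                      (NEW_PRIORITY, FLAG, THRESHOLD, FLAG))
print(broad, narrow, repeat)
```

With these invented rows the original statement rewrites every row, while the narrowed one skips the quarter that was already reset and finds nothing to do when repeated. In PostgreSQL an UPDATE writes a new row version even when the values are unchanged, which is why skipping such rows could matter. Whether the predicate is safe depends on flagged rows always holding the intended docpriority, and rows with a NULL needpriority would not match `<>`; both points would need checking against the real schema.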


          xact_start           |          query_start          |         state_change          | waiting | state  | query
-------------------------------+-------------------------------+-------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------
 2014-12-14 23:51:29.440873+00 | 2014-12-14 23:51:29.44226+00  | 2014-12-14 23:51:29.44226+00  | t       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
 2014-12-15 00:09:02.179227+00 | 2014-12-15 00:09:15.161376+00 | 2014-12-15 00:09:15.161376+00 | t       | active | UPDATE jobqueue SET status=$1,processid=$2 WHERE id=$3
                               | 2014-12-15 00:16:51.936374+00 | 2014-12-15 00:16:51.936374+00 | f       | idle   | SELECT id FROM jobs WHERE status=$1
                               | 2014-12-15 00:16:52.176358+00 | 2014-12-15 00:16:52.176402+00 | f       | idle   | SELECT * FROM agents
 2014-12-15 00:03:43.584173+00 | 2014-12-15 00:03:43.593023+00 | 2014-12-15 00:03:43.593023+00 | t       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
                               | 2014-12-15 00:16:48.157249+00 | 2014-12-15 00:16:48.157249+00 | f       | idle   | SELECT * FROM agents
 2014-12-15 00:16:54.550487+00 | 2014-12-15 00:16:54.550776+00 | 2014-12-15 00:16:54.550777+00 | f       | active | SELECT id,dochash,docid,jobid FROM jobqueue WHERE needpriority=$1 LIMIT 1000
 2014-12-15 00:09:02.097583+00 | 2014-12-15 00:09:02.107445+00 | 2014-12-15 00:09:02.107445+00 | f       | active | UPDATE jobqueue SET docpriority=$1,needpriority=$2 WHERE docpriority<$3
 2014-12-15 00:09:03.795408+00 | 2014-12-15 00:09:03.870265+00 | 2014-12-15 00:09:03.870266+00 | t       | active | SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
 2014-12-15 00:13:12.2401+00   | 2014-12-15 00:13:12.254646+00 | 2014-12-15 00:13:12.254647+00 | t       | active | SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
                               | 2014-12-15 00:16:55.490487+00 | 2014-12-15 00:16:55.490511+00 | f       | idle   | SELECT id FROM jobs WHERE status=$1
 2014-12-15 00:07:55.403813+00 | 2014-12-15 00:07:55.403813+00 | 2014-12-15 00:07:55.403813+00 | f       | active | autovacuum: VACUUM public.jobqueue
 2014-12-15 00:16:56.690037+00 | 2014-12-15 00:16:56.690037+00 | 2014-12-15 00:16:56.690037+00 | f       | active | SELECT * FROM pg_stat_activity WHERE datname = 'crawlerperf' AND query <> 'COMMIT' ORDER BY client_addr, query_start;
