manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: (Continuous) crawl performance
Date Fri, 31 Oct 2014 14:56:48 GMT
​Hi Karl,

Yes, the use case is that of a few jobs (some of which are inactive) with a
large number of documents in a 'P' or 'G' status.

Updating the jobqueue table stats with ANALYZE doesn't help I'm afraid. The
sub-query still uses the less-optimal index and generates a large number of
redundant rows that the subsequent Nested Loop Semi Join spends all of its
time culling.

There's some good news though. Changing query #1 to use an explicit JOIN
between jobqueue (t0) and jobs (t1) has made it significantly faster. Fom
12 secs all the way down to 1 sec.

This is the original query and its execution plan:

EXPLAIN ANALYZE SELECT
t0.id,t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0  WHERE  (t0.status='P' OR t0.status='G') AND
t0.checkaction='R' AND t0.checktime<= 1414683318802 AND EXISTS(SELECT 'x'
FROM jobs t1 WHERE  (t1.status='A' OR t1.status='a') AND t1.id=t0.jobid AND
t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events
t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority
ASC LIMIT 280;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..95017.41 rows=280 width=177) (actual
time=12395.701..12395.701 rows=0 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..2641483.78 rows=7784 width=177)
(actual time=12395.699..12395.699 rows=0 loops=1)
         Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text =
(t0.dochash)::text))
         ->  Nested Loop Anti Join  (cost=0.00..2641310.77 rows=7784
width=177) (actual time=12395.697..12395.697 rows=0 loops=1)
               ->  Nested Loop Semi Join  (cost=0.00..2634519.48 rows=7784
width=177) (actual time=12395.696..12395.696 rows=0 loops=1)
                     ->  Index Scan using i1392985450172 on jobqueue t0
(cost=0.00..2609016.45 rows=89520 width=177) (actual time=668.127..3970.495
rows=3346768 loops=1)
                           Index Cond: ((checkaction = 'R'::bpchar) AND
(checktime <= 1414683318802::bigint))
                           Filter: ((status = 'P'::bpchar) OR (status =
'G'::bpchar))
                     ->  Index Scan using jobs_pkey on jobs t1
(cost=0.00..0.28 rows=1 width=8) (actual time=0.002..0.002 rows=0
loops=3346768)
                           Index Cond: (id = t0.jobid)
                           Filter: ((priority = 5) AND ((status =
'A'::bpchar) OR (status = 'a'::bpchar)))
                           Rows Removed by Filter: 1
               ->  Nested Loop  (cost=0.00..0.86 rows=1 width=8) (never
executed)
                     ->  Index Scan using i1392985450178 on prereqevents
t3  (cost=0.00..0.27 rows=1 width=524) (never executed)
                           Index Cond: (t0.id = owner)
                     ->  Index Only Scan using events_pkey on events t4
(cost=0.00..0.58 rows=1 width=516) (never executed)
                           Index Cond: (name = (t3.eventname)::text)
                           Heap Fetches: 0
         ->  Materialize  (cost=0.00..36.79 rows=1 width=49) (never
executed)
               ->  Index Scan using i1392985450173 on jobqueue t2
(cost=0.00..36.78 rows=1 width=49) (never executed)
                     Index Cond: (status = ANY ('{A,F,a,f,D,d}'::bpchar[]))
Total runtime: 12395.884 ms



and this is the modified query and its execution plan:

EXPLAIN ANALYZE SELECT
t0.id,t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0  JOIN jobs t1 ON t1.id = t0.jobid WHERE  (t0.status='P' OR
t0.status='G') AND t0.checkaction='R' AND t0.checktime<= 1414683318802 AND
(t1.status = 'A' OR t1.status='a')  AND t1.priority=5 AND NOT EXISTS(SELECT
'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN
('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x'
FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name)
ORDER BY t0.docpriority ASC LIMIT 280;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------
 Limit  (cost=122440.72..122441.42 rows=280 width=177) (actual
time=1010.722..1010.722 rows=0 loops=1)
   ->  Sort  (cost=122440.72..122451.13 rows=4164 width=177) (actual
time=1010.719..1010.719 rows=0 loops=1)
         Sort Key: t0.docpriority
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Anti Join  (cost=45766.95..122250.65 rows=4164
width=177) (actual time=1010.702..1010.702 rows=0 loops=1)
               Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text
= (t0.dochash)::text))
               ->  Hash Anti Join  (cost=45766.95..122140.99 rows=4164
width=177) (actual time=1010.701..1010.701 rows=0 loops=1)
                     Hash Cond: (t0.id = t3.owner)
                     ->  Nested Loop  (cost=45739.17..122060.64 rows=4164
width=177) (actual time=1010.699..1010.699 rows=0 loops=1)
                           ->  Bitmap Heap Scan on jobs t1
(cost=8.52..15.79 rows=2 width=8) (actual time=0.028..0.031 rows=1 loops=1)
                                 Recheck Cond: ((status = 'A'::bpchar) OR
(status = 'a'::bpchar))
                                 Filter: (priority = 5)
                                 ->  BitmapOr  (cost=8.52..8.52 rows=2
width=0) (actual time=0.021..0.021 rows=0 loops=1)
                                       ->  Bitmap Index Scan on
i1392985450182  (cost=0.00..4.26 rows=1 width=0) (actual time=0.017..0.017
rows=1 loops=1)
                                             Index Cond: (status =
'A'::bpchar)
                                       ->  Bitmap Index Scan on
i1392985450182  (cost=0.00..4.26 rows=1 width=0) (actual time=0.003..0.003
rows=0 loops=1)
                                             Index Cond: (status =
'a'::bpchar)
                           ->  Bitmap Heap Scan on jobqueue t0
(cost=45730.65..60983.50 rows=3892 width=177) (actual
time=1010.661..1010.661 rows=0 loops=1)
                                 Recheck Cond: ((((status = 'P'::bpchar)
AND (checkaction = 'R'::bpchar) AND (checktime <= 1414683318802::bigint))
OR ((status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime
<= 1414683318802
::bigint))) AND (jobid = t1.id))
                                 ->  BitmapAnd  (cost=45730.65..45730.65
rows=3982 width=0) (actual time=1009.818..1009.818 rows=0 loops=1)
                                       ->  BitmapOr  (cost=3129.52..3129.52
rows=91575 width=0) (actual time=934.891..934.891 rows=0 loops=1)
                                             ->  Bitmap Index Scan on
i1392985450174  (cost=0.00..1806.76 rows=53658 width=0) (actual
time=513.541..513.541 rows=1946394 loops=1)
                                                   Index Cond: ((status =
'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <=
1414683318802::bigint))
                                             ->  Bitmap Index Scan on
i1392985450174  (cost=0.00..1278.00 rows=37917 width=0) (actual
time=421.346..421.346 rows=1400375 loops=1)
                                                   Index Cond: ((status =
'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <=
1414683318802::bigint))
                                       ->  Bitmap Index Scan on
i1392985450177  (cost=0.00..42599.91 rows=1612579 width=0) (actual
time=5.887..5.887 rows=43401 loops=1)
                                             Index Cond: (jobid = t1.id)
                     ->  Hash  (cost=26.15..26.15 rows=130 width=8) (never
executed)
                           ->  Hash Join  (cost=12.93..26.15 rows=130
width=8) (never executed)
                                 Hash Cond: ((t3.eventname)::text = (t4.name
)::text)
                                 ->  Seq Scan on prereqevents t3
(cost=0.00..11.40 rows=140 width=524) (never executed)
                                 ->  Hash  (cost=11.30..11.30 rows=130
width=516) (never executed)
                                       ->  Seq Scan on events t4
(cost=0.00..11.30 rows=130 width=516) (never executed)
               ->  Materialize  (cost=0.00..36.79 rows=1 width=49) (never
executed)
                     ->  Index Scan using i1392985450173 on jobqueue t2
(cost=0.00..36.78 rows=1 width=49) (never executed)
                           Index Cond: (status = ANY
('{A,F,a,f,D,d}'::bpchar[]))
 Total runtime: 1010.861 ms

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message