manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: (Continuous) crawl performance
Date Mon, 03 Nov 2014 10:18:34 GMT
Hi Karl,

>> It's reading from the correct index, but is nevertheless taking a long
time, possibly because there are a lot of records with status 'P' or 'G'
that belong to jobs that are not in state 'a' or 'A' that it has to go
through before it assembles 240 results.  Can you confirm this picture?

Yes, as you can see from the previous query plans, some time is spent
scanning index i1392985450172 (docpriority, status, checkaction,
checktime), which identifies >3M rows only for them to be (slowly) filtered
out in the subsequent Nested Loop Semi Join step

+++++++++
               ->  Nested Loop Semi Join  (cost=0.00..2634519.48 rows=7784
width=177) (actual time=12395.696..12395.696 rows=0 loops=1)
                     ->  Index Scan using i1392985450172 on jobqueue t0
(cost=0.00..2609016.45 rows=89520 width=177) (actual time=668.127..3970.495
rows=3346768 loops=1)
                           Index Cond: ((checkaction = 'R'::bpchar) AND
(checktime <= 1414683318802::bigint))
                           Filter: ((status = 'P'::bpchar) OR (status =
'G'::bpchar))
                     ->  Index Scan using jobs_pkey on jobs t1
(cost=0.00..0.28 rows=1 width=8) (actual time=0.002..0.002 rows=0
loops=3346768)
                           Index Cond: (id = t0.jobid)
                           Filter: ((priority = 5) AND ((status =
'A'::bpchar) OR (status = 'a'::bpchar)))
                           Rows Removed by Filter: 1
+++++++++




Here's the result of the two queries (the docid values have been edited
slightly):

# EXPLAIN SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577
ORDER BY t0.docpriority LIMIT 20;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..44.91 rows=20 width=177)
   ->  Index Scan using i1392985450172 on jobqueue t0
(cost=0.00..2332467.51 rows=1038668 width=177)
         Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <=
2414771559577::bigint))
(3 rows)


# SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577
ORDER BY t0.docpriority LIMIT 20;
      id       |     jobid     |                 dochash
|                         docid                                          |
status | failtime | failcount |  priorityset
---------------+---------------+------------------------------------------+--------------------------------------------------------+--------+----------+-----------+---------------
 1414075904032 | 1414075755977 | 874175EB926535C5FA51041B6749ED5D07E733FB |
smb://server12/share6/dataset/              | G      |          |
| 1414075898255
 1414075908486 | 1414075755977 | 0926EDF31E39E33B5F2ADAC7444D978BE8C44B74 |
smb://server12/share6/dataset/sj/     | G      |          |           |
1414075786902
 1414075908485 | 1414075755977 | 10E374B9BDF2079B50FFFCB0FE79657E443345BD |
smb://server12/share6/dataset/mk/         | G      |          |           |
1414075786902
 1414075908484 | 1414075755977 | 16612703F3352A933D7F1FBE92B732021DF89253 |
smb://server12/share6/dataset/bs/           | G      |          |
| 1414075786902
 1414075908483 | 1414075755977 | 37B69C012298D6E02D8F52DCC0584BF2E5F5ACC2 |
smb://server12/share6/dataset/le/           | G      |          |
| 1414075786902
 1414075908482 | 1414075755977 | 5A30C8F9B135C44A0AED75606950F97359C7DD6F |
smb://server12/share6/dataset/se/         | G      |          |           |
1414075786902
 1414075908481 | 1414075755977 | 5BC721E8788D524E4F47F54568FDBEB7A3E916E8 |
smb://server12/share6/dataset/mmk/      | G      |          |           |
1414075786902
 1414075908479 | 1414075755977 | 657D59AFB34889CD00C70C06D5079F4F11C2C6B8 |
smb://server12/share6/dataset/mm/       | G      |          |           |
1414075786902
 1414075908475 | 1414075755977 | 6BD906C002CC1CF76B455EDB5ADDFC56038F24A5 |
smb://server12/share6/dataset/sg/      | G      |          |           |
1414075786902
 1414075908470 | 1414075755977 | 70597810DD4A5E5F915491B0F1C2FA617257D728 |
smb://server12/share6/dataset/pd/     | G      |          |           |
1414075786902
 1414075908467 | 1414075755977 | 7212CA810B46E82A2CAD2385B8D679635EEDE9C9 |
smb://server12/share6/dataset/kp/         | G      |          |           |
1414075786902
 1414075908466 | 1414075755977 | 8298498EE916875B7C6FB8C86AC81137E0623444 |
smb://server12/share6/dataset/hk/           | G      |          |
| 1414075786902
 1414075908464 | 1414075755977 | 9D5D7536E1090B0B3A6745ADEE7E8498DFDCE15F |
smb://server12/share6/dataset/kk/           | G      |          |
| 1414075786902
 1414075908462 | 1414075755977 | A656510E82BA82E97091C47CAF2773908D636BD0 |
smb://server12/share6/dataset/wk/          | G      |          |
| 1414075786902
 1414075908461 | 1414075755977 | B2C98456C11BF5D3429FDD769C56E6CF6A8FD5F0 |
smb://server12/share6/dataset/gr/          | G      |          |
| 1414075786902
 1414075908458 | 1414075755977 | BB394B981A1928A9695FE31010D804241A72DF64 |
smb://server12/share6/dataset/dc/          | G      |          |
| 1414075786902
 1414075908457 | 1414075755977 | C914D0196854BF457F9631F5D3446ACCA2D5676E |
smb://server12/share6/dataset/dl/           | G      |          |
| 1414075786902
 1414075908456 | 1414075755977 | CE8C71D2B9768DE078734AD01B149CDFCE976B53 |
smb://server12/share6/dataset/cm/         | G      |          |           |
1414075786902
 1414075908455 | 1414075755977 | DA56598671A6A272D15D6D7216C0E7C86C706DC6 |
smb://server12/share6/dataset/lp/      | G      |          |           |
1414075786902
 1414075908453 | 1414075755977 | EF58A961D56F18BA0EFACE9933457A6CF4FB683B |
smb://server12/share6/dataset/fj/           | G      |          |
| 1414075786902
(20 rows)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message