Return-Path: X-Original-To: apmail-manifoldcf-dev-archive@www.apache.org Delivered-To: apmail-manifoldcf-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B2F34174CD for ; Mon, 3 Nov 2014 10:19:02 +0000 (UTC) Received: (qmail 15313 invoked by uid 500); 3 Nov 2014 10:19:02 -0000 Delivered-To: apmail-manifoldcf-dev-archive@manifoldcf.apache.org Received: (qmail 15261 invoked by uid 500); 3 Nov 2014 10:19:02 -0000 Mailing-List: contact dev-help@manifoldcf.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@manifoldcf.apache.org Delivered-To: mailing list dev@manifoldcf.apache.org Received: (qmail 15250 invoked by uid 99); 3 Nov 2014 10:19:02 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 03 Nov 2014 10:19:02 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [209.85.220.178] (HELO mail-vc0-f178.google.com) (209.85.220.178) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 03 Nov 2014 10:18:56 +0000 Received: by mail-vc0-f178.google.com with SMTP id la4so3001238vcb.23 for ; Mon, 03 Nov 2014 02:18:34 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=cAM7MkMwgo9mbYwwrZsvUNfx2VifylyBvB8AXEn5TDQ=; b=QAr+HK9LuAbdvwYnXeTNcZq+XRCd06fU97o8E7DsrhoPizTsH90RGazUH3b9cLo6eO NCcgvQazleyEtiGsnDtQD6loTPkaP8iKdFTvSckg1+ngd7zSjB9AlvdKTj7u5Uxa+bAZ lSkchnGaemMSjSzgXJKoNW94YDLO7EUxq31f6digFYD15q2yp9Zelvu038YOe9C6sh+a EurSYB3CjHCoz0eN1f+yj67+R+iVtsusq74zJtgiZQAvEicAFmlwOyCTeyWnxLgAaUTZ QRip4hoIuVKe+VVE6/sGGEGcLday24A+ogF1ucQqt1c3ILPDor1b2APoIxqCSllH50kN 6ELA== X-Gm-Message-State: ALoCoQm8fNwXXVUvwulVtm9CMcTsalShzjnPaFVyFcvB7HlDzCMp3zi5RYF9lui7CrkzR3NVgW05 MIME-Version: 1.0 X-Received: by 10.220.111.6 with SMTP id q6mr35436819vcp.12.1415009914572; Mon, 03 Nov 2014 02:18:34 -0800 (PST) Received: by 10.221.7.193 with HTTP; Mon, 3 Nov 2014 02:18:34 -0800 (PST) X-Originating-IP: [217.37.166.138] In-Reply-To: References: Date: Mon, 3 Nov 2014 10:18:34 +0000 Message-ID: Subject: Re: (Continuous) crawl performance From: Aeham Abushwashi To: dev@manifoldcf.apache.org Content-Type: multipart/alternative; boundary=047d7b3a87924fe93c0506f1ab42 X-Virus-Checked: Checked by ClamAV on apache.org --047d7b3a87924fe93c0506f1ab42 Content-Type: text/plain; charset=UTF-8 Hi Karl, >> It's reading from the correct index, but is nevertheless taking a long time, possibly because there are a lot of records with status 'P' or 'G' that belong to jobs that are not in state 'a' or 'A' that it has to go through before it assembles 240 results. Can you confirm this picture? Yes, as you can see from the previous query plans, some time is spent scanning index i1392985450172 (docpriority, status, checkaction, checktime), which identifies >3M rows only for them to be (slowly) filtered out in the subsequent Nested Loop Semi Join step +++++++++ -> Nested Loop Semi Join (cost=0.00..2634519.48 rows=7784 width=177) (actual time=12395.696..12395.696 rows=0 loops=1) -> Index Scan using i1392985450172 on jobqueue t0 (cost=0.00..2609016.45 rows=89520 width=177) (actual time=668.127..3970.495 rows=3346768 loops=1) Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 1414683318802::bigint)) Filter: ((status = 'P'::bpchar) OR (status = 'G'::bpchar)) -> Index Scan using jobs_pkey on jobs t1 (cost=0.00..0.28 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=3346768) Index Cond: (id = t0.jobid) Filter: ((priority = 5) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar))) Rows Removed by Filter: 1 +++++++++ Here's the result of the two queries (the docid values have been edited slightly): # EXPLAIN SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577 ORDER BY t0.docpriority LIMIT 20; QUERY PLAN ------------------------------------------------------------------------------------------------------ Limit (cost=0.00..44.91 rows=20 width=177) -> Index Scan using i1392985450172 on jobqueue t0 (cost=0.00..2332467.51 rows=1038668 width=177) Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 2414771559577::bigint)) (3 rows) # SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577 ORDER BY t0.docpriority LIMIT 20; id | jobid | dochash | docid | status | failtime | failcount | priorityset ---------------+---------------+------------------------------------------+--------------------------------------------------------+--------+----------+-----------+--------------- 1414075904032 | 1414075755977 | 874175EB926535C5FA51041B6749ED5D07E733FB | smb://server12/share6/dataset/ | G | | | 1414075898255 1414075908486 | 1414075755977 | 0926EDF31E39E33B5F2ADAC7444D978BE8C44B74 | smb://server12/share6/dataset/sj/ | G | | | 1414075786902 1414075908485 | 1414075755977 | 10E374B9BDF2079B50FFFCB0FE79657E443345BD | smb://server12/share6/dataset/mk/ | G | | | 1414075786902 1414075908484 | 1414075755977 | 16612703F3352A933D7F1FBE92B732021DF89253 | smb://server12/share6/dataset/bs/ | G | | | 1414075786902 1414075908483 | 1414075755977 | 37B69C012298D6E02D8F52DCC0584BF2E5F5ACC2 | smb://server12/share6/dataset/le/ | G | | | 1414075786902 1414075908482 | 1414075755977 | 5A30C8F9B135C44A0AED75606950F97359C7DD6F | smb://server12/share6/dataset/se/ | G | | | 1414075786902 1414075908481 | 1414075755977 | 5BC721E8788D524E4F47F54568FDBEB7A3E916E8 | smb://server12/share6/dataset/mmk/ | G | | | 1414075786902 1414075908479 | 1414075755977 | 657D59AFB34889CD00C70C06D5079F4F11C2C6B8 | smb://server12/share6/dataset/mm/ | G | | | 1414075786902 1414075908475 | 1414075755977 | 6BD906C002CC1CF76B455EDB5ADDFC56038F24A5 | smb://server12/share6/dataset/sg/ | G | | | 1414075786902 1414075908470 | 1414075755977 | 70597810DD4A5E5F915491B0F1C2FA617257D728 | smb://server12/share6/dataset/pd/ | G | | | 1414075786902 1414075908467 | 1414075755977 | 7212CA810B46E82A2CAD2385B8D679635EEDE9C9 | smb://server12/share6/dataset/kp/ | G | | | 1414075786902 1414075908466 | 1414075755977 | 8298498EE916875B7C6FB8C86AC81137E0623444 | smb://server12/share6/dataset/hk/ | G | | | 1414075786902 1414075908464 | 1414075755977 | 9D5D7536E1090B0B3A6745ADEE7E8498DFDCE15F | smb://server12/share6/dataset/kk/ | G | | | 1414075786902 1414075908462 | 1414075755977 | A656510E82BA82E97091C47CAF2773908D636BD0 | smb://server12/share6/dataset/wk/ | G | | | 1414075786902 1414075908461 | 1414075755977 | B2C98456C11BF5D3429FDD769C56E6CF6A8FD5F0 | smb://server12/share6/dataset/gr/ | G | | | 1414075786902 1414075908458 | 1414075755977 | BB394B981A1928A9695FE31010D804241A72DF64 | smb://server12/share6/dataset/dc/ | G | | | 1414075786902 1414075908457 | 1414075755977 | C914D0196854BF457F9631F5D3446ACCA2D5676E | smb://server12/share6/dataset/dl/ | G | | | 1414075786902 1414075908456 | 1414075755977 | CE8C71D2B9768DE078734AD01B149CDFCE976B53 | smb://server12/share6/dataset/cm/ | G | | | 1414075786902 1414075908455 | 1414075755977 | DA56598671A6A272D15D6D7216C0E7C86C706DC6 | smb://server12/share6/dataset/lp/ | G | | | 1414075786902 1414075908453 | 1414075755977 | EF58A961D56F18BA0EFACE9933457A6CF4FB683B | smb://server12/share6/dataset/fj/ | G | | | 1414075786902 (20 rows) --047d7b3a87924fe93c0506f1ab42--