Return-Path: X-Original-To: apmail-manifoldcf-user-archive@www.apache.org Delivered-To: apmail-manifoldcf-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C433B9C6A for ; Mon, 10 Dec 2012 08:05:50 +0000 (UTC) Received: (qmail 54193 invoked by uid 500); 10 Dec 2012 08:05:50 -0000 Delivered-To: apmail-manifoldcf-user-archive@manifoldcf.apache.org Received: (qmail 54043 invoked by uid 500); 10 Dec 2012 08:05:49 -0000 Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@manifoldcf.apache.org Delivered-To: mailing list user@manifoldcf.apache.org Received: (qmail 54016 invoked by uid 99); 10 Dec 2012 08:05:48 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Dec 2012 08:05:48 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of daddywri@gmail.com designates 209.85.210.180 as permitted sender) Received: from [209.85.210.180] (HELO mail-ia0-f180.google.com) (209.85.210.180) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Dec 2012 08:05:44 +0000 Received: by mail-ia0-f180.google.com with SMTP id t4so4879672iag.39 for ; Mon, 10 Dec 2012 00:05:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=WBZjC8NdYvc8F1eLynrkGLQ/tJXCJijINYqpOeiYHwA=; b=zCOT5eg6OEX2thAEEmqEtI2CgSU159D6h7c0Zs5ftdvB+OC53J5p9ssAhhMW/o5QW2 HGInAhLahTvg633tzgPQ34HLNgv5/TtT/xsEQOKgUu+hotImj5UwmH4VoKdeePhPD5fN fOw5n9n5x+7WbBzR7XWmgSZhrV9kXyLPslti9IikqhhI1TMHc76ylW3yRKxO3tpmIryy ZJ2pLyn+P3J41AryIIecXPRKgG6T+Wz9vv4exYaKZPBGVEGjRhi+k1tdpbyaRiC20sYc VMeKkUCcTFV5LFbh68E2JO/rB0+l7yzauftZaL6q0lgOq5yxfMahNBxjjca536rd3gxN XqCg== MIME-Version: 1.0 Received: by 10.50.182.166 with SMTP id ef6mr5722278igc.65.1355126723559; Mon, 10 Dec 2012 00:05:23 -0800 (PST) Received: by 10.42.73.7 with HTTP; Mon, 10 Dec 2012 00:05:23 -0800 (PST) In-Reply-To: References: Date: Mon, 10 Dec 2012 03:05:23 -0500 Message-ID: Subject: Re: Too many slow queries caused by MCF running MySQL 5.5 From: Karl Wright To: user@manifoldcf.apache.org Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Checked: Checked by ClamAV on apache.org Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority >= 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright wrote: > Hi Shigeki, > > The rules for when a database will use an index for an ORDER BY clause > differ significantly from database to database. The current logic > seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. > I will see if I can find a solution. The ticket for this > CONNECTORS-584. > > Karl > > On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi > wrote: >> >> Hi. >> >> >> I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL >> I tried to crawl 10 million files using Windows share connection and index >> them into Solr. >> >> As MCF reached over 1 million files, the crawling speed started getting >> slower. >> So I checked slow queries and found out that too many slow queries occurred, >> especially the following kinds: >> >> -------------------------------------------------------- >> # Time: 121204 16:25:40 >> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] >> # Query_time: 7.240532 Lock_time: 0.000204 Rows_sent: 1200 Rows_examined: >> 611091 >> SET timestamp=1354605940; >> SELECT >> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset >> FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND >> t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE >> t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT >> EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status >> IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT >> 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND >> t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status >> ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; >> # Time: 121204 16:25:44 >> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] >> # Query_time: 3.064339 Lock_time: 0.000084 Rows_sent: 1 Rows_examined: >> 406359 >> SET timestamp=1354605944; >> SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN >> ('P','G') AND checkaction='R' AND checktime<=1354605932817 AND EXISTS(SELECT >> 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid) ORDER BY >> docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1; >> ------------------------------------------------------- >> >> I wonder if the queries appropriately use index of the table. >> As a result of EXPLAIN against the slow query, there was filesort. >> There seems to be some conditions that MySQL does not use index depending on >> ORDER BY: >> - Executing ORDER BY against multiple keys >> - When keys selected from records are different from keys used by ORDER BY >> >> Since filesort was happening, fully scanning records should be having MCF >> slower. >> >> Do you think this could happen even in PostgreSQL or HSQLDB? >> Do you think queries could be modified to use index appropriately? >> >> >> Regards, >> >> Shigeki