manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Too many slow queries caused by MCF running MySQL 5.5
Date Mon, 10 Dec 2012 08:05:23 GMT
Since you have a large table, can you try an EXPLAIN for the following
query, which should match the explanation given here:
http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ?
Does it use the index?

SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 WHERE t0.docpriority >= 0 AND t0.status IN ('P','G')
AND t0.checkaction='R' AND
t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE
t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT
EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status
IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT
'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND
t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status
ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200

Thanks!
Karl

On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright <daddywri@gmail.com> wrote:
> Hi Shigeki,
>
> The rules for when a database will use an index for an ORDER BY clause
> differ significantly from database to database.  The current logic
> seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL.
>  I will see if I can find a solution.  The ticket for this
> CONNECTORS-584.
>
> Karl
>
> On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi
> <shigeki.kobayashi3@g.softbank.co.jp> wrote:
>>
>> Hi.
>>
>>
>> I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL
>> I tried to crawl 10 million files using Windows share connection and index
>> them into Solr.
>>
>> As MCF reached over 1 million files, the crawling speed started getting
>> slower.
>> So I checked slow queries and found out that too many slow queries occurred,
>> especially the following kinds:
>>
>> --------------------------------------------------------
>> # Time: 121204 16:25:40
>> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
>> # Query_time: 7.240532  Lock_time: 0.000204 Rows_sent: 1200  Rows_examined:
>> 611091
>> SET timestamp=1354605940;
>> SELECT
>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>> FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
>> t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE
>> t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT
>> EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status
>> IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT
>> 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>> t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status
>> ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200;
>> # Time: 121204 16:25:44
>> # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
>> # Query_time: 3.064339  Lock_time: 0.000084 Rows_sent: 1  Rows_examined:
>> 406359
>> SET timestamp=1354605944;
>> SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN
>> ('P','G') AND checkaction='R' AND checktime<=1354605932817 AND EXISTS(SELECT
>> 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid)  ORDER BY
>> docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1;
>> -------------------------------------------------------
>>
>> I wonder if the queries appropriately use index of the table.
>> As a result of EXPLAIN against the slow query, there was filesort.
>> There seems to be some conditions that MySQL does not use index depending on
>> ORDER BY:
>>  - Executing ORDER BY against multiple keys
>>  - When keys selected from records are different from keys used by ORDER BY
>>
>> Since filesort was happening, fully scanning records should be having MCF
>> slower.
>>
>> Do you think this could happen even in PostgreSQL or HSQLDB?
>> Do you think queries could be modified to use index appropriately?
>>
>>
>> Regards,
>>
>> Shigeki

Mime
View raw message