manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp>
Subject Too many slow queries caused by MCF running MySQL 5.5
Date Mon, 10 Dec 2012 07:13:58 GMT
Hi.


I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL
I tried to crawl 10 million files using Windows share connection and index
them into Solr.

As MCF reached over 1 million files, the crawling speed started getting
slower.
So I checked slow queries and found out that too many slow queries
occurred, especially the following kinds:

--------------------------------------------------------
# Time: 121204 16:25:40
# User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
# Query_time: 7.240532  Lock_time: 0.000204 Rows_sent: 1200  Rows_examined:
611091
SET timestamp=1354605940;
SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE
t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT
EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND
t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT
EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND
t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status
ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200;
# Time: 121204 16:25:44
# User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
# Query_time: 3.064339  Lock_time: 0.000084 Rows_sent: 1  Rows_examined:
406359
SET timestamp=1354605944;
SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN
('P','G') AND checkaction='R' AND checktime<=1354605932817 AND
EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid)
 ORDER BY docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1;
-------------------------------------------------------

I wonder if the queries appropriately use index of the table.
As a result of EXPLAIN against the slow query, there was filesort.
There seems to be some conditions that MySQL does not use index depending
on ORDER BY:
 - Executing ORDER BY against multiple keys
 - When keys selected from records are different from keys used by ORDER BY

Since filesort was happening, fully scanning records should be having MCF
slower.

Do you think this could happen even in PostgreSQL or HSQLDB?
Do you think queries could be modified to use index appropriately?


Regards,

Shigeki

Mime
View raw message