manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CONNECTORS-1093) ManifoldCF document reprioritization bottleneck
Date Tue, 04 Nov 2014 08:27:33 GMT
Karl Wright created CONNECTORS-1093:
---------------------------------------

             Summary: ManifoldCF document reprioritization bottleneck
                 Key: CONNECTORS-1093
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1093
             Project: ManifoldCF
          Issue Type: Bug
          Components: Framework agents process
    Affects Versions: ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0
            Reporter: Karl Wright
            Assignee: Karl Wright
             Fix For: ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0


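Starting a job with 200K+ documents now takes many minutes. The cause appears to be document
reprioritization, which has a significant bottleneck: during startup, computing each document's priority calls
BinManager.getIncrementBinValues, which appears to issue a separate database update on every call. A thread dump shows: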

{code}
	at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:694)
	at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
	at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
	at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
	at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
	at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
	at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:750)
	at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performUpdate(DBInterfaceHSQLDB.java:296)
	at org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
	at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValues(BinManager.java:158)
	at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getIncrementBinValue(ReprioritizationTracker.java:328)
	at org.apache.manifoldcf.crawler.system.PriorityCalculator.getDocumentPriority(PriorityCalculator.java:145)
	at org.apache.manifoldcf.crawler.jobs.JobQueue.writeDocPriority(JobQueue.java:874)
	at org.apache.manifoldcf.crawler.jobs.JobManager.writeDocumentPriorities(JobManager.java:2142)
	at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1121)
	at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:1054)
	at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:141)
{code}
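The sketch below models the pattern visible in the trace, where each priority computation increments a bin counter through its own database update. It is a hypothetical model, not ManifoldCF code. FakeBinTable, perDocumentUpdates, reservedBlockUpdates, the 10-bin layout and the block size are all illustrative assumptions. The block-reservation variant is one possible mitigation and is not necessarily the fix applied for this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the bottleneck: every priority computation increments a
 * per-bin counter, and every increment costs one database UPDATE.
 * None of these names are ManifoldCF APIs.
 */
public class BinIncrementSketch {

    /** Stand-in for the bin table that counts how many UPDATE statements are issued. */
    static final class FakeBinTable {
        private final Map<String, Long> counters = new HashMap<>();
        private long updateCount = 0;

        /** Adds 'amount' to a bin and returns the previous value (models one UPDATE). */
        long getAndAdd(String bin, long amount) {
            updateCount++;
            long previous = counters.getOrDefault(bin, 0L);
            counters.put(bin, previous + amount);
            return previous;
        }

        long updates() {
            return updateCount;
        }
    }

    /** Per-document pattern: one UPDATE for every document whose priority is computed. */
    static long perDocumentUpdates(int documentCount) {
        FakeBinTable table = new FakeBinTable();
        for (int i = 0; i < documentCount; i++) {
            // Assumption: documents are spread evenly across 10 hypothetical bins.
            table.getAndAdd("bin-" + (i % 10), 1);
        }
        return table.updates();
    }

    /**
     * Hypothetical mitigation: reserve a block of increments per bin with a single
     * UPDATE, then hand values out from memory until the block is used up.
     */
    static long reservedBlockUpdates(int documentCount, int blockSize) {
        FakeBinTable table = new FakeBinTable();
        Map<String, long[]> reserved = new HashMap<>(); // bin -> {next, limit}
        for (int i = 0; i < documentCount; i++) {
            String bin = "bin-" + (i % 10);
            long[] range = reserved.get(bin);
            if (range == null || range[0] == range[1]) {
                // Block exhausted (or first use): one UPDATE reserves blockSize values.
                long start = table.getAndAdd(bin, blockSize);
                range = new long[] {start, start + blockSize};
                reserved.put(bin, range);
            }
            range[0]++; // consume the next reserved value for this document
        }
        return table.updates();
    }

    public static void main(String[] args) {
        int docs = 200_000; // job size mentioned in the report
        System.out.println("per-document updates: " + perDocumentUpdates(docs));
        System.out.println("block-reserved updates: " + reservedBlockUpdates(docs, 1000));
    }
}
```

Under these assumptions, the per-document pattern issues 200,000 updates for a 200K-document job, while reserving blocks of 1000 per bin issues 200. This illustrates why a per-document round trip at startup would scale poorly.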




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
