manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1093) ManifoldCF document reprioritization bottleneck
Date Tue, 04 Nov 2014 09:08:33 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195939#comment-14195939 ]

Karl Wright commented on CONNECTORS-1093:
-----------------------------------------

r1636521 (release-1.7-branch)


> ManifoldCF document reprioritization bottleneck
> -----------------------------------------------
>
>                 Key: CONNECTORS-1093
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1093
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>            Priority: Blocker
>             Fix For: ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0
>
>
> Starting a job with 200K+ documents now takes many minutes.  The reason seems to be document reprioritization, which has a significant bottleneck.  A thread dump shows:
> {code}
> 	at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:694)
> 	at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
> 	at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
> 	at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
> 	at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
> 	at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
> 	at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:750)
> 	at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performUpdate(DBInterfaceHSQLDB.java:296)
> 	at org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
> 	at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValues(BinManager.java:158)
> 	at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getIncrementBinValue(ReprioritizationTracker.java:328)
> 	at org.apache.manifoldcf.crawler.system.PriorityCalculator.getDocumentPriority(PriorityCalculator.java:145)
> 	at org.apache.manifoldcf.crawler.jobs.JobQueue.writeDocPriority(JobQueue.java:874)
> 	at org.apache.manifoldcf.crawler.jobs.JobManager.writeDocumentPriorities(JobManager.java:2142)
> 	at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1121)
> 	at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:1054)
> 	at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:141)
> {code}
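
For illustration only: the trace above shows a database update (BaseTable.performUpdate) being reached from a single getDocumentPriority call, so a job with N documents would issue on the order of N such updates. The sketch below models that cost with an in-memory stand-in for the table and compares it with reserving each bin's range in one update. Every name in it (BinIncrementSketch, CountingBinStore, perDocument, batched) is hypothetical and not part of ManifoldCF, and it does not claim to be the change committed for this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are ManifoldCF classes.
class BinIncrementSketch {

  /** Stand-in for a database table of per-bin counters; it counts round-trips. */
  static class CountingBinStore {
    private final Map<String, Long> rows = new HashMap<>();
    int roundTrips = 0;

    /** Adds delta to the bin's counter and returns the value before the add. */
    long addAndGetPrevious(String bin, long delta) {
      roundTrips++; // each call models one database update
      long previous = rows.getOrDefault(bin, 0L);
      rows.put(bin, previous + delta);
      return previous;
    }
  }

  /** One store update per document, the pattern suggested by the thread dump. */
  static long[] perDocument(CountingBinStore store, String[] docBins) {
    long[] values = new long[docBins.length];
    for (int i = 0; i < docBins.length; i++) {
      values[i] = store.addAndGetPrevious(docBins[i], 1L);
    }
    return values;
  }

  /** Counts documents per bin first, then reserves each bin's range in one update. */
  static long[] batched(CountingBinStore store, String[] docBins) {
    Map<String, Long> counts = new HashMap<>();
    for (String bin : docBins) {
      counts.merge(bin, 1L, Long::sum);
    }
    // One store update per distinct bin; remember where each bin's range starts.
    Map<String, Long> next = new HashMap<>();
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      next.put(e.getKey(), store.addAndGetPrevious(e.getKey(), e.getValue()));
    }
    // Hand out values from the reserved ranges without touching the store again.
    long[] values = new long[docBins.length];
    for (int i = 0; i < docBins.length; i++) {
      long v = next.get(docBins[i]);
      values[i] = v;
      next.put(docBins[i], v + 1);
    }
    return values;
  }

  public static void main(String[] args) {
    // Assumed workload: 200,000 documents spread over 10 bins.
    String[] docBins = new String[200_000];
    for (int i = 0; i < docBins.length; i++) {
      docBins[i] = "host" + (i % 10);
    }
    CountingBinStore a = new CountingBinStore();
    perDocument(a, docBins);
    CountingBinStore b = new CountingBinStore();
    batched(b, docBins);
    System.out.println("per-document round-trips: " + a.roundTrips);
    System.out.println("batched round-trips: " + b.roundTrips);
  }
}
```

With the assumed workload, the per-document path performs 200,000 store updates and the batched path performs 10, while both hand out the same per-document values. Whether batching of this kind is safe in the real system depends on how bin values are shared across processes, which this message does not describe.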



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
