manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CONNECTORS-1094) Slow reprioritization impedes startup
Date Wed, 05 Nov 2014 19:31:34 GMT

     [ https://issues.apache.org/jira/browse/CONNECTORS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1094.
-------------------------------------
    Resolution: Fixed

> Slow reprioritization impedes startup
> -------------------------------------
>
>                 Key: CONNECTORS-1094
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1094
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7.1
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0
>
>         Attachments: CONNECTORS-1094.patch
>
>
> With the latest revisions, documents for all jobs (legacy and new) do get
> picked up and processed, which is great! This was verified on a small
> 1-node test system.
> I have since applied the fix to a much larger environment (29M docs across
> 4 MCF agents using a 3-node ZooKeeper cluster), which has a number of
> mid-sized (100,000s of docs) jobs in a Running state. The update of the
> priorityset field for ~36M jobqueue records took just over an hour. More
> problematic for me is the rate of reprioritization on startup, which was
> very slow: nearly 2 hours to update ~600,000 records.
> A couple of SQL queries
> (JobManager#getNextNotYetProcessedRepriotizationDocuments and
> ManifoldCF#writeDocumentPriorities) come up frequently, but a VisualVM
> profile of the MCF agent shows that the majority of the Agents thread's
> time is spent talking to ZooKeeper, both for locking and for very frequent
> reads of configuration data; see the snapshots below.
> Is it possible to avoid the per-document locking pattern seen in this case?
> {code}
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <487ef1bb> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.readData(ZooKeeperConnection.java:819)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.getSharedConfiguration(ZooKeeperLockManager.java:670)
>     at org.apache.manifoldcf.core.interfaces.LockManagerFactory.getBooleanProperty(LockManagerFactory.java:110)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.setThreadContext(SharedDriveConnector.java:157)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.getConnector(ConnectorPool.java:489)
>     - locked <3f2843d4> (a org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool.grab(ConnectorPool.java:255)
>     at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.grab(RepositoryConnectorPool.java:86)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1007)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:203)
> {code}
> {code}
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <52698c72> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.createSequentialChild(ZooKeeperConnection.java:1116)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.obtainReadLock(ZooKeeperConnection.java:691)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.obtainGlobalReadLock(ZooKeeperLockObject.java:193)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:310)
>     - locked <151db932> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.LockGate.enterReadLock(LockGate.java:261)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterRead(BaseLockManager.java:1283)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterReadLock(BaseLockManager.java:790)
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getMinimumDepth(ReprioritizationTracker.java:251)
>     at org.apache.manifoldcf.crawler.system.PriorityCalculator.<init>(PriorityCalculator.java:89)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1021)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
> {code}
> {code}
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <79c64d6d> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:871)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.releaseLock(ZooKeeperConnection.java:796)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearLock(ZooKeeperLockObject.java:218)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearGlobalReadLockNoWait(ZooKeeperLockObject.java:212)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.clearGlobalReadLock(LockObject.java:395)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:376)
>     - locked <126e1776> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.LockGate.leaveReadLock(LockGate.java:289)
>     - locked <126e1776> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveRead(BaseLockManager.java:1369)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveReadLock(BaseLockManager.java:804)
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getMinimumDepth(ReprioritizationTracker.java:258)
>     at org.apache.manifoldcf.crawler.system.PriorityCalculator.<init>(PriorityCalculator.java:89)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1021)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
> {code}
> {code}
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <354dbdf> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.readData(ZooKeeperConnection.java:819)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.getSharedConfiguration(ZooKeeperLockManager.java:670)
>     at org.apache.manifoldcf.core.interfaces.LockManagerFactory.getBooleanProperty(LockManagerFactory.java:110)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.setThreadContext(SharedDriveConnector.java:157)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.getConnector(ConnectorPool.java:489)
>     - locked <6f2f3168> (a org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool.grab(ConnectorPool.java:255)
>     at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.grab(RepositoryConnectorPool.java:86)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1007)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at
> {code}
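
The traces above show writeDocumentPriorities reaching ZooKeeper along two paths: a shared-configuration read while a connector is grabbed from the pool, and a read lock taken and released inside ReprioritizationTracker.getMinimumDepth. One general way to avoid a per-document pattern like this is to read the shared value once per batch and reuse it. The sketch below illustrates only that idea and is not the attached patch or ManifoldCF code: the class HoistSharedReads, the RemoteDepth stand-in, and the toy priority formula are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;

public class HoistSharedReads {

    /** Hypothetical stand-in for a shared value; each read models one lock/read/unlock round trip. */
    static final class RemoteDepth implements DoubleSupplier {
        final AtomicInteger roundTrips = new AtomicInteger();

        @Override
        public double getAsDouble() {
            roundTrips.incrementAndGet(); // count the simulated round trip
            return 1.0;                   // fixed toy value
        }
    }

    /** Toy priority formula (an assumption, not ManifoldCF's): shared depth plus identifier length. */
    static double priority(double minimumDepth, String docId) {
        return minimumDepth + docId.length();
    }

    /** Per-document pattern: the shared value is fetched again for every document. */
    static double[] perDocument(List<String> docIds, DoubleSupplier depth) {
        double[] result = new double[docIds.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = priority(depth.getAsDouble(), docIds.get(i));
        }
        return result;
    }

    /** Per-batch pattern: the shared value is fetched once and reused for the whole batch. */
    static double[] perBatch(List<String> docIds, DoubleSupplier depth) {
        double shared = depth.getAsDouble();
        double[] result = new double[docIds.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = priority(shared, docIds.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docIds = Arrays.asList("a", "bb", "ccc");
        RemoteDepth slow = new RemoteDepth();
        RemoteDepth fast = new RemoteDepth();
        double[] p1 = perDocument(docIds, slow);
        double[] p2 = perBatch(docIds, fast);
        System.out.println("same priorities: " + Arrays.equals(p1, p2));
        System.out.println("per-document round trips: " + slow.roundTrips.get());
        System.out.println("per-batch round trips: " + fast.roundTrips.get());
    }
}
```

With three documents the per-document variant performs three simulated round trips and the per-batch variant one, while both produce the same priorities. The trade-off is that a hoisted value can go stale if another agent changes it mid-batch, so this approach only fits values that may safely be treated as constant for the duration of a batch.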



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
