manifoldcf-dev mailing list archives

From "Aeham Abushwashi (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1118) Documents processed by the shared drive connector incur an unnecessary synchronisation hit
Date Tue, 09 Dec 2014 16:29:12 GMT


Aeham Abushwashi commented on CONNECTORS-1118:

Hi Karl,

Regarding the priority calculator, I may be missing something, but the PriorityCalculator constructor
invoked in the snippet above would in turn call getMinimumDepth, which acquires a lock, and
it does so for every single document, correct?
As I recall, the CONNECTORS-1094 patch invokes getMinimumDepth once per group of documents, thereby
avoiding the per-document lock.
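To make the per-document vs per-group distinction concrete, here is a minimal Java sketch. DepthTracker, Priority, perDocument and perGroup are hypothetical stand-ins, not ManifoldCF classes. The only point is that reading a lock-protected value once per batch turns N lock acquisitions into one, assuming (as with the CONNECTORS-1094 approach mentioned above) that a single reading per group of documents is acceptable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MinimumDepthBatching {

  /** Hypothetical stand-in for a shared value whose reads require a lock. */
  static class DepthTracker {
    private final Object lock = new Object();
    private final AtomicInteger lockAcquisitions = new AtomicInteger();
    private final int minimumDepth = 3;

    int getMinimumDepth() {
      synchronized (lock) {
        lockAcquisitions.incrementAndGet(); // count every time the lock is taken
        return minimumDepth;
      }
    }

    int lockAcquisitions() {
      return lockAcquisitions.get();
    }
  }

  /** Hypothetical priority holder whose cheap constructor takes a precomputed depth. */
  static class Priority {
    final String documentId;
    final int depth;

    Priority(String documentId, int depth) {
      this.documentId = documentId;
      this.depth = depth;
    }
  }

  /** Per-document pattern: one lock acquisition for every document. */
  static Priority[] perDocument(DepthTracker tracker, List<String> docs) {
    Priority[] result = new Priority[docs.size()];
    for (int i = 0; i < docs.size(); i++) {
      result[i] = new Priority(docs.get(i), tracker.getMinimumDepth());
    }
    return result;
  }

  /** Per-group pattern: read the depth once, then reuse it for the whole group. */
  static Priority[] perGroup(DepthTracker tracker, List<String> docs) {
    int depth = tracker.getMinimumDepth();
    Priority[] result = new Priority[docs.size()];
    for (int i = 0; i < docs.size(); i++) {
      result[i] = new Priority(docs.get(i), depth);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> docs = List.of("a", "b", "c", "d");
    DepthTracker perDoc = new DepthTracker();
    perDocument(perDoc, docs);
    DepthTracker perBatch = new DepthTracker();
    perGroup(perBatch, docs);
    System.out.println(perDoc.lockAcquisitions() + " vs " + perBatch.lockAcquisitions());
  }
}
```

Run as-is, main prints "4 vs 1" for a four-document group; the gap grows linearly with group size.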


> Documents processed by the shared drive connector incur an unnecessary synchronisation hit
> ------------------------------------------------------------------------------------------
>                 Key: CONNECTORS-1118
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework core
>    Affects Versions: ManifoldCF 1.7.2
>            Reporter: Aeham Abushwashi
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.8, ManifoldCF 2.0
> Each document processed by the shared drive connector is passed through SharedDriveConnector#checkInclude
to verify whether the document is eligible for ingestion. The calls made here to WorkerThread$ProcessActivity#checkMimeTypeIndexable
and WorkerThread$ProcessActivity#checkLengthIndexable are unnecessarily costly as they each
create a fresh instance of IncrementalIngester$PipelineConnections on every call. The constructor
of IncrementalIngester$PipelineConnections can be very expensive because it loads the output
connection objects, which in turn requires some locking (via ZooKeeper in a distributed environment).
> The other area of inefficiency is in WorkerThread$ProcessActivity#processDocumentReferences.
This method creates new instances of PriorityCalculator using the less-efficient 3-arg constructor.
This can be addressed using the same pattern implemented for CONNECTORS-1094.
> To highlight the impact of the above calls, I profiled an active worker thread for 40
minutes. During that window, it spent ~23 minutes in SharedDriveConnector#checkInclude and
its callees, plus another 9 minutes creating instances of PriorityCalculator.
> I've seen the above issues when using the shared drive connector, but I think other connectors
could be affected too, depending on how they're implemented.
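The checkMimeTypeIndexable/checkLengthIndexable cost described in the quoted issue comes from rebuilding the same helper object on every call. One common remedy, sketched here in Java, is to build it lazily once and reuse it for the lifetime of the activity. ExpensiveConnections and Activity are hypothetical stand-ins for IncrementalIngester$PipelineConnections and WorkerThread$ProcessActivity. The sketch assumes the pipeline configuration does not change while the activity is alive, and that the activity is confined to a single worker thread, which is why the lazy initialisation is not synchronised.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CachedPipelineConnections {

  /** Hypothetical stand-in; construction is assumed to be costly (e.g. loads output connections). */
  static class ExpensiveConnections {
    private final List<String> acceptedMimeTypes;
    private final long maxLength;

    ExpensiveConnections(List<String> acceptedMimeTypes, long maxLength) {
      this.acceptedMimeTypes = acceptedMimeTypes;
      this.maxLength = maxLength;
    }

    boolean mimeTypeIndexable(String mimeType) {
      return acceptedMimeTypes.contains(mimeType);
    }

    boolean lengthIndexable(long length) {
      return length <= maxLength;
    }
  }

  /** Hypothetical activity that builds its connections at most once. Not thread-safe by design. */
  static class Activity {
    private final Supplier<ExpensiveConnections> factory;
    private ExpensiveConnections cached; // null until first use

    Activity(Supplier<ExpensiveConnections> factory) {
      this.factory = factory;
    }

    private ExpensiveConnections connections() {
      if (cached == null) {
        cached = factory.get(); // the only place the expensive constructor runs
      }
      return cached;
    }

    boolean checkMimeTypeIndexable(String mimeType) {
      return connections().mimeTypeIndexable(mimeType);
    }

    boolean checkLengthIndexable(long length) {
      return connections().lengthIndexable(length);
    }
  }

  public static void main(String[] args) {
    AtomicInteger constructions = new AtomicInteger();
    Activity activity = new Activity(() -> {
      constructions.incrementAndGet(); // count how often the expensive object is built
      return new ExpensiveConnections(List.of("text/plain"), 1000L);
    });
    for (int i = 0; i < 100; i++) {
      activity.checkMimeTypeIndexable("text/plain");
      activity.checkLengthIndexable(i * 20L);
    }
    System.out.println("constructions: " + constructions.get());
  }
}
```

Here 200 checks trigger a single construction (main prints "constructions: 1"), whereas building a fresh object per call would construct it 200 times.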

This message was sent by Atlassian JIRA
