manifoldcf-dev mailing list archives

From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject Re: (Continuous) crawl performance
Date Tue, 04 Nov 2014 18:15:20 GMT
Hi Karl,

After applying the 1.7.2 revisions for CONNECTORS-1090, -1091, -1092 and
-1093 to my 1.6.1 branch, documents in a newly created crawl get picked up
by the next scan; however, that doesn't happen for an existing crawl. The
docpriority of documents in the existing crawl is still 1000000001.

I believe the priority should be set by
ManifoldCF#resetAllDocumentPriorities, but it isn't, because
JobManager#getNextNotYetProcessedReprioritizationDocuments returns no rows
to update. I think this is because the legacy job's documents have a
priorityset of NULL. Replacing the current priorityset condition in
JobManager#getNextNotYetProcessedReprioritizationDocuments with
(priorityset IS NULL OR priorityset<?) addresses this specific issue.
Is this a valid fix, or do you see it introducing undesirable behaviour or
masking an issue elsewhere?
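For illustration only, here is a minimal sketch of why the legacy rows are skipped. It uses Python's built-in sqlite3 module as a stand-in database, with a hypothetical one-column jobqueue table and a made-up threshold value; ManifoldCF's real schema, query and database back end differ. Under SQL's three-valued logic, NULL < ? evaluates to unknown rather than true, so rows whose priorityset was never written are excluded by priorityset<?, while the IS NULL disjunct brings them back.

```python
import sqlite3

# In-memory stand-in for the job queue table. Only the column relevant to
# the question is modelled; the table layout is a hypothetical assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, priorityset INTEGER)")
conn.executemany(
    "INSERT INTO jobqueue (id, priorityset) VALUES (?, ?)",
    [
        (1, None),  # legacy row: priorityset was never written (NULL)
        (2, 100),   # stamped before the current reprioritization pass
        (3, 500),   # stamped after it
    ],
)

threshold = 200  # hypothetical value bound to the "?" placeholder


def ids(where):
    """Return the ids of rows matching the given WHERE condition."""
    rows = conn.execute(
        "SELECT id FROM jobqueue WHERE " + where + " ORDER BY id", (threshold,)
    )
    return [row[0] for row in rows]


# NULL < 200 is unknown, so row 1 is filtered out by the original condition.
original = ids("priorityset < ?")
# The IS NULL disjunct makes the condition true for row 1.
patched = ids("(priorityset IS NULL OR priorityset < ?)")

print(original)  # [2]
print(patched)   # [1, 2]
```

The sketch only demonstrates the NULL comparison semantics; it says nothing about whether widening the real query has side effects elsewhere.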

Cheers,
Aeham
