manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: (Continuous) crawl performance
Date Tue, 04 Nov 2014 18:23:17 GMT
Hi Aeham,

I would be careful to set the "priorityset" field value to 0 only for
documents that have state "G" and whose job is active.

bq. I believe the priority should be set by
ManifoldCF#resetAllDocumentPriorities but it's not, because
JobManager#getNextNotYetProcessedReprioritizationDocuments returns no rows
to update

Where is resetAllDocumentPriorities being called from when you see this?
When a job is started, documents that are put into the "G" state all have
their priorityset time set to 0, so that the subsequent
resetAllDocumentPriorities() call will assign priorities to them.  If there
are other conditions where resetAllDocumentPriorities() is called after
documents are put into the "G" state and this ISN'T happening, I'd like to
know about them.

Thanks,
Karl


On Tue, Nov 4, 2014 at 1:15 PM, Aeham Abushwashi <
aeham.abushwashi@exonar.com> wrote:

> Hi Karl,
>
> After applying the 1.7.2 revisions for CONNECTORS-1090, -1091, -1092 and
> -1093 to my 1.6.1 branch, if I create a new crawl, then its documents get
> picked up by the next scan; however, that doesn't happen for an existing
> crawl. The docpriority for documents in the existing craw is still at
> 1000000001.
>
> I believe the priority should be set by
> ManifoldCF#resetAllDocumentPriorities but it's not, because
> JobManager#getNextNotYetProcessedReprioritizationDocuments returns no rows
> to update, which I think is due to the legacy job's docs having a
> priorityset of NULL. Replacing the current priorityset condition in
> JobManager#getNextNotYetProcessedReprioritizationDocuments with
> (priorityset IS NULL OR priorityset<?) addresses this specific issue.
> Is this a valid fix or do you see it introducing undesirable behaviour or
> masking an issue elsewhere?
>
> Cheers,
> Aeham
>
