manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder
Date Wed, 07 Nov 2018 00:16:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677450#comment-16677450 ]

Karl Wright commented on CONNECTORS-1554:
-----------------------------------------

Note that if you perform the lock-clean procedure *as described*, all the documents should
be reprioritized in any case, so all crawling should resume.  After that, if you wind up with
stuck documents it should be possible to look at the simple history for one of the stuck ones
to see what happened to it.

The document retry logic has not changed for years, and was last changed in a minor way to
address this very problem back in 2015.  Documents that get retried wind up being given to
a thread that recomputes their priority.  The need to do this is signaled by the "needspriority"
field being set to "Y", and then the reprioritization threads kick in and set the priority
eventually.

So if you have jobqueue entries with a docpriority value of 1E9+1, a status of "P" or "G",
and a needspriority field NOT set to "Y", then those documents are stuck and I don't know
how they got into that state.  So I need to know what happened to them that caused this.
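
For illustration only, the condition above can be written as a query.  This is a rough
sketch, not ManifoldCF code: it uses Python's sqlite3 module with an in-memory table as a
stand-in for the real database, and the "id" column, the demo rows, and the helper names
find_stuck and demo_connection are assumptions made up for the example.  Only the jobqueue
table name, the docpriority, status and needspriority fields, and the values 1E9+1, "P",
"G" and "Y" come from the description above.

```python
import sqlite3

# The docpriority value named above (1E9+1).
STUCK_PRIORITY = 1e9 + 1


def find_stuck(conn):
    """Return ids of rows that match the 'stuck' condition described above."""
    cur = conn.execute(
        "SELECT id FROM jobqueue "
        "WHERE docpriority = ? "
        "AND status IN ('P', 'G') "
        # Treat a missing flag the same as any value other than 'Y'.
        "AND (needspriority IS NULL OR needspriority <> 'Y') "
        "ORDER BY id",
        (STUCK_PRIORITY,),
    )
    return [row[0] for row in cur.fetchall()]


def demo_connection():
    """Build a throwaway in-memory table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobqueue ("
        "id INTEGER PRIMARY KEY, docpriority REAL, "
        "status TEXT, needspriority TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobqueue VALUES (?, ?, ?, ?)",
        [
            (1, 1e9 + 1, "P", "N"),   # matches: flag not 'Y'
            (2, 1e9 + 1, "G", None),  # matches: flag missing
            (3, 1e9 + 1, "P", "Y"),   # flagged, so reprioritization is pending
            (4, 5.0, "P", "N"),       # already has a different priority
            (5, 1e9 + 1, "C", "N"),   # status is neither 'P' nor 'G'
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_stuck(demo_connection()))
```

Against a real installation the same WHERE clause would have to be adapted to the actual
database and schema in use; the rows it returns are the ones whose simple history is worth
checking.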



> Job stuck during crawl documents on folder
> ------------------------------------------
>
>                 Key: CONNECTORS-1554
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1554
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Active Directory authority, File system connector, Tika extractor
>    Affects Versions: ManifoldCF 2.11
>         Environment: Ubuntu Server 18.04
> ManifoldCF 2.11
> Solr 7.5.0
> Tika Server 1.19.1
>            Reporter: Mario Bisonti
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.11
>
>         Attachments: SimpleHistory.png, manifoldcf.log
>
>
> Hello.
> When I start a job that indexes a Windows share, it gets stuck after about 15 minutes.
>
> I see an error in manifoldcf.log, as you can see in the attachment.
>
> I have attached the "Simple History" showing the last documents crawled.
> Thanks a lot.
> Mario
> [^manifoldcf.log]!SimpleHistory.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
