manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Scheduler not working as we expected
Date Tue, 25 Sep 2018 08:46:02 GMT
It's obviously a configuration problem.  Are you using the extract update
handler?  If not, do you have tika in the pipeline?

Karl


On Tue, Sep 25, 2018 at 4:24 AM Ronny Heylen <ronnyheylen@gmail.com> wrote:

> Hi,
> We have been using SOLR for a few years, and the server has now been
> transferred to the VMs in our HQ (and reinstalled).
> We are now having the following issue:
> Forcing SOLR indexing by curl works, as we can see from:
> curl "http://gbsloappwp0083.corp.qbe.com:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@z:\qbere_bru\common\testsolr.txt"
> which has successfully indexed testsolr.txt.
> As can be checked by:
> http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=ella
> giving:
> <result name="response" numFound="1" start="0">
> Searching for john returns 0 files:
> http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=john
> <result name="response" numFound="0" start="0"/>
> and searching for anything (q=*) also returns 1 file:
> http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
> <result name="response" numFound="1" start="0">
>
> However, launching a job from ManifoldCF doesn't seem to work.
> We see the folder names in the file definition, and we see that the job
> indexes documents (or at least seems to do so), but the SOLR API query:
> http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
> still returns only 1 file, the one we indexed manually.
>
> If anybody has any suggestions, we would be really grateful.
>
> Ronny.Heylen@qbere.com
>
>
> On Tue, Jul 31, 2018 at 12:12, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Vinay,
>>
>> Dynamic rescan is meant for web-crawling and revisits already crawled
>> documents based on how often they have changed in the past.  It is
>> therefore wholly inappropriate for something like a file crawl, since
>> directory contents (one of the kinds of documents there are in a file
>> crawl) change very infrequently.
>>
>> Instead, I recommend that you run complete crawls, non-dynamic.  You can
>> even run minimal crawls fairly often, which will pick up new and changed
>> documents, and run non-minimal crawls on a less frequent schedule to
>> capture deletions.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Jul 31, 2018 at 4:05 AM VINAY Bengaluru <vinaybs.20@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>> We have set up a scheduler for our jobs, with file system as the input
>>> connector and Solr as the output connector.
>>> The scheduler is configured as follows:
>>> Schedule type: Rescan documents dynamically
>>> Recrawl interval: blank
>>> Schedule time: appropriate times with job invocation as complete.
>>>
>>> We see that the job is not picking up documents at the scheduled
>>> intervals.
>>>
>>> Why doesn't the job pick up new docs at the scheduled interval? Is
>>> anything wrong with our job configuration or our understanding?
>>>
>>> Thanks and regards,
>>> Vinay
>>>
>>>

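The check described in the thread (comparing numFound from the Solr select handler before and after a ManifoldCF job run) can be scripted. The following is only a minimal sketch under stated assumptions: the helper names `select_url`, `num_found`, and `count_documents` are hypothetical, the Solr core is assumed to return XML for `wt=xml`, and the match-all query `*:*` is used in place of the `q=*` from the thread.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def select_url(base_url, query):
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "xml"})
    return base_url.rstrip("/") + "/select?" + params


def num_found(xml_text):
    """Extract numFound from a Solr XML response (or a bare <result> element)."""
    root = ET.fromstring(xml_text)
    result = root if root.tag == "result" else root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    return int(result.attrib["numFound"])


def count_documents(base_url, query="*:*", timeout=10):
    """Query a live Solr core and return the number of matching documents."""
    with urllib.request.urlopen(select_url(base_url, query), timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling `count_documents` with the core URL (for example the `.../solr/collection1` URL from the message above) before and after the job runs would show whether the job added anything to the index. If the count does not change, the problem is on the indexing path rather than on the search side.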