manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: A less picky manifoldCF about solr errors?
Date Thu, 17 Oct 2013 12:39:40 GMT
Hi Roland,

Usually 500 errors come from Tika (which Solr Cell uses for extraction).  If
that's what you are seeing, there is a way to have Solr ignore them.  I don't
remember precisely what you do, but it has been posted to this list (and
others), so a Google search should find it for you.

Thanks!
Karl
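
To illustrate the retry-versus-abort distinction described in the quoted message below, here is a minimal Python sketch. It is not ManifoldCF's or the Solr connector's actual logic: the names `Action`, `classify_status`, and `index_all`, the retry count, and the rule that 5xx responses are retried and then skipped while other non-2xx responses abort the job are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                    # document was indexed
    RETRY_THEN_SKIP = "retry_then_skip"  # retry, then skip this document
    ABORT_JOB = "abort_job"              # stop the whole job


def classify_status(status):
    """Map an HTTP status code to an action (hypothetical policy)."""
    if 200 <= status < 300:
        return Action.ACCEPT
    if status >= 500:
        # Assumption: treat server errors as a per-document problem.
        return Action.RETRY_THEN_SKIP
    return Action.ABORT_JOB


def index_all(doc_ids, send, max_retries=2):
    """Send each document via send(doc_id) -> HTTP status.

    Returns (indexed, skipped); raises RuntimeError when the job must abort.
    """
    indexed, skipped = [], []
    for doc_id in doc_ids:
        for _attempt in range(max_retries + 1):
            status = send(doc_id)
            action = classify_status(status)
            if action is Action.ACCEPT:
                indexed.append(doc_id)
                break
            if action is Action.ABORT_JOB:
                raise RuntimeError(f"aborting job: HTTP {status} for {doc_id}")
        else:
            # Every attempt asked for a retry: record the failure and move on.
            skipped.append((doc_id, status))
    return indexed, skipped
```

Here `send` stands in for whatever actually posts a document to Solr. The `skipped` list keeps the failing document and its status code, so errors remain visible even though the job continues.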



On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <reveatwork@gmail.com> wrote:

> So far we have only had to deal with HTTP code 500, because Solr was not
> able to process some file types. We managed to tell Solr to ignore Tika
> exceptions. This helps us quite a lot, but Solr still has problems processing
> some file types, and I have not yet found a way to tell Solr to basically
> skip errors while still logging them.
>
> I will check with the customer to get the error, but it showed up
> yesterday; they have since continued with the indexing (we are still at the
> initial indexing of the repository) and the logs containing the errors have
> disappeared.
>
>
> Thanks for your support,
>
>
> Roland.
>
>
>
> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Roland,
>>
>> It depends on what the error code is.  There is quite a bit of logic in
>> the Solr connector (and in ManifoldCF itself) for handling errors of
>> different kinds.  Fundamentally there are two main kinds of error condition
>> - one which causes a retry (and can, if so specified, cause either the
>> offending document to be skipped or the job aborted) and another which
>> always causes a job to abort.  The Solr connector has to decide based on
>> limited information exactly what to do.  General HTTP error codes such as
>> "500" errors, for example, contain little information and look just the
>> same whether the error represents a document Tika is unhappy with, or
>> something more fundamental, like a complete misconfiguration of Solr.
>>
>> If you can provide more detailed information as to the kind of error(s)
>> you are seeing then we can advise you further.
>>
>> Karl
>>
>>
>>
>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I helped a customer deploy Solr + ManifoldCF to index files from a
>>> Windows shared drive. But every time Solr sends back an error message,
>>> the ManifoldCF jobs abort, which is not really convenient for hours-long
>>> indexing.
>>>
>>> So is it possible to configure ManifoldCF so that it doesn't stop
>>> every time Solr returns an HTTP code other than 200?
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Roland.
>>>
>>
>>
>
