manifoldcf-user mailing list archives

From Roland Everaert <reveatw...@gmail.com>
Subject Re: A less picky manifoldCF about solr errors?
Date Thu, 17 Oct 2013 12:47:21 GMT
We already do that. But solr is still raising exceptions for some file
types. I have to wait for the customer to provide me with the corresponding
solr log and the message received by the mcf job.


Regards,


Roland.


On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <daddywri@gmail.com> wrote:

> Ah, here it is:
>
> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>
> Karl
>
>
>
> On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Roland,
>>
>> Usually 500 errors are from Tika (aka Solr Cell).  If that's what you are
>> seeing, there is a way to disable them.  I don't remember precisely what
>> you do, but it has been posted to this list (and others) so a google search
>> should find that for you.
>>
>> Thanks!
>> Karl
>>
>>
>>
>> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>
>>> So far we have only had to deal with HTTP code 500, because solr was not able
>>> to process some file types. We managed to tell solr to ignore tika exceptions.
>>> This helps us quite a lot, but solr still has problems processing some file
>>> types, and I have not yet found a way to tell solr to basically skip errors
>>> while still logging them.
>>>
>>> I will check with the customer to get the error, but it showed up
>>> yesterday, and they have continued with the indexing (we are still at
>>> the initial indexing of the repository) and the logs with the errors have
>>> disappeared.
>>>
>>>
>>> Thanks for your support,
>>>
>>>
>>> Roland.
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Roland,
>>>>
>>>> It depends on what the error code is.  There is quite a bit of logic in
>>>> the Solr connector (and in ManifoldCF itself) for handling errors of
>>>> different kinds.  Fundamentally there are two main kinds of error condition
>>>> - one which causes a retry (and can, if so specified, cause either the
>>>> offending document to be skipped or the job aborted) and another which
>>>> always causes a job to abort.  The Solr connector has to decide based on
>>>> limited information exactly what to do.  General HTTP error codes such as
>>>> "500" errors, for example, contain little information and look just the
>>>> same whether the error represents a document Tika is unhappy with, or
>>>> something more fundamental, like a complete misconfiguration of Solr.
>>>>
>>>> If you can provide more detailed information as to the kind of error(s)
>>>> you are seeing then we can advise you further.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I helped a customer deploy solr+manifoldcf to index files from a
>>>>> Windows share drive. But every time solr sends back an error message,
>>>>> the manifoldcf jobs abort, which is not really convenient for hour-long
>>>>> indexing runs.
>>>>>
>>>>> So is there a possibility to configure manifold so it doesn't stop
>>>>> every time solr returns an HTTP code different from 200?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Roland.
>>>>>
>>>>
>>>>
>>>
>>
>
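
The two kinds of error condition Karl describes in the thread (one that causes a retry and can end in either a skipped document or an aborted job, and one that always aborts) can be sketched roughly as follows. This is only an illustration, not the Solr connector's actual logic: the names `Action` and `decide`, the retry limit, and the choice to treat 5xx responses as the retryable class are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"  # document was indexed
    RETRY = "retry"    # try the same document again later
    SKIP = "skip"      # give up on this document, keep the job running
    ABORT = "abort"    # stop the whole job


def decide(status, attempts, max_attempts=3, skip_on_exhaustion=True):
    """Map an HTTP status from the indexer to a job-level action.

    Hypothetical policy: 2xx is success, 5xx is treated as retryable,
    and anything else is treated as a fatal error.
    """
    if 200 <= status < 300:
        return Action.ACCEPT
    if status >= 500:
        # Retryable class: a 500 carries little information, so retry first.
        if attempts < max_attempts:
            return Action.RETRY
        # Retries exhausted: skip the document or abort, as configured.
        return Action.SKIP if skip_on_exhaustion else Action.ABORT
    # Everything else is assumed to be a configuration-level problem.
    return Action.ABORT
```

With these assumed defaults, `decide(500, 0)` asks for a retry, while `decide(500, 3)` skips the document rather than aborting the job.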
