manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: A less picky manifoldCF about solr errors?
Date Thu, 17 Oct 2013 12:48:41 GMT
Please let me know what the actual exception trace is.  Thanks!
Karl


On Thu, Oct 17, 2013 at 8:47 AM, Roland Everaert <reveatwork@gmail.com> wrote:

> We already do that. But Solr is still raising exceptions for some file
> types. I have to wait for the customer to provide the corresponding Solr
> log and the message received by the MCF job.
>
>
> Regards,
>
>
> Roland.
>
>
> On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Ah, here it is:
>>
>>
>> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>>
>> Karl
>>
>>
>>
>> On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Roland,
>>>
>>> Usually 500 errors are from Tika (aka Solr Cell).  If that's what you
>>> are seeing, there is a way to disable them.  I don't remember precisely
>>> what you do, but it has been posted to this list (and others) so a google
>>> search should find that for you.
>>>
>>> Thanks!
>>> Karl
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>>
>>>> So far we have only had to deal with HTTP code 500, because Solr was not
>>>> able to process some file types. We managed to tell Solr to ignore Tika
>>>> exceptions. This helps us quite a lot, but Solr still has problems
>>>> processing some file types, and I have not yet found a way to tell Solr
>>>> to basically skip errors while still logging them.
>>>>
>>>> I will check with the customer to get the error, but it showed up
>>>> yesterday, and they have continued with the indexing (we are still at
>>>> the initial indexing of the repository) and the logs with the errors have
>>>> disappeared.
>>>>
>>>>
>>>> Thanks for your support,
>>>>
>>>>
>>>> Roland.
>>>>
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Roland,
>>>>>
>>>>> It depends on what the error code is.  There is quite a bit of logic
>>>>> in the Solr connector (and in ManifoldCF itself) for handling errors
>>>>> of different kinds.  Fundamentally there are two main kinds of error
>>>>> condition - one which causes a retry (and can, if so specified, cause
>>>>> either the offending document to be skipped or the job aborted) and
>>>>> another which always causes a job to abort.  The Solr connector has to
>>>>> decide based on limited information exactly what to do.  General HTTP
>>>>> error codes such as "500" errors, for example, contain little
>>>>> information and look just the same whether the error represents a
>>>>> document Tika is unhappy with, or something more fundamental, like a
>>>>> complete misconfiguration of Solr.
>>>>>
>>>>> If you can provide more detailed information as to the kind of
>>>>> error(s) you are seeing then we can advise you further.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I helped a customer to deploy Solr + ManifoldCF to index files from
>>>>>> a Windows share drive. But every time Solr sends back an error
>>>>>> message, the ManifoldCF jobs abort, which is not really convenient
>>>>>> for hour-long indexing.
>>>>>>
>>>>>> So is there a possibility to configure ManifoldCF so it doesn't stop
>>>>>> every time Solr returns an HTTP code other than 200?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> Roland.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
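
For readers of the archive, the two kinds of error condition described earlier in the thread (a retry that can end in either a skipped document or an aborted job, versus an immediate abort) can be pictured with a small sketch. This is not the Solr connector's actual logic. The names `Action`, `classify`, `max_attempts` and `on_retries_exhausted` are hypothetical, and so is the mapping from status codes to actions, in particular the assumption that 5xx responses are the ambiguous ones worth retrying.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"  # document was indexed
    RETRY = "retry"    # try the same document again later
    SKIP = "skip"      # give up on this document, keep the job running
    ABORT = "abort"    # stop the whole job


def classify(status: int, attempts: int, max_attempts: int = 3,
             on_retries_exhausted: Action = Action.SKIP) -> Action:
    """Hypothetical mapping from an HTTP status code to a job-level action."""
    if 200 <= status < 300:
        return Action.ACCEPT
    if 500 <= status < 600:
        # Assumption: a 5xx does not say whether one document is bad or
        # Solr itself is misconfigured, so retry a few times first.
        if attempts < max_attempts:
            return Action.RETRY
        # Retries exhausted: skip the document or abort, as configured.
        return on_retries_exhausted
    # Assumption: any other non-2xx status is treated as fatal for the job.
    return Action.ABORT
```

With these assumed defaults, `classify(500, attempts=0)` asks for a retry, and `classify(500, attempts=3)` skips the document unless `on_retries_exhausted=Action.ABORT` is passed.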

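The "ignore Tika exception" setting mentioned in the thread corresponds to the Solr Cell request parameter `ignoreTikaException`, which is the subject of the linked nabble post. As a rough illustration only, the sketch below builds an extract-handler URL that carries the parameter. The helper name `extract_url`, the host, and the core name are placeholders, and whether the parameter is honored depends on the Solr version and handler configuration in use.

```python
from urllib.parse import urlencode


def extract_url(base: str, doc_id: str, ignore_tika_exception: bool = True) -> str:
    """Build a Solr Cell (/update/extract) URL for one document.

    `base` is the core URL, e.g. a placeholder such as
    http://localhost:8983/solr/collection1 (assumed, not from the thread).
    """
    params = {
        "literal.id": doc_id,  # sets the document's id field
        "ignoreTikaException": "true" if ignore_tika_exception else "false",
    }
    return f"{base.rstrip('/')}/update/extract?{urlencode(params)}"
```

The same parameter can also be set as a default on the extract request handler in `solrconfig.xml`, so that clients such as ManifoldCF do not need to send it with every request.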