manifoldcf-user mailing list archives

From Roland Everaert <reveatw...@gmail.com>
Subject Re: A less picky manifoldCF about solr errors?
Date Fri, 18 Oct 2013 12:27:30 GMT
Hi,

My customer managed to reproduce the error. Here is the exception:

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V


According to the Solr mailing list, Solr (or Tika) is bundled with the
wrong version of a jar. The customer is currently testing with the new
version of the jar. I am waiting for their results, and I will open a JIRA issue.
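
For anyone who wants to check their own deployment for the same mismatch, here is a minimal sketch in Java. It uses reflection to report which location a class was loaded from and whether a given public method exists on it. The class and method names are taken from the exception above; the helper class CompressJarCheck and its describe method are hypothetical names of mine, not part of Solr, Tika, or ManifoldCF. It only reports on the classpath it is run with, so it would need to run with the same jars as the Solr webapp to say anything useful.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Solr, Tika, or ManifoldCF.
public class CompressJarCheck {

    /**
     * Reports where className was loaded from and whether it exposes a
     * public method called methodName with the given parameter types.
     */
    static String describe(String className, String methodName, Class<?>... paramTypes) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": not on the classpath";
        }

        // The code source is null for classes loaded by the bootstrap loader.
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        String origin = (src == null || src.getLocation() == null)
                ? "(bootstrap or unknown)"
                : src.getLocation().toString();

        try {
            cls.getMethod(methodName, paramTypes);
            return className + " from " + origin + ": " + methodName + " found";
        } catch (NoSuchMethodException e) {
            return className + " from " + origin + ": " + methodName + " MISSING";
        }
    }

    public static void main(String[] args) {
        // (Z)V in the exception means: one boolean parameter, void return.
        System.out.println(describe(
                "org.apache.commons.compress.compressors.CompressorStreamFactory",
                "setDecompressConcatenated",
                boolean.class));
    }
}
```

If the report says MISSING, the location it prints should point at the jar that needs replacing.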


Regards,


Roland.



On Thu, Oct 17, 2013 at 2:48 PM, Karl Wright <daddywri@gmail.com> wrote:

> Please let me know what the actual exception trace is.  Thanks!
> Karl
>
>
> On Thu, Oct 17, 2013 at 8:47 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>
>> We already do that. But Solr is still raising exceptions for some file
>> types. I have to wait for the customer to provide me with the corresponding
>> Solr log and the message received by the MCF job.
>>
>>
>> Regards,
>>
>>
>> Roland.
>>
>>
>> On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Ah, here it is:
>>>
>>>
>>> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>>>
>>> Karl
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Roland,
>>>>
>>>> Usually 500 errors are from Tika (aka Solr Cell).  If that's what you
>>>> are seeing, there is a way to disable them.  I don't remember precisely
>>>> what you do, but it has been posted to this list (and others) so a google
>>>> search should find that for you.
>>>>
>>>> Thanks!
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <reveatwork@gmail.com> wrote:
>>>>
>>>>> So far we have only had to deal with HTTP code 500, because Solr was not
>>>>> able to process some file types. We managed to tell Solr to ignore Tika
>>>>> exceptions. This helps us quite a lot, but Solr still has problems processing
>>>>> some file types, and I have not yet found a way to tell Solr to basically
>>>>> skip errors while still logging them.
>>>>>
>>>>> I will check with the customer to get the error, but it showed up
>>>>> yesterday, and they have since continued with the indexing (we are still at
>>>>> the initial indexing of the repository), so the logs with the errors have
>>>>> disappeared.
>>>>>
>>>>>
>>>>> Thanks for your support,
>>>>>
>>>>>
>>>>> Roland.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Roland,
>>>>>>
>>>>>> It depends on what the error code is.  There is quite a bit of logic
>>>>>> in the Solr connector (and in ManifoldCF itself) for handling errors of
>>>>>> different kinds.  Fundamentally there are two main kinds of error
>>>>>> condition - one which causes a retry (and can, if so specified, cause
>>>>>> either the offending document to be skipped or the job aborted) and
>>>>>> another which always causes a job to abort.  The Solr connector has to
>>>>>> decide based on limited information exactly what to do.  General HTTP
>>>>>> error codes such as "500" errors, for example, contain little information
>>>>>> and look just the same whether the error represents a document Tika is
>>>>>> unhappy with, or something more fundamental, like a complete
>>>>>> misconfiguration of Solr.
>>>>>>
>>>>>> If you can provide more detailed information as to the kind of
>>>>>> error(s) you are seeing then we can advise you further.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <
>>>>>> reveatwork@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I helped a customer deploy Solr + ManifoldCF to index files from a
>>>>>>> Windows share drive. But every time Solr sends back an error message,
>>>>>>> the ManifoldCF jobs abort, which is not really convenient for
>>>>>>> hours-long indexing.
>>>>>>>
>>>>>>> So is there a possibility to configure ManifoldCF so it doesn't
>>>>>>> stop every time Solr returns an HTTP code other than 200?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Roland.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
