Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm
Reply-To: user@manifoldcf.apache.org
Date: Fri, 18 Oct 2013 14:27:30 +0200
Subject: Re: A less picky manifoldCF about solr errors?
From: Roland Everaert
To: user@manifoldcf.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c33676e01ec104e9030eed

--001a11c33676e01ec104e9030eed
Content-Type: text/plain; charset=ISO-8859-1

Hi,

My customer managed to reproduce the error. Here is the exception:

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V

According to the Solr mailing list, Solr (or Tika) is bundled with the
wrong version of a jar. The customer is currently testing with the new
version of the jar. I am waiting for their results. I will open a JIRA
issue.

Regards,

Roland.

On Thu, Oct 17, 2013 at 2:48 PM, Karl Wright wrote:

> Please let me know what the actual exception trace is. Thanks!
> Karl
>
> On Thu, Oct 17, 2013 at 8:47 AM, Roland Everaert wrote:
>
>> We already do that. But Solr is still raising exceptions for some file
>> types. I have to wait for the customer to provide me with the
>> corresponding log from Solr and the message received by the MCF job.
>>
>> Regards,
>>
>> Roland.
>>
>> On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright wrote:
>>
>>> Ah, here it is:
>>>
>>> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>>>
>>> Karl
>>>
>>> On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright wrote:
>>>
>>>> Hi Roland,
>>>>
>>>> Usually 500 errors are from Tika (aka Solr Cell). If that's what you
>>>> are seeing, there is a way to disable them. I don't remember precisely
>>>> what you do, but it has been posted to this list (and others) so a
>>>> Google search should find that for you.
>>>>
>>>> Thanks!
>>>> Karl
>>>>
>>>> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert wrote:
>>>>
>>>>> So far we have only had to deal with HTTP code 500, because Solr was
>>>>> not able to process some file types. We managed to tell Solr to
>>>>> ignore Tika exceptions. This helps us quite a lot, but Solr still has
>>>>> problems processing some file types, and I have not yet found a way
>>>>> to tell Solr to basically skip errors while still logging them.
>>>>>
>>>>> I will check with the customer to get the error, but it showed up
>>>>> yesterday and they have continued with the indexing (we are still at
>>>>> the initial indexing of the repository), so the logs with the errors
>>>>> have disappeared.
>>>>>
>>>>> Thanks for your support,
>>>>>
>>>>> Roland.
>>>>>
>>>>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright wrote:
>>>>>
>>>>>> Hi Roland,
>>>>>>
>>>>>> It depends on what the error code is. There is quite a bit of logic
>>>>>> in the Solr connector (and in ManifoldCF itself) for handling errors
>>>>>> of different kinds. Fundamentally there are two main kinds of error
>>>>>> condition - one which causes a retry (and can, if so specified,
>>>>>> cause either the offending document to be skipped or the job
>>>>>> aborted) and another which always causes a job to abort. The Solr
>>>>>> connector has to decide based on limited information exactly what to
>>>>>> do. General HTTP error codes such as "500" errors, for example,
>>>>>> contain little information and look just the same whether the error
>>>>>> represents a document Tika is unhappy with, or something more
>>>>>> fundamental, like a complete misconfiguration of Solr.
>>>>>>
>>>>>> If you can provide more detailed information as to the kind of
>>>>>> error(s) you are seeing then we can advise you further.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert
>>>>>> <reveatwork@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I helped a customer deploy Solr + ManifoldCF to index files from a
>>>>>>> Windows share drive. But every time Solr sends back an error
>>>>>>> message, the ManifoldCF jobs abort, which is not really convenient
>>>>>>> for hour-long indexing.
>>>>>>>
>>>>>>> So is there a possibility to configure ManifoldCF so it doesn't
>>>>>>> stop every time Solr returns an HTTP code different from 200?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Roland.

--001a11c33676e01ec104e9030eed
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi,

My customer managed to reproduce the error. Here is the exception:

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V


According to the Solr mailing list, Solr (or Tika) is bundled with the wrong version of a jar. The customer is currently testing with the new version of the jar. I am waiting for their results. I will open a JIRA issue.
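
The `(Z)V` at the end of the trace is the JVM descriptor for a method taking a `boolean` and returning `void`, so the error means that the `CompressorStreamFactory` class actually loaded has no `setDecompressConcatenated(boolean)`. A minimal Java sketch for checking which situation a given classpath is in; the class name `ClasspathCheck` and the helper `hasMethod` are hypothetical names for illustration only, and this uses plain reflection rather than anything specific to Solr or Tika:

```java
// Hypothetical diagnostic, not part of Solr, Tika, or ManifoldCF.
final class ClasspathCheck {

    // Returns true if the named class can be loaded and exposes a public
    // method with the given name and parameter types; false otherwise.
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or the class failed to link.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the exception in this thread.
        String cls = "org.apache.commons.compress.compressors.CompressorStreamFactory";
        boolean present = hasMethod(cls, "setDecompressConcatenated", boolean.class);
        System.out.println("setDecompressConcatenated(boolean) present: " + present);
    }
}
```

Run with the same classpath as the Solr web application: a result of `false` would be consistent with an older commons-compress jar being picked up.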


Regards,


Roland.



On Thu, Oct 17, 2013 at 2:48 PM, Karl Wright <daddywri@gmail.com> wrote:
Please let me know what the actual exception trace is. Thanks!
Karl


On Thu, Oct 17, 2013 at 8:47 AM, Roland Everaert <reveatwork@gmail.com> wrote:
We already do that. But Solr is still raising exceptions for some file types. I have to wait for the customer to provide me with the corresponding log from Solr and the message received by the MCF job.


Regards,


Roland.


On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <daddywri@gmail.com> wrote:
Ah, here it is:

http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html

Karl

On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <daddywri@gmail.com> wrote:
Hi Roland,

Usually 500 errors are from Tika (aka Solr Cell). If that's what you are seeing, there is a way to disable them. I don't remember precisely what you do, but it has been posted to this list (and others) so a Google search should find that for you.

Thanks!
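Karl


On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert wrote:
So far we have only had to deal with HTTP code 500, because Solr was not able to process some file types. We managed to tell Solr to ignore Tika exceptions. This helps us quite a lot, but Solr still has problems processing some file types, and I have not yet found a way to tell Solr to basically skip errors while still logging them.

I will check with the customer to get the error, but it showed up yesterday and they have continued with the indexing (we are still at the initial indexing of the repository), so the logs with the errors have disappeared.

Thanks for your support,

Roland.
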
On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Roland,

It depends on what the error code is. There is quite a bit of logic in the Solr connector (and in ManifoldCF itself) for handling errors of different kinds. Fundamentally there are two main kinds of error condition - one which causes a retry (and can, if so specified, cause either the offending document to be skipped or the job aborted) and another which always causes a job to abort. The Solr connector has to decide based on limited information exactly what to do. General HTTP error codes such as "500" errors, for example, contain little information and look just the same whether the error represents a document Tika is unhappy with, or something more fundamental, like a complete misconfiguration of Solr.

If you can provide more detailed information as to the kind of error(s) you are seeing then we can advise you further.

Karl
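
The two kinds of error condition described above can be pictured with a small Java sketch. This is an illustration only and not ManifoldCF's actual connector logic: the class `IndexingErrorPolicy`, the enum `Outcome`, the parameters, and in particular the choice to treat 5xx responses as retryable and everything else as fatal are all assumptions made for the example.

```java
// Hypothetical policy sketch; names and thresholds are assumptions,
// not taken from the ManifoldCF Solr connector.
final class IndexingErrorPolicy {

    enum Outcome { ACCEPTED, RETRY, SKIP_DOCUMENT, ABORT_JOB }

    // Decide what to do with one document given the HTTP status returned
    // by the indexing request and how many attempts have been made so far.
    public static Outcome decide(int httpStatus, int attemptsSoFar,
                                 int maxAttempts, boolean skipWhenExhausted) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.ACCEPTED;
        }
        if (httpStatus >= 500 && httpStatus < 600) {
            // Ambiguous: could be one bad document or a broken server,
            // so retry first and only then skip or abort.
            if (attemptsSoFar < maxAttempts) {
                return Outcome.RETRY;
            }
            return skipWhenExhausted ? Outcome.SKIP_DOCUMENT : Outcome.ABORT_JOB;
        }
        // Anything else is assumed here to indicate a setup problem.
        return Outcome.ABORT_JOB;
    }

    public static void main(String[] args) {
        System.out.println(decide(500, 1, 3, true));  // RETRY
        System.out.println(decide(500, 3, 3, true));  // SKIP_DOCUMENT
        System.out.println(decide(500, 3, 3, false)); // ABORT_JOB
        System.out.println(decide(401, 0, 3, true));  // ABORT_JOB
    }
}
```

The point of the sketch is the ambiguity Karl mentions: a 500 response alone cannot distinguish a single unparseable document from a misconfigured server, so any fixed mapping like this one is a judgment call.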



On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <reveatwork@gmail.com> wrote:
Hi,

I helped a customer deploy Solr + ManifoldCF to index files from a Windows share drive. But every time Solr sends back an error message, the ManifoldCF jobs abort, which is not really convenient for hour-long indexing.

So is there a possibility to configure ManifoldCF so it doesn't stop every time Solr returns an HTTP code different from 200?

Thanks,


Roland.







--001a11c33676e01ec104e9030eed--