From: Shigeki Kobayashi
Date: Tue, 13 Nov 2012 20:03:27 +0900
Subject: Re: Changing logging level affect crawling results
To: user@manifoldcf.apache.org

Hi Karl.

Thanks for your reply.
I will try reducing the max connections.


Regards,


Shigeki

2012/11/13 Karl Wright <daddywri@gmail.com>
I doubt this is related at all to the logging. More likely it is
related to the restart that you did when you changed the logging
configuration. The main possibility is that you changed the load
pattern on the server. Some Windows or NAS servers cannot handle heavy
load, and if there are too many open, active connections, they will
drop connections, etc. When that happens, your big file is likely to be
in progress, because it takes so long to transfer, and so it gets aborted
as a result of another transfer being aborted. The CIFS protocol is
vulnerable to this. Solution: reduce the Max Connections parameter in
ManifoldCF for that connection to something between 2 and 5.

Karl
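Karl's fix works by capping how many transfers run against one server at the same time. The sketch below illustrates only that idea and is not ManifoldCF's implementation: a hypothetical `ConnectionThrottle` wraps a bounded semaphore so that, however many crawler threads exist (300 in this setup), at most `max_connections` of them hold a connection at once while the rest wait for a free slot. The `crawl` helper and the 10 ms "transfer" are likewise hypothetical stand-ins.

```python
import threading
import time


class ConnectionThrottle:
    """Hypothetical per-server connection cap (illustration, not ManifoldCF code)."""

    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()  # guards the counters below
        self.active = 0  # connections currently open
        self.peak = 0  # highest number ever open at once

    def __enter__(self) -> "ConnectionThrottle":
        self._slots.acquire()  # block until one of the slots is free
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        with self._lock:
            self.active -= 1
        self._slots.release()  # hand the slot to the next waiting thread


def crawl(num_threads: int, max_connections: int) -> int:
    """Start num_threads simulated fetches and return the peak concurrency."""
    throttle = ConnectionThrottle(max_connections)

    def fetch() -> None:
        with throttle:
            time.sleep(0.01)  # stand-in for transferring one document

    workers = [threading.Thread(target=fetch) for _ in range(num_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return throttle.peak


if __name__ == "__main__":
    # 300 crawler threads as in the original setup, capped at 5 connections.
    print("peak open connections:", crawl(num_threads=300, max_connections=5))
```

With `max_connections=5`, the reported peak can never exceed 5 no matter how many threads are started; the extra threads simply queue, which is the effect of lowering Max Connections to a value between 2 and 5.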


On Tue, Nov 13, 2012 at 3:51 AM, Shigeki Kobayashi
<shigeki.kobayashi3@g.softbank.co.jp> wrote:
>
> Hi Everyone.
>
> I have a question about logging levels.
> Does changing the logging level affect MCF's crawling results?
>
> While trying to crawl a big file (1.12 GB) using a Windows Shares connection,
> an error occurred and MCF aborted the crawl.
> At that time, all of the following loggers were set to "INFO":
>
>     org.apache.manifoldcf.misc
>     org.apache.manifoldcf.db
>     org.apache.manifoldcf.lock
>     org.apache.manifoldcf.cache
>     org.apache.manifoldcf.agents
>     org.apache.manifoldcf.perf
>     org.apache.manifoldcf.crawlerthreads
>     org.apache.manifoldcf.hopcount
>     org.apache.manifoldcf.jobs
>     org.apache.manifoldcf.connectors
>     org.apache.manifoldcf.scheduling
>     org.apache.manifoldcf.authorityconnectors
>     org.apache.manifoldcf.authorityservice
>
>
> Error message:
> Error: Repeated service interruptions - failure processing document: Read timed out
>
> However, after changing only the following settings to "DEBUG", MCF crawled
> the file successfully.
>
>     org.apache.manifoldcf.agents
>     org.apache.manifoldcf.crawlerthreads
>     org.apache.manifoldcf.jobs
>     org.apache.manifoldcf.connectors
>
>
> What causes this difference, do you think?
>
>
> CentOS 6 (64-bit)
> MySQL 5.5
> Solr 3.6
> MCF 1.0:
>   crawler.threads: 300
>   (Solr) Output connections: 60
>   Repository connections: 60
>
>
> Regards,
>
>
> Shigeki
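The failing and the successful runs quoted above differ only in four logger levels. To keep both variants reproducible (for example, to retest after lowering Max Connections, since Karl doubts the logging is the cause), the sketch below renders the levels as configuration entries. It assumes, without confirmation from this thread, that MCF 1.0 reads these levels from `properties.xml` as `<property name="…" value="…"/>` elements under a `<configuration>` root; check the deployment documentation for your version. The helper name `logging_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Logger names listed in the quoted message; all were at INFO for the failed crawl.
LOGGERS = [
    "org.apache.manifoldcf.misc",
    "org.apache.manifoldcf.db",
    "org.apache.manifoldcf.lock",
    "org.apache.manifoldcf.cache",
    "org.apache.manifoldcf.agents",
    "org.apache.manifoldcf.perf",
    "org.apache.manifoldcf.crawlerthreads",
    "org.apache.manifoldcf.hopcount",
    "org.apache.manifoldcf.jobs",
    "org.apache.manifoldcf.connectors",
    "org.apache.manifoldcf.scheduling",
    "org.apache.manifoldcf.authorityconnectors",
    "org.apache.manifoldcf.authorityservice",
]

# The four loggers that were switched to DEBUG before the successful crawl.
DEBUG_LOGGERS = {
    "org.apache.manifoldcf.agents",
    "org.apache.manifoldcf.crawlerthreads",
    "org.apache.manifoldcf.jobs",
    "org.apache.manifoldcf.connectors",
}


def logging_properties(debug: set) -> str:
    """Render one <property> element per logger, DEBUG for names in `debug`.

    ASSUMPTION: MCF reads logger levels from properties.xml entries of this
    shape; the element and attribute names are not confirmed by the thread.
    """
    root = ET.Element("configuration")
    for name in LOGGERS:
        level = "DEBUG" if name in debug else "INFO"
        ET.SubElement(root, "property", name=name, value=level)
    ET.indent(root)  # pretty-print the tree in place; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(logging_properties(DEBUG_LOGGERS))
```

Passing an empty set instead of `DEBUG_LOGGERS` reproduces the all-INFO configuration of the failed run, so the two variants can be compared line by line.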


