Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@manifoldcf.apache.org
Received-SPF: pass (nike.apache.org: domain of
 shigeki.kobayashi3@g.softbank.co.jp designates 74.125.245.96 as permitted
 sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALUFAGC7vo9aHM9FJYWVRvqcdDKVas_5EBxq2BBZg+aZiFMhDA@mail.gmail.com>
References: 
 <CAGXi7oNctaERWNs46+M8YD34+P4cuQMyo-YcdfVzHS+KDqZEQA@mail.gmail.com>
 <CALUFAGC7vo9aHM9FJYWVRvqcdDKVas_5EBxq2BBZg+aZiFMhDA@mail.gmail.com>
From: Shigeki Kobayashi <shigeki.kobayashi3@g.softbank.co.jp>
Date: Tue, 13 Nov 2012 20:03:27 +0900
Message-ID: 
 <CAGXi7oPEMKuf0ZKi9JQdsj56chGC3Ys6ViKkCTs35FPkHt6EUg@mail.gmail.com>
Subject: Re: Changing logging level affect crawling results
To: user@manifoldcf.apache.org
Content-Type: multipart/alternative; boundary=047d7b2e0bff7b89db04ce5e6003

--047d7b2e0bff7b89db04ce5e6003
Content-Type: text/plain; charset=UTF-8

Hi Karl.

Thanks for your reply.
I will try reducing the Max Connections parameter for that connection.


Regards,


Shigeki

2012/11/13 Karl Wright <daddywri@gmail.com>

> I doubt this is related at all to the logging.  More likely it is
> related to the restart that you did when you changed the logging
> information.  The main possibility is that you changed the load
> pattern on the server.  Some Windows or NAS servers cannot handle
> load, and if there are too many open, active connections they will
> drop connections etc.  When that happens your big file is likely to be
> in progress, because it takes so long, and thus it gets aborted as a
> result of another transfer being aborted.  The CIFS protocol is
> vulnerable to this.  Solution: reduce the Max Connections parameter in
> ManifoldCF for that connection to something between 2 and 5.
>
> Karl
>
>
> On Tue, Nov 13, 2012 at 3:51 AM, Shigeki Kobayashi
> <shigeki.kobayashi3@g.softbank.co.jp> wrote:
> >
> > Hi Everyone.
> >
> > I have a question about logging levels.
> > Does changing the logging level affect MCF's crawling results?
> >
> > While trying to crawl a big file (1.12GB) using a Windows shares connection,
> > an error occurred, and MCF aborted.
> > At this time, all of the following logging levels were set to "INFO":
> >
> >     org.apache.manifoldcf.misc
> >     org.apache.manifoldcf.db
> >     org.apache.manifoldcf.lock
> >     org.apache.manifoldcf.cache
> >     org.apache.manifoldcf.agents
> >     org.apache.manifoldcf.perf
> >     org.apache.manifoldcf.crawlerthreads
> >     org.apache.manifoldcf.hopcount
> >     org.apache.manifoldcf.jobs
> >     org.apache.manifoldcf.connectors
> >     org.apache.manifoldcf.scheduling
> >     org.apache.manifoldcf.authorityconnectors
> >     org.apache.manifoldcf.authorityservice
> >
> >
> > Error message:
> > Error: Repeated service interruptions - failure processing document: Read timed out
> >
> > However, changing only the following settings to "DEBUG" let MCF crawl the
> > file successfully.
> >
> >     org.apache.manifoldcf.agents
> >     org.apache.manifoldcf.crawlerthreads
> >     org.apache.manifoldcf.jobs
> >     org.apache.manifoldcf.connectors
> >
> >
> > What do you think causes this difference?
> >
> >
> >  CentOS 6 (64-bit)
> >  MySQL 5.5
> >  Solr 3.6
> >  MCF 1.0:
> >   crawler.threads: 300
> >   (Solr) Output connections: 60
> >   Repository connections: 60
> >
> >
> > Regards,
> >
> >
> > Shigeki
>
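
For anyone reading this thread later, here is a rough sketch of what a lower
Max Connections value does in principle. It is not ManifoldCF code. The class
name ConnectionCap, the method runTransfers, and the 5 ms sleep standing in
for a file transfer are all made up for illustration. The idea is only that
many crawler threads share a small pool of permits, so the file server never
sees more than the configured number of simultaneous transfers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from ManifoldCF.
public class ConnectionCap {

    // Runs `tasks` simulated transfers on `threads` worker threads while
    // allowing at most `maxConnections` of them to be active at once.
    // Returns the highest number of simultaneous transfers observed.
    static int runTransfers(int threads, int tasks, int maxConnections) {
        Semaphore permits = new Semaphore(maxConnections);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire(); // wait for a free "connection"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stand-in for a file transfer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    permits.release(); // hand the "connection" back
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 30 worker threads, 60 transfers, at most 4 open at the same time.
        int peak = runTransfers(30, 60, 4);
        System.out.println("peak concurrent transfers: " + peak);
    }
}
```

With a cap of 4, the reported peak never exceeds 4 no matter how many worker
threads are running, which is the kind of load reduction Karl suggests for
servers that drop connections under load.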

--047d7b2e0bff7b89db04ce5e6003--