manifoldcf-user mailing list archives

From Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp>
Subject Re: Changing logging level affect crawling results
Date Tue, 13 Nov 2012 11:03:27 GMT
Hi Karl.

Thanks for your reply.
I will try reducing the max connections.


Regards,


Shigeki

2012/11/13 Karl Wright <daddywri@gmail.com>

> I doubt this is related at all to the logging.  More likely it is
> related to the restart that you did when you changed the logging
> information.  The main possibility is that you changed the load
> pattern on the server.  Some Windows or NAS servers cannot handle
> load, and if there are too many open, active connections they will
> drop connections etc.  When that happens your big file is likely to be
> in progress, because it takes so long, and thus it gets aborted as a
> result of another transfer being aborted.  The CIFS protocol is
> vulnerable to this.  Solution: reduce the Max Connections parameter in
> ManifoldCF for that connection to something between 2 and 5.
>
> Karl
>
>
> On Tue, Nov 13, 2012 at 3:51 AM, Shigeki Kobayashi
> <shigeki.kobayashi3@g.softbank.co.jp> wrote:
> >
> > Hi Everyone.
> >
> > I have a question about logging levels.
> > Does changing the logging level affect MCF's crawling results?
> >
> > While trying to crawl a big file (1.12GB) using a Windows shares
> > connection, an error occurred, and MCF aborted.
> > At this time, all of the following logging levels were set to "INFO":
> >
> >     org.apache.manifoldcf.misc
> >     org.apache.manifoldcf.db
> >     org.apache.manifoldcf.lock
> >     org.apache.manifoldcf.cache
> >     org.apache.manifoldcf.agents
> >     org.apache.manifoldcf.perf
> >     org.apache.manifoldcf.crawlerthreads
> >     org.apache.manifoldcf.hopcount
> >     org.apache.manifoldcf.jobs
> >     org.apache.manifoldcf.connectors
> >     org.apache.manifoldcf.scheduling
> >     org.apache.manifoldcf.authorityconnectors
> >     org.apache.manifoldcf.authorityservice
> >
> >
> > Error message:
> > Error: Repeated service interruptions - failure processing document: Read timed out
> >
> > However, changing only the following settings to "DEBUG" had MCF crawl
> > the file successfully.
> >
> >     org.apache.manifoldcf.agents
> >     org.apache.manifoldcf.crawlerthreads
> >     org.apache.manifoldcf.jobs
> >     org.apache.manifoldcf.connectors
> >
> >
> > What causes this difference, do you think?
> >
> >
> >  CentOS6(64bit)
> >  MySQL5.5
> >  Solr3.6
> >  MCF1.0:
> >   crawler.threads:300
> >   (Solr)Output connections:60
> >   Repository connections :60
> >
> >
> > Regards,
> >
> >
> > Shigeki
>
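
The suggestion in this thread, capping the number of simultaneous connections to a file server that cannot handle load, can be illustrated with a small sketch. This is not ManifoldCF code and does not reflect how ManifoldCF implements its Max Connections parameter; the function name run_transfers and its parameters are hypothetical, and the sleep call merely stands in for a real file read. It only shows that many worker threads can share a small pool of connection slots.

```python
import threading
import time


def run_transfers(num_files, max_connections, transfer_seconds=0.01):
    """Simulate num_files transfers with at most max_connections in flight.

    Hypothetical illustration only. Returns the highest number of
    simultaneous transfers that was observed.
    """
    slots = threading.BoundedSemaphore(max_connections)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def transfer():
        with slots:  # blocks until a connection slot is free
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(transfer_seconds)  # stand-in for the actual file read
            with lock:
                state["active"] -= 1

    # One thread per file, far more threads than connection slots.
    workers = [threading.Thread(target=transfer) for _ in range(num_files)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return state["peak"]


if __name__ == "__main__":
    # 20 worker threads, but never more than 3 transfers at once.
    peak = run_transfers(num_files=20, max_connections=3)
    print("peak concurrent transfers:", peak)
```

However many threads are started, the observed peak never exceeds max_connections, which is the effect a value between 2 and 5 is meant to have on the server in the scenario described above.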
