manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Slow performance with a basic setup
Date Wed, 28 Mar 2012 00:37:16 GMT
Let's start with some basics.
First of all, how many web connections do you have configured?  What
do you have for throttling?  If you have not modified the default
settings for throttling and are pulling a number of documents off of
ONE server, then throttling is probably severely limiting your crawl
speed.
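
As a rough illustration of why a per-server fetch limit can dominate the
crawl time, here is a back-of-the-envelope sketch. The limit of 12 fetches
per minute is an assumed value for illustration only, not a confirmed
default; check the throttling settings on your own connection for the real
number. The function name is hypothetical as well.

```python
def estimated_crawl_hours(num_documents: int, max_fetches_per_minute: float) -> float:
    """Lower bound on crawl time when one server is capped at a fixed fetch rate."""
    if max_fetches_per_minute <= 0:
        raise ValueError("max_fetches_per_minute must be positive")
    minutes = num_documents / max_fetches_per_minute
    return minutes / 60.0


if __name__ == "__main__":
    # ~8,500 files, as in the original report; 12 fetches/min is an assumption.
    hours = estimated_crawl_hours(8500, 12)
    print(f"Estimated minimum crawl time: {hours:.1f} hours")
```

With numbers in that range, a crawl of a single server takes many hours no
matter how small the files are, because the fetch rate, not the bandwidth,
is the limit.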

Karl

On Tue, Mar 27, 2012 at 6:24 PM, Scott Schneider <scottsch42@gmail.com> wrote:
> Hi all,
>
> I have a pretty simple ManifoldCF setup, but I'm getting very slow
> performance.  Can someone help me understand and/or fix this?
>
> My input is a web connector that goes to an Apache HTTP server running
> on the local machine, serving static text files.  I have a null
> authority service.  I output to Solr, also running locally.
>
> The data I'm crawling is ~20 MB total in ~8,500 small files.  I started
> the job one afternoon, and the next morning it was still not finished!  It
> had only processed ~2,500 documents.  Strangely, it listed ~10,000
> total documents (and ~7,500 active).
>
> My ultimate goal is to figure out how much space the Solr index takes
> as I add more access tokens.  That's why I'm using the web connector
> and null authority, rather than just using a file system connector.
>
> Thanks,
> Scott
