manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling and indexing very slow
Date Thu, 31 Jul 2014 18:13:13 GMT
Hi Ameya,

(1) Please look at the Simple History report.  Note what kinds of documents
are being fetched, what kinds are being indexed, and how long it is
taking.  I have noted from your previous posts that you seem to be indexing
a lot of very large EXE files.  This is useless and you should be excluding
them.

(2) Please look in the manifoldcf.log file for evidence that fetches and/or
Solr indexing requests are being retried due to errors.  It doesn't take
many documents being chronically retried before forward progress drops to
near zero.

(3) If you look into (1) & (2) and everything seems fine, the problem may
be a mismatch in the availability of several kinds of resources.  Please
get a thread dump of the agents process while it is crawling, using
jstack.  Post that thread dump and we can tell you what to look at next.
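
As a rough first pass over the thread dump from (3), a short Python sketch
can tally threads by state.  It relies only on the
"java.lang.Thread.State: ..." line that jstack prints under each Java
thread; the function names and the dump file name are hypothetical, and
this is no substitute for reading the full dump.

```python
import re
from collections import Counter

# Matches the state line jstack prints under each Java thread header,
# e.g. "   java.lang.Thread.State: WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\S+)")


def count_thread_states(dump_text):
    """Return a Counter mapping thread state -> number of threads."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize_file(path):
    """Print one line per thread state, most common first.

    Assumes the dump was saved beforehand, for example with
    `jstack <pid> > agents-dump.txt` (hypothetical file name).
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        counts = count_thread_states(handle.read())
    for state, number in counts.most_common():
        print(f"{state:15} {number}")
```

Calling summarize_file("agents-dump.txt") prints the tally.  If most
threads turn out to be BLOCKED or WAITING, the stack frames under those
threads in the full dump may show which resource they are waiting on.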

Karl



On Thu, Jul 31, 2014 at 2:07 PM, Ameya Aware <ameya.aware@gmail.com> wrote:

> Hi,
>
>
> I am using the filesystem connector to index my entire C drive, with Solr
> as the output connector.
>
> The initial 100000 documents were crawled and indexed successfully in a
> couple of hours, but after that indexing slowed down badly (to around
> 15-20 documents per minute).
>
>
> I am not able to figure out whether the issue is with MCF or Solr.
>
>
> Can you advise me on how to proceed with this?
>
>
> Thanks,
> Ameya
>
