manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Apache ManifoldCF Performance
Date Wed, 10 Sep 2014 14:15:13 GMT
Hi Baptiste,

ManifoldCF is not limited by the number of agents processes or parallel
connectors.  Overall database performance is the limiting factor.

I would read this:

http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html

Also, there's a section in ManifoldCF in Action (I believe Chapter 2) that
discusses this issue.

Some five years ago, I successfully crawled 5 million web documents using
PostgreSQL 8.3.  PostgreSQL 9.x is faster, and with modern SSDs, I expect
that you will do even better.  In general, I'd say it is fine to shoot for
10M - 100M documents on ManifoldCF, provided that you use a good database
and that you maintain it properly.

Thanks,
Karl





On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <ba.berthier@gmail.com>
wrote:

> Hi
>
> I would like to know the maximum number of documents that you have
> managed to crawl with ManifoldCF, and how many connectors it can run
> in parallel?
>
> Thanks for your answer
>
> Baptiste
>
