Hi Baptiste,

ManifoldCF is not limited by the number of agents processes or parallel connectors.  Overall database performance is the limiting factor.

I would read this:

http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html

Also, there's a section in ManifoldCF in Action (I believe Chapter 2) that discusses this issue.

Some five years ago, I successfully crawled 5 million web documents using PostgreSQL 8.3.  PostgreSQL 9.x is faster, and with modern SSDs I expect that you will do even better.  In general, I'd say it is fine to shoot for 10M to 100M documents on ManifoldCF, provided that you use a good database and that you maintain it properly.

Thanks,
Karl

On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <ba.berthier@gmail.com> wrote:
Hi
 
I would like to know the maximum number of documents that you have managed to crawl with ManifoldCF, and how many connectors it can run in parallel.
 
Thanks for your answer
 
Baptiste