manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Apache ManifoldCF Performance
Date Wed, 10 Sep 2014 14:15:13 GMT
Hi Baptiste,

ManifoldCF is not limited by the number of agents processes or by the number
of connectors running in parallel.  Overall database performance is the
limiting factor.

I would read this:

Also, there's a section in ManifoldCF (I believe Chapter 2) that discusses
this issue.

About five years ago, I successfully crawled 5 million web documents using
PostgreSQL 8.3.  PostgreSQL 9.x is faster, and with modern SSDs I expect
you will do even better.  In general, I'd say it is reasonable to aim for
10M - 100M documents with ManifoldCF, provided that you use a good database
and maintain it properly.


On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <> wrote:

> Hi,
> I would like to know the maximum number of documents you have managed to
> crawl with ManifoldCF, and how many connectors it can run in parallel.
> Thanks for your answer,
> Baptiste
