manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance benchmarking
Date Tue, 21 Feb 2017 06:59:52 GMT
For a 100,000 local document crawl (small files, which is the worst for
database performance), the first 10,000 documents clock in at 92 documents
per second here.  That's with an out-of-the-box Windows postgresql 9.3 on a
quad core processor with SSD.  The performance drops off from that pace; at
26K documents we're down to 55 docs/second.  But that's where it bottoms
out; at 36K we see 57 docs/second, and at 51K we see 65 docs per second.

The slowdown is a result of the Postgresql tables getting larger and no
longer fitting in memory.  The speedup is due to no further document
discovery occurring and just plain crawling taking place.
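
To put those rates in terms of total crawl time, here is a minimal Python sketch. It is not part of ManifoldCF, and the helper name crawl_time_seconds is made up for this illustration. It assumes the rate stays constant for the whole crawl, which the figures above show is not the case, so the results are rough bounds rather than predictions.

```python
def crawl_time_seconds(doc_count: int, docs_per_second: float) -> float:
    """Time to process doc_count documents at a constant rate (an assumption)."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second


# Rates quoted in this message for the 100,000-document crawl.
observed_rates = {
    "first 10K": 92,
    "at 26K": 55,
    "at 36K": 57,
    "at 51K": 65,
}

for label, rate in observed_rates.items():
    minutes = crawl_time_seconds(100_000, rate) / 60
    print(f"{label}: {rate} docs/s -> about {minutes:.0f} min for 100,000 docs")
```

At the quoted rates, a 100,000-document crawl would take somewhere between roughly 18 and 30 minutes if any one of those rates held throughout.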

Probably it is the 4-core architecture that makes the difference between
your results and mine.  My 2-core machine without SSDs is what I
benchmarked before and that's the one that runs some 20 docs/second.  (I
misremembered about the SSDs, sorry).

Thanks,
Karl


On Tue, Feb 21, 2017 at 1:20 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Vinodh,
>
> ManifoldCF doc per second performance is limited mainly by database
> performance.  Postgresql 9.x on Windows seems a bit slower than Postgresql
> 8.x was; I'd typically get 25 documents per second on 8.x even without SSDs,
> but with Postgresql 9.x on Windows the numbers are a good deal poorer.
> Still, I haven't seen it quite that bad; 20 docs per second is what I
> recall seeing in informal testing here.
>
> I haven't tried tuning for database performance on Windows ever.  Linux
> Postgresql deployments do a lot better, in my experience, and are probably
> more responsive to tuning.
>
> I'll do some experiments, as time permits, and get back to you.
>
> Karl
>
>
> On Tue, Feb 21, 2017 at 12:18 AM, Vinodh Boopalan <
> vinodh.boopalan@contentdiv.com> wrote:
>
>> Forgot to mention, currently focusing on Windows Share/Local File system.
>>
>> On 2/21/17, 12:16 AM, "Vinodh Boopalan" <vinodh.boopalan@contentdiv.com>
>> wrote:
>>
>>     Hi Karl/Team,
>>
>>     I am trying to benchmark ManifoldCF performance using the following
>> configuration,
>>
>>     Host Specs
>>
>>     VM running in ESXi
>>     Memory: 8GB
>>     CPU: 2 with 2 cores
>>     SSD
>>
>>     Running MCF in combined single process (1024MB JVM, 100 workers, 105
>> db connections). Postgresql settings are mostly per the link,
>>     https://manifoldcf.apache.org/release/release-2.5/en_US/performance-tuning.html
>>
>>     I have around 2300 docs which is around 3.5 GB in total file size.  I
>> am getting a throughput of (null output) 13-14 docs/sec (18 docs/sec – best
>> I have seen).
>>
>>     Is that a reasonable performance to expect?
>>
>>     Can I scale further by using the multiprocess deployment?  (have
>> multiple agents against the postgresql). Or should I explore scaling
>> Postgres?
>>
>>     Thanks and best regards,
>>     Vinodh
