Return-Path: X-Original-To: apmail-manifoldcf-user-archive@www.apache.org Delivered-To: apmail-manifoldcf-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E3C5719094 for ; Wed, 27 Apr 2016 11:43:56 +0000 (UTC) Received: (qmail 19304 invoked by uid 500); 27 Apr 2016 11:43:56 -0000 Delivered-To: apmail-manifoldcf-user-archive@manifoldcf.apache.org Received: (qmail 19252 invoked by uid 500); 27 Apr 2016 11:43:56 -0000 Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@manifoldcf.apache.org Delivered-To: mailing list user@manifoldcf.apache.org Received: (qmail 19242 invoked by uid 99); 27 Apr 2016 11:43:56 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 27 Apr 2016 11:43:56 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 542B31A1233 for ; Wed, 27 Apr 2016 11:43:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.198 X-Spam-Level: * X-Spam-Status: No, score=1.198 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id o8pJUTC8dtFD for ; Wed, 27 Apr 2016 11:43:52 +0000 (UTC) Received: from mail-io0-f174.google.com (mail-io0-f174.google.com [209.85.223.174]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 6CA885F19A for ; Wed, 27 Apr 2016 11:43:52 +0000 (UTC) Received: by mail-io0-f174.google.com with SMTP id f89so43396952ioi.0 for ; Wed, 27 Apr 2016 04:43:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to; bh=+vzFqYPmpC2kGojLrVQnc69Ao3VqCvYBpWjAwUn+A8o=; b=0PERXghIu5YUkojJeWrnUAHemzr5LkPIOUQVs0rRxBn62phaTQHHDk5eER1Lzn9eYT HUhOsp0wvK/wGbjuUFmSB3dLNzN6uVd+DEzFaVyNs2HIqrEvTXIB6GJzREYyDCKndqfv LIRh3IZreW1+aeAw6n2GDD9swPsjVkshQEHLuk6Ofb+IlGd6BQMOGaTovnOBrfO04Jra Zw8O9wca8CYaO8Z7PGOk4ZjUom0bW7lOQHRmotjZHEuzQrkTdSpOQSfDJhdj9a+iQeXr I4K9njrEgumRxAitTxT3xK6NEvgpAviYxKvx52ZFf8yJY/7bAIKl/r5zJ+uoGD6eTNLD 3BKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to; bh=+vzFqYPmpC2kGojLrVQnc69Ao3VqCvYBpWjAwUn+A8o=; b=WVH/iSX+3pcJF78m2Vmt8ykTUHiSsr1E3o11Jnjn+v6pxXDMFr6k5ex81TpxyLHIB3 hwwfy/f5IshSzq+0308a8XEA3Rxt5YsVpuoAYh2ozCvuvF4MFEyWAwwa7Zk1iVv/IkI0 6OdrcjshMNGBkV2+zqc2Gx8f9sVgHgahhP9dCF+tpIfIStd8AXL3CWdgOlI+TSGjX8yV umsmcLoBHrX3UrvkXxrojWjcGum2cSlBfoWmkpudZcPwijrPUM91IsXqRWmRhRqilrHZ Ow0zVhzpVmLXy5JxY1QF0cXxMc75g+JliaGScLM7CIvqSDjVslRLVduUgZzzoLbxE5eK RW/A== X-Gm-Message-State: AOPr4FUHAbhGR+r3GAEYZ7f8DwCmRpYKGUUQA7hs1W2zPrgboIrsRF1yH2lqigltZiAe63eyLEnZaJbfvyQ0NA== MIME-Version: 1.0 X-Received: by 10.107.186.6 with SMTP id k6mr9846653iof.136.1461757426253; Wed, 27 Apr 2016 04:43:46 -0700 (PDT) Received: by 10.107.9.197 with HTTP; Wed, 27 Apr 2016 04:43:46 -0700 (PDT) In-Reply-To: References: Date: Wed, 27 Apr 2016 07:43:46 -0400 Message-ID: Subject: Re: Database performance From: Karl Wright To: "user@manifoldcf.apache.org" Content-Type: multipart/alternative; boundary=94eb2c076a6223c6a6053175ecfa --94eb2c076a6223c6a6053175ecfa Content-Type: text/plain; charset=UTF-8 Hi Konstantin, The query you are looking at is performed by the UI only, and there is a parameter you can set which applies a limit to the number of documents so that the count is reported as "+" in the UI. This is the parameter: org.apache.manifoldcf.ui.maxstatuscount As for why the database gets slow for crawling, unless you are seeing reports in the log of long-running queries, then it's a good chance you need to vacuum your database instance. I generally recommend that a vacuum full be done periodically for database instances. Autovacuuming has gotten a lot better in postgres than it used to be but at least in the past the autovacuuming process would get far behind ManifoldCF and so the database would get quite bloated anyway. So I'd give that a try. If you are seeing logging output mentioning slow queries, you may need to tune how often MCF analyzes certain tables. There are parameters that control that as well. In general, if there is a slow query with a bad plan, and analyzing the tables involved makes it come up with a much better plan, analysis is not happening often enough. But first, before you get to that point, have a look at the log and see whether this is likely to be the problem. (Usually it is the stuffer query that gets slow when there's an issue with table analysis, FWIW). Please feel free to post the plan of the queries being reported here. Thanks, Karl On Wed, Apr 27, 2016 at 7:33 AM, jetnet wrote: > Hi Karl, > > I set up two MCF instances (quick setup) on the same machine, using > the same Postgres 9.3 instance (with different databases > "org.apache.manifoldcf.database.name" of course). > After a couple of days I've got a performance issue: one MCF instance > has become very slow - it processes a few docs per hour only. I guess, > the bottleneck is the database: > > "normal" instance: > SELECT status, count(*) AS count FROM jobqueue GROUP BY status -- > 738.311 rows in the table, took 1,2 sec > "G";50674 > "F";68 > "P";149179 > "C";402367 > "A";33 > "Z";136676 > > "slow" instance (currently with a single active job): > SELECT status, count(*) AS count FROM jobqueue GROUP BY status -- > 2.745.329 rows in the table, took 350 sec > "G";337922 --STATUS_PENDINGPURGATORY > "F";449 --STATUS_ACTIVEPURGATORY > "P";25909 --STATUS_PENDING > "C";562772 --STATUS_COMPLETE > "A";9 --STATUS_ACTIVE > "Z";1644927 --STATUS_PURGATORY > > Since "count(*)" is terrible slow in Postgres, I used the following > sql to count jobqueue's rows: > SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE > relname = 'jobqueue'; > > Both MCF instances have the same number of working threads, database > handles etc. > Is the database "full"? What could you recommend to improve the > performance? > > Thank you! > Konstantin > --94eb2c076a6223c6a6053175ecfa Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Konstantin,

The query you are lookin= g at is performed by the UI only, and there is a parameter you can set whic= h applies a limit to the number of documents so that the count is reported = as "<limit>+" in the UI.=C2=A0 This is the parameter:
=

org.apache.manifoldcf.ui.maxstatuscount=

As for why the database gets slow for crawling, unless you are seeing repo= rts in the log of long-running queries, then it's a good chance you nee= d to vacuum your database instance.=C2=A0 I generally recommend that a vacu= um full be done periodically for database instances.=C2=A0 Autovacuuming ha= s gotten a lot better in postgres than it used to be but at least in the pa= st the autovacuuming process would get far behind ManifoldCF and so the dat= abase would get quite bloated anyway.=C2=A0 So I'd give that a try.

If you = are seeing logging output mentioning slow queries, you may need to tune how= often MCF analyzes certain tables.=C2=A0 There are parameters that control= that as well.=C2=A0 In general, if there is a slow query with a bad plan, = and analyzing the tables involved makes it come up with a much better plan,= analysis is not happening often enough.=C2=A0 But first, before you get to= that point, have a look at the log and see whether this is likely to be th= e problem. =C2=A0(Usually it is the stuffer query that gets slow when there= 's an issue with table analysis, FWIW).=C2=A0 Please feel free to post = the plan of the queries being reported here.

Thanks,
Karl


On Wed, Apr 27, 2016 at 7:33 AM, jetnet = <jetnet@gmail.com<= /a>> wrote:
Hi Karl,

I set up two MCF instances (quick setup) on the same machine, using
the same Postgres 9.3 instance (with different databases
"
org.apache.manifoldcf.database.name" of cou= rse).
After a couple of days I've got a performance issue: one MCF instance has become very slow - it processes a few docs per hour only. I guess,
the bottleneck is the database:

"normal" instance:
SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
738.311 rows in the table, took 1,2 sec
"G";50674
"F";68
"P";149179
"C";402367
"A";33
"Z";136676

"slow" instance (currently with a single active job):
SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
2.745.329 rows in the table, took 350 sec
"G";337922=C2=A0 --STATUS_PENDINGPURGATORY
"F";449=C2=A0 =C2=A0 =C2=A0--STATUS_ACTIVEPURGATORY
"P";25909=C2=A0 =C2=A0--STATUS_PENDING
"C";562772=C2=A0 --STATUS_COMPLETE
"A";9=C2=A0 =C2=A0 =C2=A0 =C2=A0--STATUS_ACTIVE
"Z";1644927 --STATUS_PURGATORY

Since "count(*)" is terrible slow in Postgres, I used the followi= ng
sql to count jobqueue's rows:
SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
relname =3D 'jobqueue';

Both MCF instances have the same number of working threads, database
handles etc.
Is the database "full"? What could you recommend to improve the p= erformance?

Thank you!
Konstantin

--94eb2c076a6223c6a6053175ecfa--