Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@manifoldcf.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAB-cy19hKba5sMJm13tsyt-BFTs8K_9WrAR8VCA9yHM9BRYsZA@mail.gmail.com>
References: 
 <CAB-cy19hKba5sMJm13tsyt-BFTs8K_9WrAR8VCA9yHM9BRYsZA@mail.gmail.com>
Date: Wed, 27 Apr 2016 07:43:46 -0400
Message-ID: 
 <CALUFAGAooV_U_2rB5-K9srNZm_bYwGDW5HXbHEmT3U+hJ32mCw@mail.gmail.com>
Subject: Re: Database performance
From: Karl Wright <daddywri@gmail.com>
To: "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
Content-Type: multipart/alternative; boundary=94eb2c076a6223c6a6053175ecfa

--94eb2c076a6223c6a6053175ecfa
Content-Type: text/plain; charset=UTF-8

Hi Konstantin,

The query you are looking at is performed by the UI only, and there is a
parameter you can set that caps the number of documents counted, so that
larger counts are reported as "<limit>+" in the UI.  This is the parameter:

org.apache.manifoldcf.ui.maxstatuscount
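
To illustrate why a cap helps: a capped count lets the scan stop once the
limit is reached, instead of visiting every row.  The query below is only a
sketch of that idea against the jobqueue table from your message.  It is not
necessarily the statement the UI issues, and the limit of 500000 is an
arbitrary example value, not a recommended setting.

```sql
-- Sketch: count at most 500001 rows. If the result is 500001, the true
-- count exceeds the example limit and would be shown as "500000+".
SELECT count(*) AS capped_count
FROM (SELECT 1 FROM jobqueue LIMIT 500001) AS t;
```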

As for why the database gets slow for crawling: unless you are seeing
reports of long-running queries in the log, there's a good chance you
need to vacuum your database instance.  I generally recommend that a
VACUUM FULL be done periodically for database instances.  Autovacuuming has
gotten a lot better in Postgres than it used to be, but at least in the past
the autovacuuming process would fall far behind ManifoldCF, and so the
database would get quite bloated anyway.  So I'd give that a try.
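
A minimal sketch of what that could look like from psql, assuming you are
connected to the slow instance's database and that jobqueue (the table from
your message) is the one of interest.  The first statement only reads
PostgreSQL's standard statistics view; what counts as "too many" dead tuples
is a judgment call.

```sql
-- Dead vs. live tuples per table; a large n_dead_tup relative to
-- n_live_tup suggests bloat that autovacuum has not kept up with.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Rewrites the table and refreshes its statistics. VACUUM FULL holds an
-- exclusive lock on the table while it runs, so do this while the
-- crawler is stopped.
VACUUM FULL ANALYZE jobqueue;
```

Leaving out the table name makes VACUUM FULL process every table in the
current database that you have permission to vacuum.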

If you are seeing logging output mentioning slow queries, you may need to
tune how often MCF analyzes certain tables.  There are parameters that
control that as well.  In general, if a slow query has a bad plan, and
analyzing the tables involved makes the planner come up with a much better
plan, then analysis is not happening often enough.  But before you get to
that point, have a look at the log and see whether this is likely to be the
problem.  (FWIW, it is usually the stuffer query that gets slow when there's
an issue with table analysis.)  Please feel free to post the plans of the
reported queries here.
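
One way to check whether stale statistics are the problem is sketched below.
The status-count query from your message is used purely as a stand-in, so
substitute whichever query the log actually reports as slow.  Note that
EXPLAIN ANALYZE executes the query, so it takes as long as the query itself.

```sql
-- When were statistics last gathered for the table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'jobqueue';

-- Plan and timing before refreshing statistics (stand-in query).
EXPLAIN ANALYZE
SELECT status, count(*) AS count FROM jobqueue GROUP BY status;

-- Refresh planner statistics for the table.
ANALYZE jobqueue;

-- Plan and timing afterwards; a much better plan here points to
-- analysis not happening often enough.
EXPLAIN ANALYZE
SELECT status, count(*) AS count FROM jobqueue GROUP BY status;
```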

Thanks,
Karl


On Wed, Apr 27, 2016 at 7:33 AM, jetnet <jetnet@gmail.com> wrote:

> Hi Karl,
>
> I set up two MCF instances (quick setup) on the same machine, using
> the same Postgres 9.3 instance (with different databases
> "org.apache.manifoldcf.database.name" of course).
> After a couple of days I've got a performance issue: one MCF instance
> has become very slow - it processes a few docs per hour only. I guess,
> the bottleneck is the database:
>
> "normal" instance:
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 738.311 rows in the table, took 1,2 sec
> "G";50674
> "F";68
> "P";149179
> "C";402367
> "A";33
> "Z";136676
>
> "slow" instance (currently with a single active job):
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 2.745.329 rows in the table, took 350 sec
> "G";337922  --STATUS_PENDINGPURGATORY
> "F";449     --STATUS_ACTIVEPURGATORY
> "P";25909   --STATUS_PENDING
> "C";562772  --STATUS_COMPLETE
> "A";9       --STATUS_ACTIVE
> "Z";1644927 --STATUS_PURGATORY
>
> Since "count(*)" is terribly slow in Postgres, I used the following
> SQL to count jobqueue's rows:
> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> relname = 'jobqueue';
>
> Both MCF instances have the same number of working threads, database
> handles etc.
> Is the database "full"? What could you recommend to improve the
> performance?
>
> Thank you!
> Konstantin
>

--94eb2c076a6223c6a6053175ecfa--