manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Amazon RDS for PostgreSQL Support
Date Thu, 30 Nov 2017 11:23:29 GMT
Typically Jobs table is short and sequential scans are faster than index
joins.  Postgres optimizes for that.

No, the plans look fine.  Another reason for the long-running queries might
well be contention and locking -- many threads will be trying to do similar
things at the same time.  You will note that multiple records get updated
in one query; this is usually helpful but when each update is expensive you
could wind up with locking causing delays.

Karl


On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <
markus.schuch@deutschebahn.com> wrote:

> Hi Karl,
>
>
>
> we disabled autovacuum
>
> The stats table show there was no autovac since then.
>
>
>
> The long running queries still occur.
>
> There are no other apps using the database and no other jobs running.
>
>
>
> But there is another long running query to the jobs table between the
> carrydown queries.
>
> This query seem to happen at the same time with the long running carrydown
> query.
>
>
>
> The plan output says “Sec Scan on jobs…”.
>
>
>
> 2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE
> (status=? OR status=? OR status=?) FOR UPDATE]
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 0: 'A'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 1: 'W'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 2: 'R'
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024
> rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual
> time=0.021..0.022 rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR
> (status = 'R'::bpchar))
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Rows Removed by Filter: 22
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Planning time: 0.093 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Execution time: 0.041 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
>
>
>
> Jobs table indices:
>
> public    jobs       jobs_pkey                          CREATE UNIQUE
> INDEX jobs_pkey ON jobs USING btree (id)
>
> public    jobs       i1511464305264                               CREATE
> INDEX i1511464305264 ON jobs USING btree (status, id, priority)
>
> public    jobs       i1511464305263                               CREATE
> INDEX i1511464305263 ON jobs USING btree (status, processid)
>
> public    jobs       i1511464305262                               CREATE
> INDEX i1511464305262 ON jobs USING btree (connectionname)
>
> public    jobs       i1511464305261                               CREATE
> INDEX i1511464305261 ON jobs USING btree (failtime)
>
>
>
> Shouldn’t be i1511464305264 be used?
>
>
>
> Many thanks in advance
>
> Markus
>
>
>
> *Von:* Karl Wright [mailto:daddywri@gmail.com]
> *Gesendet:* Mittwoch, 29. November 2017 23:28
> *An:* user@manifoldcf.apache.org
> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>
>
>
> The plans look good for the carrydown execution, so I have to conclude
> that the long-running queries are due to other considerations -- perhaps
> concurrent vacuuming, perhaps other queries pounding the database.
>
>
>
> Sharepoint is especially hard on the carrydown table -- it gets huge.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
> Hi,
>
>
>
> since nobody responded, we started to experiment.
>
>
>
> Setup:
>
> AWS RDS Postgres 9.6.3
>
> db.m4.xlarge (4 cores, 16 GB RAM)
>
> provisioned iops 4000
>
>
>
> Indexing Performance for most jobs is pretty good, except a for a bigger
> sharepoint crawl (~200.000 docs).
>
>
>
> We are seeing a lot of long running queries for the tables carrydown here.
>
>
>
> Logfile:
>
> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3
> e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>
>
>
> Settings:
>
> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff8
> 35/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%
> 2520settings
>
> (due to this discussion we left autovac on: https://www.mail-archive.com/
> user@manifoldcf.apache.org/msg03512.html)
>
>
>
> CPU Usage of the Database is between 15% and 35%
>
>
>
> Are we hitting bad auto optimiziations of the newer postgresql version?
>
>
>
> Many thanks in advance,
>
> Markus
>
>
>

Mime
View raw message