manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Schuch <markus.sch...@deutschebahn.com>
Subject AW: Amazon RDS for PostgreSQL Support
Date Fri, 15 Dec 2017 19:31:46 GMT
We were no able to resolve the performance issue with the carrydown table.

We switched to RDS Aurora MySQL, which performs similar to the MariaDB database in our on
premise datacenter. No problems with the carrydown table queries of the sharepoint connector.

Regards
Markus

Von: Karl Wright [mailto:daddywri@gmail.com]
Gesendet: Donnerstag, 30. November 2017 12:23
An: user@manifoldcf.apache.org
Betreff: Re: Amazon RDS for PostgreSQL Support

Typically Jobs table is short and sequential scans are faster than index joins.  Postgres
optimizes for that.

No, the plans look fine.  Another reason for the long-running queries might well be contention
and locking -- many threads will be trying to do similar things at the same time.  You will
note that multiple records get updated in one query; this is usually helpful but when each
update is expensive you could wind up with locking causing delays.

Karl


On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <markus.schuch@deutschebahn.com<mailto:markus.schuch@deutschebahn.com>>
wrote:
Hi Karl,

we disabled autovacuum
The stats table show there was no autovac since then.

The long running queries still occur.
There are no other apps using the database and no other jobs running.

But there is another long running query to the jobs table between the carrydown queries.
This query seem to happen at the same time with the long running carrydown query.

The plan output says “Sec Scan on jobs…”.

2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db: Found a long-running
query (169057 ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR status=?) FOR UPDATE]
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 0: 'A'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 1: 'W'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 2: 'R'
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: LockRows
 (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:   -> 
Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Filter:
((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR (status = 'R'::bpchar))
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Rows
Removed by Filter: 22
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Planning
time: 0.093 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Execution
time: 0.041 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:

Jobs table indices:
public    jobs       jobs_pkey                          CREATE UNIQUE INDEX jobs_pkey ON jobs
USING btree (id)
public    jobs       i1511464305264                               CREATE INDEX i1511464305264
ON jobs USING btree (status, id, priority)
public    jobs       i1511464305263                               CREATE INDEX i1511464305263
ON jobs USING btree (status, processid)
public    jobs       i1511464305262                               CREATE INDEX i1511464305262
ON jobs USING btree (connectionname)
public    jobs       i1511464305261                               CREATE INDEX i1511464305261
ON jobs USING btree (failtime)

Shouldn’t be i1511464305264 be used?

Many thanks in advance
Markus

Von: Karl Wright [mailto:daddywri@gmail.com<mailto:daddywri@gmail.com>]
Gesendet: Mittwoch, 29. November 2017 23:28
An: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Betreff: Re: Amazon RDS for PostgreSQL Support

The plans look good for the carrydown execution, so I have to conclude that the long-running
queries are due to other considerations -- perhaps concurrent vacuuming, perhaps other queries
pounding the database.

Sharepoint is especially hard on the carrydown table -- it gets huge.

Karl


On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <markus.schuch@deutschebahn.com<mailto:markus.schuch@deutschebahn.com>>
wrote:
Hi,

since nobody responded, we started to experiment.

Setup:
AWS RDS Postgres 9.6.3
db.m4.xlarge (4 cores, 16 GB RAM)
provisioned iops 4000

Indexing Performance for most jobs is pretty good, except a for a bigger sharepoint crawl
(~200.000 docs).

We are seeing a lot of long running queries for the tables carrydown here.

Logfile:
https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt

Settings:
https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
(due to this discussion we left autovac on: https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)

CPU Usage of the Database is between 15% and 35%

Are we hitting bad auto optimiziations of the newer postgresql version?

Many thanks in advance,
Markus


Mime
View raw message