manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1090) More PostgreSQL performance improvements
Date Fri, 31 Oct 2014 16:39:33 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192026#comment-14192026
] 

Karl Wright edited comment on CONNECTORS-1090 at 10/31/14 4:39 PM:
-------------------------------------------------------------------

I'm not seeing any sign from postgresql that it is using the correct index and scanning it
in index order for the stuffer query.

Here's what the 8.3 documentation says:

{code}
The planner will consider satisfying an ORDER BY specification either by scanning any available
index that matches the specification, or by scanning the table in physical order and doing
an explicit sort. For a query that requires scanning a large fraction of the table, the explicit
sort is likely to be faster because it requires less disk I/O due to a better-ordered access
pattern. Indexes are more useful when only a few rows need be fetched. An important special
case is ORDER BY in combination with LIMIT n: an explicit sort will have to process all the
data to identify the first n rows, but if there is an index matching the ORDER BY then the
first n rows can be retrieved directly, without scanning the remainder at all.
{code}

The intent was to combine ORDER BY with LIMIT, so it matches the case in question perfectly.
  I don't understand why this isn't reading from the index in almost all cases.

Instead, with 9.3, we're getting plans that look like this:

{code}
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan: Limit  (cost=1.13..633.06 rows=240
width=192) (actual time=0.307..4.133 rows=240 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:   ->  Nested Loop Semi Join  (cost=1.13..74547.95
rows=28312 width=192) (actual time=0.305..4.117 rows=240 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:         Join Filter: (t0.jobid = t1.id)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:         ->  Nested Loop Anti Join
 (cost=1.13..74111.87 rows=28312 width=192) (actual time=0.294..3.976 rows=240 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:               ->  Nested Loop Anti
Join  (cost=0.71..39180.11 rows=28348 width=192) (actual time=0.268..0.815 rows=240 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     ->  Index Scan
using i1414771440111 on jobqueue t0  (cost=0.42..20351.99 rows=28384 width=192) (actual time=0.236..0.399
rows=240 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                           Index Cond:
((checkaction = 'R'::bpchar) AND (checktime <= 1414772647826::bigint))
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                           Filter:
((status = 'P'::bpchar) OR (status = 'G'::bpchar))
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                           Rows Removed
by Filter: 121
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     ->  Nested
Loop  (cost=0.29..0.65 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=240)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                           ->  Index
Scan using i1414771440117 on prereqevents t3  (cost=0.14..0.16 rows=1 width=524) (actual time=0.001..0.001
rows=0 loops=240)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                                 Index
Cond: (t0.id = owner)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                           ->  Index
Only Scan using events_pkey on events t4  (cost=0.14..0.48 rows=1 width=516) (never executed)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                                 Index
Cond: (name = (t3.eventname)::text)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                                 Heap
Fetches: 0
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:               ->  Index Scan using
i1414771440110 on jobqueue t2  (cost=0.42..1.23 rows=1 width=49) (actual time=0.012..0.012
rows=0 loops=240)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     Index Cond: (((dochash)::text
= (t0.dochash)::text) AND (jobid = t0.jobid))
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     Filter: ((status
= 'A'::bpchar) OR (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status = 'f'::bpchar)
OR (status = 'D'::bpchar) OR (status = 'd'::bpchar))
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     Rows Removed by
Filter: 1
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:         ->  Materialize  (cost=0.00..11.41
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=240)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:               ->  Seq Scan on jobs
t1  (cost=0.00..11.40 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1)
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan:                     Filter: ((priority
= 5::bigint) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar)))
 WARN 2014-10-31 12:24:07,944 (Stuffer thread) -  Plan: Total runtime: 4.474 ms
{code}



was (Author: kwright@metacarta.com):
I'm not seeing any sign from postgresql that it is using the correct index and scanning it
in index order for the stuffer query.

Here's what the 8.3 documentation says:

{code}
The planner will consider satisfying an ORDER BY specification either by scanning any available
index that matches the specification, or by scanning the table in physical order and doing
an explicit sort. For a query that requires scanning a large fraction of the table, the explicit
sort is likely to be faster because it requires less disk I/O due to a better-ordered access
pattern. Indexes are more useful when only a few rows need be fetched. An important special
case is ORDER BY in combination with LIMIT n: an explicit sort will have to process all the
data to identify the first n rows, but if there is an index matching the ORDER BY then the
first n rows can be retrieved directly, without scanning the remainder at all.
{code}

The intent was to combine ORDER BY with LIMIT, so it matches the case in question perfectly.
  I don't understand why this isn't reading from the index in almost all cases.


> More PostgreSQL performance improvements
> ----------------------------------------
>
>                 Key: CONNECTORS-1090
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1090
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework core
>    Affects Versions: Manifold 1.7.1
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.8, ManifoldCF 2.0
>
>         Attachments: CONNECTORS-1090.patch
>
>
> More performance improvements may be possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message