manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Performance Problems Continue
Date Tue, 16 Nov 2010 15:37:33 GMT
psql usually displays the indexes for a table when you dt the table.
These are not appearing in your postgresql version.  But it's clear
they are there from the plan.

The jobqueue table is being sequentially scanned *despite* the
probable fact that there's a reasonable, usable index on it that it
should be using.  In short, this looks like a PostgreSQL bug to me.

I'm away from my test machine right now, but I'd love to see the
indexes on the jobqueue table that you have.  There may be a new
command in psql that my version of PostgreSQL doesn't have that prints
the indexes for a table, if they took it out of dt.

If the proper index is there, then all we can do is try various forms
of the query to see if we can dodge the PostgreSQL bug.

Karl



On Tue, Nov 16, 2010 at 9:31 AM, Farzad Valad <hokie@farzad.net> wrote:
> Here is the output:
>
>
> C:\Program Files (x86)\PostgreSQL\8.3\bin>psql -U manifoldcf dbname
> Password for user manifoldcf:
> Welcome to psql 8.3.12, the PostgreSQL interactive terminal.
>
> Type:  \copyright for distribution terms
>       \h for help with SQL commands
>       \? for help with psql commands
>       \g or terminate with semicolon to execute query
>       \q to quit
>
> Warning: Console code page (437) differs from Windows code page (1252)
>         8-bit characters might not work correctly. See psql reference
>         page "Notes for Windows users" for details.
>
> dbname=> \dt jobqueue;
>           List of relations
>  Schema |   Name   | Type  |   Owner
> --------+----------+-------+------------
>  public | jobqueue | table | manifoldcf
> (1 row)
>
> dbname=> \dt carrydown;
>            List of relations
>  Schema |   Name    | Type  |   Owner
> --------+-----------+-------+------------
>  public | carrydown | table | manifoldcf
> (1 row)
>
> dbname=> select count(*) from carrydown;
>  count
> -------
>     0
> (1 row)
>
> dbname=> select distinct jobid from carrydown;
>  jobid
> -------
> (0 rows)
>
> dbname=> explain SELECT t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE
> EXISTS(
> SELECT 'x' FROM carrydown t1 WHERE parentidhash IN
> dbname(> ('B0C2E57717919821A7C32D9F92F9F297155B727F') AND
> t1.childidhash=t0.doch
> ash AND t0.jobid=1289847478234 AND
> dbname(> t1.jobid=1289847478234 AND t1.isnew='B');
>
>     QUERY PLAN
>
> ----------------------------------------------------------------------------
> ----
> ----------------------------------------------------------------------------
> ----
> --------------------
>  Seq Scan on jobqueue t0  (cost=0.00..2000133.07 rows=118994 width=158)
>   Filter: (subplan)
>   SubPlan
>     ->  Result  (cost=0.00..8.28 rows=1 width=0)
>           One-Time Filter: ($1 = 1289847478234::bigint)
>           ->  Index Scan using i1289847375544 on carrydown t1
> (cost=0.00..8.28
>  rows=1 width=0)
>                 Index Cond: ((jobid = 1289847478234::bigint) AND
> ((parentidhash
> )::text = 'B0C2E57717919821A7C32D9F92F9F297155B727F'::text) AND
> ((childidhash)::
> text = ($0)::text))
>                 Filter: (isnew = 'B'::bpchar)
> (8 rows)
>
> dbname=> analyze carrydown;
> ANALYZE
> dbname=> analyze jobqueue;
> ANALYZE
> dbname=> explain SELECT t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE
> dbname-> EXISTS(SELECT 'x' FROM carrydown t1 WHERE parentidhash IN
> dbname(> ('B0C2E57717919821A7C32D9F92F9F297155B727F') AND
> dbname(> t1.childidhash=t0.dochash AND t0.jobid=1289847478234 AND
> dbname(> t1.jobid=1289847478234 AND t1.isnew='B');
>
>     QUERY PLAN
>
> ----------------------------------------------------------------------------
> ----
> ----------------------------------------------------------------------------
> ----
> --------------------
>  Seq Scan on jobqueue t0  (cost=0.00..2091946.86 rows=124532 width=158)
>   Filter: (subplan)
>   SubPlan
>     ->  Result  (cost=0.00..8.28 rows=1 width=0)
>           One-Time Filter: ($1 = 1289847478234::bigint)
>           ->  Index Scan using i1289847375544 on carrydown t1
> (cost=0.00..8.28
>  rows=1 width=0)
>                 Index Cond: ((jobid = 1289847478234::bigint) AND
> ((parentidhash
> )::text = 'B0C2E57717919821A7C32D9F92F9F297155B727F'::text) AND
> ((childidhash)::
> text = ($0)::text))
>                 Filter: (isnew = 'B'::bpchar)
> (8 rows)
>
> dbname=>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: Tuesday, November 16, 2010 4:21 AM
> To: connectors-user@incubator.apache.org
> Subject: Re: Performance Problems Continue
>
> As I suspected, the plan it is generating is crap:
>
> Found a query that took more than a minute (62494 ms): [SELECT
> t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE EXISTS(SELECT 'x'
> FROM carrydown t1 WHERE parentidhash IN (?) AND
> t1.childidhash=t0.dochash AND t0.jobid=? AND t1.jobid=? AND
> t1.isnew=?)]
>  Parameter 0: 'B0C2E57717919821A7C32D9F92F9F297155B727F'
>  Parameter 1: '1289847478234'
>  Parameter 2: '1289847478234'
>  Parameter 3: 'B'
>  Plan: Seq Scan on jobqueue t0  (cost=0.00..1332448.02 rows=79465 width=152)
>  Plan:   Filter: (subplan)
>  Plan:   SubPlan
>  Plan:     ->  Result  (cost=0.00..8.28 rows=1 width=0)
>  Plan:           One-Time Filter: ($1 = 1289847478234::bigint)
>  Plan:           ->  Index Scan using i1289847375544 on carrydown t1
> (cost=0.00..8.28 rows=1 width=0)
>  Plan:                 Index Cond: ((jobid = 1289847478234::bigint)
> AND ((parentidhash)::text =
> 'B0C2E57717919821A7C32D9F92F9F297155B727F'::text) AND
> ((childidhash)::text = ($0)::text))
>  Plan:                 Filter: (isnew = 'B'::bpchar)
>
>
>
> ... which means that it is doing a sequential scan on the jobqueue
> table (large) and an index scan for each row on the carrydown table
> (which should be empty), rather than the other way round.  There is a
> carrydown index that is should be using instead, which it is
> completely ignoring, which is all the more weird.
>
> Can you do the following:
>
> (1) Under the postgresql bin directory, run psql -U manifoldcf dbname
> (2) If the defaults are in place, password should be "local_pg_passwd"
> (3) Run the following commands, and cut/paste the answers for me:
>
> \dt jobqueue;
> \dt carrydown;
> select count(*) from carrydown;
> select distinct jobid from carrydown;
> explain SELECT t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE
> EXISTS(SELECT 'x' FROM carrydown t1 WHERE parentidhash IN
> ('B0C2E57717919821A7C32D9F92F9F297155B727F') AND
> t1.childidhash=t0.dochash AND t0.jobid=1289847478234 AND
> t1.jobid=1289847478234 AND t1.isnew='B');
>
> Once you have done that, please do the following:
>
> analyze carrydown;
> analyze jobqueue;
>
> Then, try the explain again. and send me that output too:
>
> explain SELECT t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE
> EXISTS(SELECT 'x' FROM carrydown t1 WHERE parentidhash IN
> ('B0C2E57717919821A7C32D9F92F9F297155B727F') AND
> t1.childidhash=t0.dochash AND t0.jobid=1289847478234 AND
> t1.jobid=1289847478234 AND t1.isnew='B');
>
>
> I've tried the same commands on my system, but they show reasonable
> plans.  PostgreSQL, like all databases, develops its plans based on
> statistics it keeps for the tables in question.  The analyze command
> tells it to update those statistics, which in theory should make
> PostgreSQL immediately thereafter use a correct plan.  However, there
> may be great sensitivity to incorrect statistics for some sorts of
> query, depending on the conditions, so a small number of accumulated
> changes can cause it to start to make huge errors.  I am trying to
> determine if that's what is happening, and what those conditions are
> in your case.
>
> Thanks,
> Karl
>
>
> On Mon, Nov 15, 2010 at 9:44 PM, Farzad Valad <hokie@farzad.net> wrote:
>> 1. Using 8.3.12, gave up on 9.x
>> 2. Yes, I do.  Attached the log file for this crawl
>> 3. Using Null Output and File System input, not using RSS
>>
>
>

Mime
View raw message