db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kane, David" <David_K...@sra.com>
Subject RE: Performance Tuning Problem
Date Tue, 10 Apr 2007 16:38:13 GMT
Folks,

Dropping and recreating the indexes seemed to do the trick.  Even the
more complex example is running in hundredths of second once the query
is compiled.  

That suggests that DERBY-269 was indeed the culprit.  Thanks for calling
that to my attention.

Dave

-----Original Message-----
From: Kane, David [mailto:David_Kane@sra.com] 
Sent: Monday, April 09, 2007 6:27 PM
To: Derby Discussion
Subject: RE: Performance Tuning Problem

Folks,

I think I have answers to all of Army's questions here.  

* The request DDL is listed below.

* I included the output in the original posting.  The first query has 14
matching records.  The second query has 2.

* The indexed were created as part of copying the database using
(http://dbcopyplugin.sourceforge.net/).  I assume the data was copied
first, but I do not know for sure.  Would it make sense to try:

	alter table <table-name> compress [sequential]  

That is mentioned in DERBY-269, or perhaps just drop and recreate the
indexes?

* I saw DERBY-47 as I investigated the problem. I fiddled with the query
design a bit to avoid the IN operators, and I built a version of derby
off the SVN repository to see if I could pick up the fix for that, but
neither of those made things better.

* The zip archive is about 1gb.  The data is public so I don't mind
posting it if that would be helpful.  Is that too big for Jira?  

Sincerely,
David Kane

-- Timestamp: 2007-04-09 18:10:27.739
-- Source database is: C:\gominer10.2
-- Connection URL is: jdbc:derby:C:\gominer10.2
-- appendLogs: false

-- ----------------------------------------------
-- DDL Statements for tables
-- ----------------------------------------------

CREATE TABLE "APP"."DBXREF" ("ID" INTEGER NOT NULL, "XREF_KEY"
VARCHAR(255) NOT NULL, "XREF_KEYTYPE" VARCHAR(32), "XREF_DBNAME"
VARCHAR(55) NOT NULL, "XREF_DESC" VARCHAR(255));

CREATE TABLE "APP"."DBCOPYTEST" ("SPECIES" CHAR(10));

CREATE TABLE "APP"."SEQ_PROPERTY" ("ID" INTEGER NOT NULL, "SEQ_ID"
INTEGER NOT NULL, "PROPERTY_KEY" VARCHAR(64) NOT NULL, "PROPERTY_VAL"
VARCHAR(255) NOT NULL);

CREATE TABLE "APP"."GENE_PRODUCT" ("ID" INTEGER NOT NULL, "SYMBOL"
VARCHAR(128) NOT NULL, "DBXREF_ID" INTEGER NOT NULL, "SPECIES_ID"
INTEGER, "FULL_NAME" CLOB(65535), "OFFICIALNAME" VARCHAR(50));

CREATE TABLE "APP"."TERM_SYNONYM" ("TERM_ID" INTEGER NOT NULL,
"TERM_SYNONYM" VARCHAR(255), "ACC_SYNONYM" INTEGER);

CREATE TABLE "APP"."EVIDENCE" ("ID" INTEGER NOT NULL, "CODE" VARCHAR(8)
NOT NULL, "ASSOCIATION_ID" INTEGER NOT NULL, "DBXREF_ID" INTEGER NOT
NULL, "SEQ_ACC" VARCHAR(255));

CREATE TABLE "APP"."GENE_PRODUCT_SEQ" ("GENE_PRODUCT_ID" INTEGER NOT
NULL, "SEQ_ID" INTEGER NOT NULL, "IS_PRIMARY_SEQ" INTEGER);

CREATE TABLE "APP"."GENE_PRODUCT_SYNONYM" ("GENE_PRODUCT_ID" INTEGER NOT
NULL, "PRODUCT_SYNONYM" VARCHAR(255) NOT NULL);

CREATE TABLE "APP"."GRAPH_PATH" ("ID" INTEGER NOT NULL, "TERM1_ID"
INTEGER NOT NULL, "TERM2_ID" INTEGER NOT NULL, "DISTANCE" INTEGER NOT
NULL);

CREATE TABLE "APP"."SPECIES" ("ID" INTEGER NOT NULL, "NCBI_TAXA_ID"
INTEGER, "COMMON_NAME" VARCHAR(255), "LINEAGE_STRING" CLOB(65535),
"GENUS" VARCHAR(55), "SPECIES" VARCHAR(55));

CREATE TABLE "APP"."TERM_DBXREF" ("TERM_ID" INTEGER NOT NULL,
"DBXREF_ID" INTEGER NOT NULL);

CREATE TABLE "APP"."ASSOCIATION" ("ID" INTEGER NOT NULL, "TERM_ID"
INTEGER NOT NULL, "GENE_PRODUCT_ID" INTEGER NOT NULL, "IS_NOT" INTEGER,
"ROLE_GROUP" INTEGER);

CREATE TABLE "APP"."GENE_PRODUCT_PROPERTY" ("GENE_PRODUCT_ID" INTEGER
NOT NULL, "PROPERTY_KEY" VARCHAR(64) NOT NULL, "PROPERTY_VAL"
VARCHAR(255));

CREATE TABLE "APP"."GENE_PRODUCT_COUNT" ("TERM_ID" INTEGER NOT NULL,
"CODE" VARCHAR(8), "SPECIESDBNAME" VARCHAR(55) NOT NULL, "PRODUCT_COUNT"
INTEGER NOT NULL);

CREATE TABLE "APP"."TERM_DEFINITION" ("TERM_ID" INTEGER NOT NULL,
"TERM_DEFINITION" CLOB(65535) NOT NULL, "COMMENT" CLOB(16277215),
"REFERENCE" VARCHAR(255));

CREATE TABLE "APP"."INSTANCE_DATA" ("RELEASE_NAME" VARCHAR(255),
"RELEASE_TYPE" VARCHAR(255), "RELEASE_NOTES" CLOB(65535));

CREATE TABLE "APP"."SEQ_DBXREF" ("SEQ_ID" INTEGER NOT NULL, "DBXREF_ID"
INTEGER NOT NULL);

CREATE TABLE "APP"."GRAPH_PATH2TERM" ("GRAPH_PATH_ID" INTEGER NOT NULL,
"TERM_ID" INTEGER NOT NULL, "RANK" INTEGER NOT NULL);

CREATE TABLE "APP"."TERM2TERM" ("ID" INTEGER NOT NULL,
"RELATIONSHIP_TYPE_ID" INTEGER NOT NULL, "TERM1_ID" INTEGER NOT NULL,
"TERM2_ID" INTEGER NOT NULL);

CREATE TABLE "APP"."TERM" ("ID" INTEGER NOT NULL, "NAME" VARCHAR(255)
NOT NULL, "TERM_TYPE" VARCHAR(55) NOT NULL, "ACC" VARCHAR(32) NOT NULL,
"IS_OBSOLETE" INTEGER NOT NULL, "IS_ROOT" INTEGER NOT NULL);

-- ----------------------------------------------
-- DDL Statements for indexes
-- ----------------------------------------------

CREATE INDEX "APP"."SEQP1" ON "APP"."SEQ_PROPERTY" ("PROPERTY_KEY");

CREATE INDEX "APP"."SEQP2" ON "APP"."SEQ_PROPERTY" ("PROPERTY_VAL");

CREATE INDEX "APP"."SEQ_ID" ON "APP"."SEQ_PROPERTY" ("SEQ_ID",
"PROPERTY_KEY", "PROPERTY_VAL");

CREATE INDEX "APP"."SEQP0" ON "APP"."SEQ_PROPERTY" ("SEQ_ID");

CREATE INDEX "APP"."T1" ON "APP"."TERM" ("NAME");

CREATE INDEX "APP"."T2" ON "APP"."TERM" ("TERM_TYPE");

CREATE INDEX "APP"."T3" ON "APP"."TERM" ("ACC");

CREATE INDEX "APP"."GPP2" ON "APP"."GENE_PRODUCT_PROPERTY"
("PROPERTY_KEY");

CREATE INDEX "APP"."GPP3" ON "APP"."GENE_PRODUCT_PROPERTY"
("PROPERTY_VAL");

CREATE INDEX "APP"."GPPU4" ON "APP"."GENE_PRODUCT_PROPERTY"
("GENE_PRODUCT_ID", "PROPERTY_KEY", "PROPERTY_VAL");

CREATE INDEX "APP"."GPP1" ON "APP"."GENE_PRODUCT_PROPERTY"
("GENE_PRODUCT_ID");

CREATE INDEX "APP"."A3" ON "APP"."ASSOCIATION" ("TERM_ID",
"GENE_PRODUCT_ID");

CREATE INDEX "APP"."A4" ON "APP"."ASSOCIATION" ("ID", "TERM_ID",
"GENE_PRODUCT_ID");

CREATE INDEX "APP"."A1" ON "APP"."ASSOCIATION" ("TERM_ID");

CREATE INDEX "APP"."A2" ON "APP"."ASSOCIATION" ("GENE_PRODUCT_ID");

CREATE INDEX "APP"."TT1" ON "APP"."TERM2TERM" ("TERM1_ID");

CREATE INDEX "APP"."TERM1_ID" ON "APP"."TERM2TERM" ("TERM1_ID",
"TERM2_ID", "RELATIONSHIP_TYPE_ID");

CREATE INDEX "APP"."TT2" ON "APP"."TERM2TERM" ("TERM2_ID");

CREATE INDEX "APP"."TT3" ON "APP"."TERM2TERM" ("TERM1_ID", "TERM2_ID");

CREATE INDEX "APP"."TT4" ON "APP"."TERM2TERM" ("RELATIONSHIP_TYPE_ID");

CREATE INDEX "APP"."GS1" ON "APP"."GENE_PRODUCT_SYNONYM"
("GENE_PRODUCT_ID");

CREATE INDEX "APP"."GS2" ON "APP"."GENE_PRODUCT_SYNONYM"
("PRODUCT_SYNONYM");

CREATE INDEX "APP"."GENE_PRODUCT_ID" ON "APP"."GENE_PRODUCT_SYNONYM"
("GENE_PRODUCT_ID", "PRODUCT_SYNONYM");

CREATE INDEX "APP"."TX0" ON "APP"."TERM_DBXREF" ("TERM_ID");

CREATE INDEX "APP"."TX1" ON "APP"."TERM_DBXREF" ("DBXREF_ID");

CREATE INDEX "APP"."TX2" ON "APP"."TERM_DBXREF" ("TERM_ID",
"DBXREF_ID");

CREATE INDEX "APP"."EV4" ON "APP"."EVIDENCE" ("ASSOCIATION_ID", "CODE");

CREATE INDEX "APP"."ASSOCIATION_ID" ON "APP"."EVIDENCE"
("ASSOCIATION_ID", "DBXREF_ID", "CODE", "SEQ_ACC");

CREATE INDEX "APP"."EV1" ON "APP"."EVIDENCE" ("ASSOCIATION_ID");

CREATE INDEX "APP"."EV2" ON "APP"."EVIDENCE" ("CODE");

CREATE INDEX "APP"."EV3" ON "APP"."EVIDENCE" ("DBXREF_ID");

CREATE INDEX "APP"."SEQX0" ON "APP"."SEQ_DBXREF" ("SEQ_ID");

CREATE INDEX "APP"."SEQX1" ON "APP"."SEQ_DBXREF" ("DBXREF_ID");

CREATE INDEX "APP"."SEQX2" ON "APP"."SEQ_DBXREF" ("SEQ_ID",
"DBXREF_ID");

CREATE INDEX "APP"."TD1" ON "APP"."TERM_DEFINITION" ("TERM_ID");

CREATE INDEX "APP"."MYINDEXTEST1" ON "APP"."GENE_PRODUCT" ("ID",
"SYMBOL", "DBXREF_ID");

CREATE INDEX "APP"."G2" ON "APP"."GENE_PRODUCT" ("DBXREF_ID");

CREATE INDEX "APP"."G3" ON "APP"."GENE_PRODUCT" ("SPECIES_ID");

CREATE INDEX "APP"."G4" ON "APP"."GENE_PRODUCT" ("ID", "SPECIES_ID");

CREATE INDEX "APP"."G5" ON "APP"."GENE_PRODUCT" ("DBXREF_ID",
"SPECIES_ID");

CREATE INDEX "APP"."G6" ON "APP"."GENE_PRODUCT" ("ID", "DBXREF_ID");

CREATE INDEX "APP"."OFFNAMEINDEX" ON "APP"."GENE_PRODUCT"
("OFFICIALNAME");

CREATE INDEX "APP"."SYMBOL" ON "APP"."GENE_PRODUCT" ("SYMBOL",
"DBXREF_ID");

CREATE INDEX "APP"."G1" ON "APP"."GENE_PRODUCT" ("SYMBOL");

CREATE INDEX "APP"."SP2" ON "APP"."SPECIES" ("COMMON_NAME");

CREATE INDEX "APP"."SP3" ON "APP"."SPECIES" ("GENUS");

CREATE INDEX "APP"."SP4" ON "APP"."SPECIES" ("SPECIES");

CREATE INDEX "APP"."SP5" ON "APP"."SPECIES" ("GENUS", "SPECIES");

CREATE INDEX "APP"."SP6" ON "APP"."SPECIES" ("ID", "NCBI_TAXA_ID");

CREATE INDEX "APP"."SP1" ON "APP"."SPECIES" ("NCBI_TAXA_ID");

CREATE INDEX "APP"."GRAPH_PATH1" ON "APP"."GRAPH_PATH" ("TERM1_ID");

CREATE INDEX "APP"."GRAPH_PATH2" ON "APP"."GRAPH_PATH" ("TERM2_ID");

CREATE INDEX "APP"."GRAPH_PATH3" ON "APP"."GRAPH_PATH" ("TERM1_ID",
"TERM2_ID");

CREATE INDEX "APP"."GRAPH_PATH4" ON "APP"."GRAPH_PATH" ("TERM1_ID",
"DISTANCE");

CREATE INDEX "APP"."DX1" ON "APP"."DBXREF" ("XREF_DBNAME");

CREATE INDEX "APP"."DX2" ON "APP"."DBXREF" ("XREF_KEY");

CREATE INDEX "APP"."DX3" ON "APP"."DBXREF" ("ID", "XREF_DBNAME");

CREATE INDEX "APP"."DX4" ON "APP"."DBXREF" ("ID", "XREF_KEY",
"XREF_DBNAME");

CREATE INDEX "APP"."XREF_KEY" ON "APP"."DBXREF" ("XREF_KEY",
"XREF_DBNAME");

CREATE INDEX "APP"."TS1" ON "APP"."TERM_SYNONYM" ("TERM_ID");

CREATE INDEX "APP"."TS2" ON "APP"."TERM_SYNONYM" ("TERM_SYNONYM");

CREATE INDEX "APP"."TS3" ON "APP"."TERM_SYNONYM" ("TERM_ID",
"TERM_SYNONYM");

CREATE INDEX "APP"."GPSEQ1" ON "APP"."GENE_PRODUCT_SEQ"
("GENE_PRODUCT_ID");

CREATE INDEX "APP"."GPSEQ2" ON "APP"."GENE_PRODUCT_SEQ" ("SEQ_ID");

CREATE INDEX "APP"."GPSEQ3" ON "APP"."GENE_PRODUCT_SEQ" ("SEQ_ID",
"GENE_PRODUCT_ID");

CREATE INDEX "APP"."GPC1" ON "APP"."GENE_PRODUCT_COUNT" ("TERM_ID");

CREATE INDEX "APP"."GPC2" ON "APP"."GENE_PRODUCT_COUNT" ("CODE");

CREATE INDEX "APP"."GPC3" ON "APP"."GENE_PRODUCT_COUNT"
("SPECIESDBNAME");

CREATE INDEX "APP"."GPC4" ON "APP"."GENE_PRODUCT_COUNT" ("TERM_ID",
"CODE", "SPECIESDBNAME");

-- ----------------------------------------------
-- DDL Statements for keys
-- ----------------------------------------------

-- primary/unique
ALTER TABLE "APP"."SEQ_PROPERTY" ADD CONSTRAINT "SQL070329021711680"
PRIMARY KEY ("ID");

ALTER TABLE "APP"."TERM" ADD CONSTRAINT "SQL070329113915100" PRIMARY KEY
("ID");

ALTER TABLE "APP"."ASSOCIATION" ADD CONSTRAINT "SQL070329021801090"
PRIMARY KEY ("ID");

ALTER TABLE "APP"."TERM2TERM" ADD CONSTRAINT "SQL070329113935130"
PRIMARY KEY ("ID");

ALTER TABLE "APP"."EVIDENCE" ADD CONSTRAINT "SQL070329082038020" PRIMARY
KEY ("ID");

ALTER TABLE "APP"."GENE_PRODUCT" ADD CONSTRAINT "SQL070329100902940"
PRIMARY KEY ("ID");

ALTER TABLE "APP"."SPECIES" ADD CONSTRAINT "SQL070330024322060" PRIMARY
KEY ("ID");

ALTER TABLE "APP"."GRAPH_PATH" ADD CONSTRAINT "SQL070329112728690"
PRIMARY KEY ("ID");

ALTER TABLE "APP"."DBXREF" ADD CONSTRAINT "SQL070329072505180" PRIMARY
KEY ("ID");

-----Original Message-----
From: Army [mailto:qozinx@gmail.com] 
Sent: Monday, April 09, 2007 5:52 PM
To: Derby Discussion
Subject: Re: Performance Tuning Problem

Kane, David wrote:
> 
> Here are the indexes that are being reported on the ASSOCIATION table:

[ snip index display ]

Hate to be a nuisance, but can you post the DDL for the tables in the
query?  I 
find it easier to make sense of the DDL than to parse the index table
(maybe 
it's just me).  You can get that by using "dblook":

   http://db.apache.org/derby/docs/dev/tools/ctoolsdblook.html

Ex.

   java org.apache.derby.tools.dblook -d jdbc:derby:mydb -t association

Might be nice to get the DDL for all of the tables in the query, not
just 
"association".

> Even if there was an index problem here, would that explain why it
seems
> to be taking so long to loop through the ResultSet object?

Can you say how many rows the two queries that you posted return?

If you have a table with over 10 million rows in it (which you do) then
any 
index problem that causes the optimizer to choose a table scan (which it
does, 
according to the program output) could definitely cause a slowdown.
Especially 
if that 10 million row table is then joined with another table that has
over a 
million rows of its own (which is, I think, what your query is doing).
So yes, 
an index problem could definitely explain the behavior you're seeing.

Did you by chance create the indexes before you inserted the data?  If
so, there 
are known problems with doing so: see esp. DERBY-269.  Given that, it's 
generally a good idea to populate the tables first and *then* create the
indexes 
afterward.  Doing so provides Derby with better statistics, which in
turn leads 
to better cost estimates and (hopefully) better plans.

I also noticed that the "moreComplexQueryResults" query includes several
IN 
clauses.  Depending on the indexes present, that query may be suffering
from 
DERBY-47, for which some changes have already been made for the upcoming

release.  See that Jira (https://issues.apache.org/jira/browse/DERBY-47)
for 
more details.

 > I have put a copy of the derby database at ftp://<snip>

How large is this database zip file?  Is it possible to post it to a
Derby Jira 
issue for ease of community access?

Army


Mime
View raw message