db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Queries satisfiable from indexes
Date Tue, 20 Mar 2007 23:28:22 GMT


Dan Karp wrote:
>>I agree, though, that there is something odd about the number of pages
>>visited and rows "qualified" here.  The top-most result set shows that
>>only 3 rows were actually returned (I think...correct me if that's
>>wrong), in which case it seems odd that we actually read 141,512
>>"qualified" rows from disk.  I don't know much about the data in
>>question but it seems like we should have done a limited range 
>>scan using the parameter value ("?") as a start and stop key.  If that
>>was the case then we should have scanned far fewer pages/rows.
You can use the following to get information about page size and number 
of pages in your table/indexes:
http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/SpaceTable.html

> 
> 
> I switched to using DISTINCT instead of GROUP BY and saw better-but-still-not-good performance
-- 0.4sec execution time instead of 1.2sec.  The cardinality of the TAGS column is 3, 4 if
you include NULL.  The cardinality of the MAILBOX_ID column is 1 in this case.  So I'd expect
either a scan of the entire index or some sort of (mailbox_id, tags) prefix scan of the index,
but in both cases I'd expect far fewer pages visited.
> 
> (This is with Derby 10.2.2.0.)  
> 
> 
> QUERY:
> 
>    SELECT DISTINCT(tags) FROM mboxgroup1.mail_item WHERE mailbox_id = ?;
> 
> 
> INDEX:
> 
>    CREATE INDEX ${DATABASE_NAME}.i_mail_item_tags_date
>      ON mail_item(mailbox_id, tags, date DESC);
> 
> 
> TABLE DDL:
> 
>    CREATE TABLE ${DATABASE_NAME}.mail_item (
>       mailbox_id    INTEGER NOT NULL,
>       id            INTEGER NOT NULL,
>       type          SMALLINT NOT NULL,
>       parent_id     INTEGER,
>       folder_id     INTEGER,
>       index_id      INTEGER,
>       imap_id       INTEGER,
>       date          INTEGER NOT NULL,
>       size          INTEGER NOT NULL,
>       volume_id     SMALLINT,
>       blob_digest   VARCHAR(28),
>       unread        INTEGER,
>       flags         INTEGER NOT NULL DEFAULT 0,
>       tags          BIGINT NOT NULL DEFAULT 0,
>       sender        VARCHAR(128),
>       subject       VARCHAR(1024),
>       name          VARCHAR(128),
>       metadata      CLOB,
>       mod_metadata  INTEGER NOT NULL,
>       change_date   INTEGER,
>       mod_content   INTEGER NOT NULL,
>       change_mask   INTEGER,
> 
>       PRIMARY KEY (mailbox_id, id),
>       CONSTRAINT fk_mail_item_mailbox_id FOREIGN KEY (mailbox_id)
>          REFERENCES zimbra.mailbox(id),
>       CONSTRAINT fk_mail_item_volume_id FOREIGN KEY (volume_id)
>          REFERENCES zimbra.volume(id),
>       CONSTRAINT fk_mail_item_parent_id FOREIGN KEY (mailbox_id, parent_id)
>          REFERENCES ${DATABASE_NAME}.mail_item(mailbox_id, id),
>       CONSTRAINT fk_mail_item_folder_id FOREIGN KEY (mailbox_id, folder_id)
>          REFERENCES ${DATABASE_NAME}.mail_item(mailbox_id, id)
>    );
> 
> 
> PLAN:
> 
> Number of opens = 1
> Rows input = 141619
> Rows returned = 3
> Eliminate duplicates = true
> In sorted order = true
> 	constructor time (milliseconds) = 0
> 	open time (milliseconds) = 0
> 	next time (milliseconds) = 0
> 	close time (milliseconds) = 0
> 	optimizer estimated row count:        14158.00
> 	optimizer estimated cost:         3842.48
> 
> Source result set:
> 	Project-Restrict ResultSet (3):
> 	Number of opens = 1
> 	Rows seen = 141619
> 	Rows filtered = 0
> 	restriction = false
> 	projection = true
> 		constructor time (milliseconds) = 0
> 		open time (milliseconds) = 0
> 		next time (milliseconds) = 0
> 		close time (milliseconds) = 0
> 		restriction time (milliseconds) = 0
> 		projection time (milliseconds) = 0
> 		optimizer estimated row count:        14158.00
> 		optimizer estimated cost:         3842.48
> 
> 	Source result set:
> 		Index Scan ResultSet for MAIL_ITEM using index I_MAIL_ITEM_TAGS_DATE at read committed
isolation level using instantaneous share row locking chosen by the optimizer
> 		Number of opens = 1
> 		Rows seen = 141619
> 		Rows filtered = 0
> 		Fetch Size = 16
> 			constructor time (milliseconds) = 0
> 			open time (milliseconds) = 0
> 			next time (milliseconds) = 0
> 			close time (milliseconds) = 0
> 			next time in milliseconds/row = 0
> 
> 		scan information: 
> 			Bit set of columns fetched={0, 1}
> 			Number of columns fetched=2
> 			Number of deleted rows visited=150
> 			Number of pages visited=326
> 			Number of rows qualified=141619
> 			Number of rows visited=141769
> 			Scan type=btree
> 			Tree height=2
> 			start position: 
> 	>= on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 
> 			stop position: 
> 	> on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 
> 			qualifiers:
> None
> 			optimizer estimated row count:        14158.00
> 			optimizer estimated cost:         3842.48
> 
> 


Mime
View raw message