db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: Queries satisfiable from indexes
Date Tue, 20 Mar 2007 21:22:53 GMT
Dan Karp wrote:

>>In the case of the query plan you presented, it seems
>>to me that Derby is in fact using the index, and not the
>>base table; that's what I believe this line is saying:
>>
>>>Index Scan ResultSet for MAIL_ITEM using index I_MAIL_ITEM_TAGS_DATE
> 
> I believe that's included for every query satisfied using an index.  
> But the number of pages accessed indicates that it actually loaded all
> the rows from the table:
> 
>    scan information:
> Bit set of columns fetched={0, 1}
> Number of columns fetched=2
> Number of deleted rows visited=20458
> Number of pages visited=501
> Number of rows qualified=141512
> Number of rows visited=161970

This appears to show that every "row" was loaded, yes.  But those rows could 
have come from the index itself, not necessarily from the underlying table.  So 
I think Bryan is right: the answer to your original question of "Can Derby 
satisfy a query from an index without any access to the underlying rows?" is 
Yes, it can.  And often does.

The query plan that you showed does not contradict this answer in and of itself.

I agree, though, that there is something odd about the number of pages visited 
and rows "qualified" here.  The top-most result set shows that only 3 rows were 
actually returned (I think...correct me if that's wrong), in which case it seems 
odd that we actually read 141,512 "qualified" rows from disk.  I don't know much 
about the data in question but it seems like we should have done a limited range 
scan using the parameter value ("?") as a start and stop key.  If that was the 
case then we should have scanned far fewer pages/rows.

Maybe, as you said, this has something to do with the use of the GROUP BY, which 
necessitates a sort?

 > In this case, would it help things if I'd used DISTINCT(tags) instead of a
 > GROUP BY clause?

Ummm...I don't know, you tell us ;)  Can you try this out and let us know?

What version of Derby are you using?  Is it 10.2 or is this off the development 
trunk?

I came up with a naive attempt at a simple repro for this but I wasn't 
successful.  In ij I did the following:

-- Create table and insert 80 non-unique rows.
create table t1 (i int, c char(10), d double);
insert into t1 values (1, 'one', 1.1);
insert into t1 values (2, 'two', 2.2);
insert into t1 values (3, 'three', 3.3);
insert into t1 values (4, 'four', 4.4);
insert into t1 values (5, 'five', 5.5);
insert into t1 select * from t1;
insert into t1 select i*i, cast (i*i as char(10)), d * d from t1;
insert into t1 select i*i, cast (i*i as char(10)), d * d from t1;
insert into t1 select i*i, cast (i*i as char(10)), d * d from t1;

create index ix on t1 (d, i, c);

prepare p1 as 'select i from t1 where d = ? group by d, i';
execute p1 using 'values 2.2';

The resultant plan showed that number of "qualified" rows is far fewer than the 
number of rows in the table, which is what I would expect:

	Source result set:
	Index Scan ResultSet for T1 using index IX at read committed isolation level 
using instantaneous share row locking chosen by the optimizer
	Number of opens = 1
	Rows seen = 2
	Rows filtered = 0
	Fetch Size = 16
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		next time in milliseconds/row = 0

	scan information:
		Bit set of columns fetched={0, 1}
		Number of columns fetched=2
		Number of deleted rows visited=0
		Number of pages visited=1
		Number of rows qualified=2
		Number of rows visited=3

So I'm not sure what's going on in your particular case.  If you have DDL and/or 
a database that you could provide with the query, that might help with the 
investigation of this particular issue.  Offhand, I'm not sure what's going on 
here, so the more info you can provide, the better...

Army


Mime
View raw message