db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Filtering using the sort order on two columns
Date Tue, 04 Aug 2009 15:24:47 GMT
Hi Daniel,

Here are two issues which might drive the optimizer to select a table scan:

1) The optimizer may see the OR as an unoptimizable condition. I think 
Bryan referred to this.

2) The index does not cover the query, that is, it does not contain all 
columns needed to evaluate the query.

A variation of your concatenation solution could involve a generated 
column and a covering index involving that column. This would at least 
let you preserve the more readable structure of your table for other 
queries. Something like the following might work:

connect 'jdbc:derby:memory:temp;create=true';

CREATE TABLE Terms (
       id        INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
       field     VARCHAR(255) NOT NULL,
       text      VARCHAR(255) NOT NULL,
       docfreq   INTEGER NOT NULL,
       fieldtext generated always as ( field || ':' || text )
);
CREATE INDEX TermsFieldAndText ON Terms (fieldtext, docfreq);

-- just for show. this does not create enough data to justify
-- using an index
insert into Terms ( field, text, docfreq )
values ( 'body', 'hello', 3 ), ( 'tag', 'home', 2 ), ( 'tag', 'work', 4 
), ( 'zzz', 'abc', 1 );

select substr( fieldtext, 1, locate( ':', fieldtext ) - 1 ), substr( 
fieldtext, locate( ':', fieldtext ) + 1 ), docfreq
from Terms
where fieldtext > 'tag:home'
fetch first row only;

Hope this helps,
-Rick

Daniel Noll wrote:
> Hi all.
>
> I have a table and index like this:
>
>    CREATE TABLE Terms (
>        id        INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
>        field     VARCHAR(255) NOT NULL,
>        text      VARCHAR(255) NOT NULL,
>        docfreq   INTEGER NOT NULL
>    )
>
>    CREATE INDEX TermsFieldAndText ON Terms (field, text)
>
> Here's some example data.  Table is shown in sorted order as it would
> be listed in the index above:
>
> | field     | text      | docfreq      |
> ----------------------------------------
> | body      | hello     | 10           |
> | tag       | home      | 100          |
> | tag       | work      | 80           |
>
> For a given input term (let's say tag:home) I want to find the _next_
> entry in the table (so it should return tag:work).  My naive
> implementation does this:
>
>    SELECT field, text, docfreq
>    FROM Terms
>    WHERE (field = ? AND text > ?) OR field > ?
>    ORDER BY field ASC, text ASC
>
> This does the right thing as far as the output is concerned, but Derby
> does a table scan.  Looking at the query, my impression would be that
> in the worse case scenario, Derby should do two index scans.  Or, if
> it's smart, one index scan.
>
> I'm aware that I might be able to work around this by having instead a
> field with the concatenation of both fields with
> CONCAT(field,'\0',text).  But I wanted to check first whether there is
> a way I can maintain my relatively understandable table structure
> before resorting to hacks.  Basically I just need the data in index
> order, starting from a given point... so I figure there is a simple
> solution.
>
> Daniel
>
>
> --
> Daniel Noll                            Forensic and eDiscovery Software
> Senior Developer                              The world's most advanced
> Nuix                                                email data analysis
> http://nuix.com/                                and eDiscovery software
>   


Mime
View raw message