Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 18148 invoked from network); 4 Aug 2009 15:25:20 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 4 Aug 2009 15:25:20 -0000 Received: (qmail 40718 invoked by uid 500); 4 Aug 2009 15:25:25 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 40663 invoked by uid 500); 4 Aug 2009 15:25:24 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 40655 invoked by uid 99); 4 Aug 2009 15:25:24 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Aug 2009 15:25:24 +0000 X-ASF-Spam-Status: No, hits=-4.0 required=10.0 tests=RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [192.18.43.132] (HELO sca-es-mail-1.sun.com) (192.18.43.132) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Aug 2009 15:25:11 +0000 Received: from fe-sfbay-09.sun.com ([192.18.43.129]) by sca-es-mail-1.sun.com (8.13.7+Sun/8.12.9) with ESMTP id n74FOnBl010187 for ; Tue, 4 Aug 2009 08:24:50 -0700 (PDT) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII; format=flowed Received: from conversion-daemon.fe-sfbay-09.sun.com by fe-sfbay-09.sun.com (Sun Java(tm) System Messaging Server 7u2-7.02 64bit (built Apr 16 2009)) id <0KNU00800YSXCW00@fe-sfbay-09.sun.com> for derby-user@db.apache.org; Tue, 04 Aug 2009 08:24:49 -0700 (PDT) Received: from richard-hillegas-computer.local ([unknown] [129.150.240.72]) by fe-sfbay-09.sun.com (Sun Java(tm) System Messaging Server 7u2-7.02 64bit (built Apr 16 2009)) with ESMTPSA id <0KNU00FX7YTCY440@fe-sfbay-09.sun.com> for derby-user@db.apache.org; Tue, 04 Aug 2009 08:24:48 -0700 (PDT) Date: Tue, 04 Aug 2009 08:24:47 -0700 From: Rick Hillegas Subject: Re: Filtering using the sort order on two columns In-reply-to: Sender: Richard.Hillegas@Sun.COM To: Derby Discussion Message-id: <4A7852BF.1060306@sun.com> References: User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605) X-Virus-Checked: Checked by ClamAV on apache.org Hi Daniel, Here are two issues which might drive the optimizer to select a table scan: 1) The optimizer may see the OR as an unoptimizable condition. I think Bryan referred to this. 2) The index does not cover the query, that is, it does not contain all columns needed to evaluate the query. A variation of your concatenation solution could involve a generated column and a covering index involving that column. This would at least let you preserve the more readable structure of your table for other queries. Something like the following might work: connect 'jdbc:derby:memory:temp;create=true'; CREATE TABLE Terms ( id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL, field VARCHAR(255) NOT NULL, text VARCHAR(255) NOT NULL, docfreq INTEGER NOT NULL, fieldtext generated always as ( field || ':' || text ) ); CREATE INDEX TermsFieldAndText ON Terms (fieldtext, docfreq); -- just for show. this does not create enough data to justify -- using an index insert into Terms ( field, text, docfreq ) values ( 'body', 'hello', 3 ), ( 'tag', 'home', 2 ), ( 'tag', 'work', 4 ), ( 'zzz', 'abc', 1 ); select substr( fieldtext, 1, locate( ':', fieldtext ) - 1 ), substr( fieldtext, locate( ':', fieldtext ) + 1 ), docfreq from Terms where fieldtext > 'tag:home' fetch first row only; Hope this helps, -Rick Daniel Noll wrote: > Hi all. > > I have a table and index like this: > > CREATE TABLE Terms ( > id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL, > field VARCHAR(255) NOT NULL, > text VARCHAR(255) NOT NULL, > docfreq INTEGER NOT NULL > ) > > CREATE INDEX TermsFieldAndText ON Terms (field, text) > > Here's some example data. Table is shown in sorted order as it would > be listed in the index above: > > | field | text | docfreq | > ---------------------------------------- > | body | hello | 10 | > | tag | home | 100 | > | tag | work | 80 | > > For a given input term (let's say tag:home) I want to find the _next_ > entry in the table (so it should return tag:work). My naive > implementation does this: > > SELECT field, text, docfreq > FROM Terms > WHERE (field = ? AND text > ?) OR field > ? > ORDER BY field ASC, text ASC > > This does the right thing as far as the output is concerned, but Derby > does a table scan. Looking at the query, my impression would be that > in the worse case scenario, Derby should do two index scans. Or, if > it's smart, one index scan. > > I'm aware that I might be able to work around this by having instead a > field with the concatenation of both fields with > CONCAT(field,'\0',text). But I wanted to check first whether there is > a way I can maintain my relatively understandable table structure > before resorting to hacks. Basically I just need the data in index > order, starting from a given point... so I figure there is a simple > solution. > > Daniel > > > -- > Daniel Noll Forensic and eDiscovery Software > Senior Developer The world's most advanced > Nuix email data analysis > http://nuix.com/ and eDiscovery software >