From: "Jason Rutherglen"
To: java-dev@lucene.apache.org
Date: Fri, 21 Nov 2008 10:02:32 -0800
Subject: Re: [jira] Commented: (LUCENE-1461) Cached filter for a single term field
Message-ID: <85d3c3b60811211002r67d4a021o4859c26c72409a0c@mail.gmail.com>

The problem with StringIndex is that it uses strings, which are costly for compareTo juxtaposed with a numeric compare (I used "juxtaposed" because I originally had "compared with", which was redundant). It seems helpful to have generic primitive-based StringIndex classes.

On Fri, Nov 21, 2008 at 2:34 AM, Michael McCandless (JIRA) <jira@apache.org> wrote:

   [ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649639#action_12649639 ]

Michael McCandless commented on LUCENE-1461:
--------------------------------------------

It seems like the core class here (DisjointMultiFilter) is doing the same thing as FieldCache's StringIndex? I.e., it builds a data structure that maps String <-> ord and docID -> ord. So maybe we can merge DisjointMultiFilter into the FieldCache API.
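
A minimal sketch of the two mappings described above, in plain Java with no Lucene dependency. The class name OrdIndexSketch and its members are hypothetical and only illustrate the shape of the data; FieldCache's actual StringIndex may differ in detail, for example in how it treats documents that have no term.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene's FieldCache.StringIndex.
final class OrdIndexSketch {
    final String[] lookup; // ord -> term, in sorted order
    final int[] order;     // docID -> ord

    // termPerDoc[doc] is the single term of that document (assumed non-null).
    OrdIndexSketch(String[] termPerDoc) {
        // Sort and de-duplicate the terms; a term's position is its ord.
        lookup = new TreeSet<>(Arrays.asList(termPerDoc)).toArray(new String[0]);
        order = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, termPerDoc[doc]);
        }
    }

    // String -> ord; negative if the term does not occur in the field.
    int ordOf(String term) {
        return Arrays.binarySearch(lookup, term);
    }

    // ord -> String.
    String termOf(int ord) {
        return lookup[ord];
    }
}
```

Once the arrays are built, comparing two documents by term only needs a comparison of order[docA] and order[docB], with no string comparison involved.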

And then RangeMultiFilter is a great addition for quickly "spawning" numerous new RangeFilters, having pulled & stored the StringIndex from the FieldCache?  So I think it should live in core org.apache.lucene.search.*?  I'd prefer a different name (RangeMultiFilter implies it can filter over multiple ranges) but can't think of one.  Or maybe we absorb it into RangeFilter, as a different "rewrite" method like "useFieldCache=true|false"?
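
To illustrate why spawning many range filters from one cached index is cheap, here is a rough sketch in plain Java. It assumes a sorted ord -> term array and a docID -> ord array have already been built; OrdRangeSketch and its range method are hypothetical names, not part of RangeMultiFilter or of Lucene. Each new range costs two binary searches over the terms, followed by integer comparisons per document.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; not the attached RangeMultiFilter.
final class OrdRangeSketch {
    // Returns the docs whose term lies in [lower, upper], both inclusive.
    // lookup is the sorted ord -> term array, order is docID -> ord.
    static BitSet range(String[] lookup, int[] order, String lower, String upper) {
        int lo = Arrays.binarySearch(lookup, lower);
        if (lo < 0) lo = -lo - 1; // first ord whose term is >= lower
        int hi = Arrays.binarySearch(lookup, upper);
        if (hi < 0) hi = -hi - 2; // last ord whose term is <= upper
        BitSet bits = new BitSet(order.length);
        for (int doc = 0; doc < order.length; doc++) {
            int ord = order[doc];
            if (ord >= lo && ord <= hi) {
                bits.set(doc); // pure integer comparison per document
            }
        }
        return bits;
    }
}
```

The same two arrays can serve any number of ranges, which is the sense in which the filters share the work done by the cache.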

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a single term. They do this by building an integer array of term numbers (storing the term->number mapping in a TreeMap) and then implementing a fast, integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also be used to do other date filtering or in any application where multiple filters based on the same single term field are needed. I have an untested implementation of single term filtering and have considered but not yet implemented term set filtering (useful for location-based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and hashCode() methods etc. I'm posting it here to discover if there is other interest in this feature; I don't mind fixing it up but would hate to go to the effort if it's not going to make it into Lucene.
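
The approach in the issue description (term numbers held in a TreeMap, then an iterator that only compares integers) might look roughly like the following. This is a sketch under assumptions, not the attached code: TermNumberRangeSketch and docsInRange are hypothetical names, and a java.util.PrimitiveIterator.OfInt stands in for the Lucene iterator mentioned in the description.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.TreeMap;

// Hypothetical illustration only; not the attached DisjointMultiFilter.
final class TermNumberRangeSketch {
    final TreeMap<String, Integer> termToNumber = new TreeMap<>();
    final int[] docToNumber; // docID -> term number

    TermNumberRangeSketch(String[] termPerDoc) {
        for (String term : termPerDoc) termToNumber.put(term, 0);
        int n = 0; // number the terms in sorted order
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) e.setValue(n++);
        docToNumber = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToNumber[doc] = termToNumber.get(termPerDoc[doc]);
        }
    }

    // Iterates, in increasing docID order, over docs whose term is in [lower, upper].
    PrimitiveIterator.OfInt docsInRange(String lower, String upper) {
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upper);
        final int min = lo == null ? Integer.MAX_VALUE : lo.getValue();
        final int max = hi == null ? Integer.MIN_VALUE : hi.getValue();
        return new PrimitiveIterator.OfInt() {
            private int next = advance(0);

            // Finds the next matching docID at or after 'from', or -1 if none.
            private int advance(int from) {
                for (int d = from; d < docToNumber.length; d++) {
                    if (docToNumber[d] >= min && docToNumber[d] <= max) return d;
                }
                return -1;
            }

            @Override public boolean hasNext() { return next >= 0; }

            @Override public int nextInt() {
                if (next < 0) throw new NoSuchElementException();
                int d = next;
                next = advance(d + 1);
                return d;
            }
        };
    }
}
```

The TreeMap is consulted only twice per range, to translate the bounds into term numbers; the per-document work is the integer comparison in advance().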
