From: Niels Ott
Date: Fri, 24 Oct 2008 13:26:16 +0200
To: java-user@lucene.apache.org
Subject: Re: Combining keyword queries with database-style queries

Erick,

this RangeQuery thing looks promising. It might be a bit hacky, but it
will most probably do the job in the given time and framework.

Thanks a lot,

Niels

Erick Erickson wrote:
> Well, assuming that token_count is an indexed field
> in your documents (i.e. not something you're
> computing on the fly), just use a RangeQuery for the numeric
> part. Actually, you probably want to use
> ConstantScoreRangeQuery...
>
> The only thing you have to watch is that Lucene does a
> lexical compare, so you have to index your numbers
> as comparable strings, probably left-padding to some
> fixed width with zeros, see NumberTools.
>
> Best
> Erick
>
> On Thu, Oct 23, 2008 at 8:27 AM, Niels Ott wrote:
>
>> Hi everybody,
>>
>> I need to query for documents not only for search terms but also for
>> numeric values (or other general types). Let me try to explain with a
>> hypothetical example.
>>
>> Assuming there is a value for the number of words in each document (or
>> the number of person names, or whatever), I would want to formulate a
>> query like "Give me documents containing 'jack johnson' AND with
>> token_count > 250".
>>
>> I've been working with Lucene before and the keyword part is easy, but
>> what would be a good solution to query for numbers etc.?
>>
>> One first idea I had was storing the numbers (which are basically a
>> HashMap) in the index in some way or the other. But it is
>> not at all obvious to me how to query them then.
>>
>> Another thing I could think of would be using a separate database of any
>> type, but then how to bring those two together in a way that makes sense?
>>
>> Any pointers to useful resources and any types of hints are welcome! :-)
>>
>> Best,
>>
>> Niels
>>

-- 
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/
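
Erick's point about lexical comparison can be illustrated with a minimal sketch in plain Java. It does not call Lucene at all: the class `PaddedNumbers`, its methods `pad` and `lexicallyGreater`, and the width of 10 digits are hypothetical names and choices made up for this illustration, and the NumberTools class mentioned in the thread may encode numbers differently. The sketch only shows why unpadded numbers compare wrongly as strings and why left-padding with zeros to a fixed width repairs the ordering for non-negative values.

```java
/**
 * Hypothetical helper for illustration only; not part of Lucene.
 * Encodes non-negative ints as fixed-width, zero-padded strings so that
 * string (lexical) order matches numeric order.
 */
final class PaddedNumbers {
    // Assumption: 10 digits, enough for any non-negative int (max 2147483647).
    static final int WIDTH = 10;

    /** Left-pads a non-negative value with zeros to WIDTH digits. */
    static String pad(int value) {
        if (value < 0) {
            // A leading minus sign would break the ordering; negatives need another encoding.
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    /** True if value sorts strictly after lowerBound in plain string order. */
    static boolean lexicallyGreater(String value, String lowerBound) {
        return value.compareTo(lowerBound) > 0;
    }

    public static void main(String[] args) {
        int[] tokenCounts = {90, 250, 251, 1000};
        for (int count : tokenCounts) {
            // Comparing the raw decimal strings gives wrong answers for 90 and 1000.
            boolean raw = lexicallyGreater(Integer.toString(count), "250");
            // Comparing the padded strings agrees with "count > 250".
            boolean padded = lexicallyGreater(pad(count), pad(250));
            System.out.println(count + " raw=" + raw + " padded=" + padded);
        }
    }
}
```

With the values above, the raw string comparison treats 90 as greater than 250 and 1000 as not greater, while the padded comparison matches the numeric condition token_count > 250 in every case. In an index, the same padded string would have to be used both when indexing the field and when building the bounds of the range query.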