lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Combining keyword queries with database-style queries
Date Fri, 24 Oct 2008 12:37:29 GMT
Hacky is in the eye of the hacker <G>.

It's hard to keep in mind that Lucene is a search engine,
not a database, so whenever I find myself thinking in
database terms, I'm usually making things difficult. It
operates on strings, not the "usual" data types that one
thinks are available in programming languages, DBs, etc...

So I find myself doing things that "aren't natural" <G>...


On Fri, Oct 24, 2008 at 7:26 AM, Niels Ott <>wrote:

> Erick,
> this RangeQuery thing looks promising. It might be a bit hacky but it will
> most probably do the job in the given time and framework.
> Thanks a lot,
>   Niels
> Erick Erickson schrieb:
>  Well, assuming that token_count is an indexed field
>> in your documents (i.e. not something you're
>> computing on the fly), just use a RangeQuery for the numeric
>> part. Actually, you probably want to use
>> ConstantScoreRangeQuery...
>> The only thing you have to watch is that Lucene does a
>> lexical compare, so you have to index your numbers
>> as comparable strings, probably left-padding to some
>> fixed width with zeros, see NumberTools.
>> Best
>> Erick
>> On Thu, Oct 23, 2008 at 8:27 AM, Niels Ott <
>> >wrote:
>>  Hi everybody,
>>> I need to query for documents not only for search terms but also for
>>> numeric values (or other general types). Let me try to explain with a
>>> hypothetical example.
>>> Assuming there is a value for the number words in each document (or the
>>> number of person names, or whatever), I would want to formulate a query
>>> like "Give me documents containing 'jack johnson' AND with token_count >
>>> 250".
>>> I've been working with Lucene before and the keyword part is easy, but
>>> what would be a good solution to query for numbers etc.?
>>> One first idea I had was storing the numbers (which are basically a
>>> HashMap<String,Double>) in the index in some way or the other. But it is
>>> not at all obvious for me how to query them then.
>>> Another thing I could think of would be using a separate database of any
>>> type, but then how to bring those two together in a way that makes sense?
>>> Any pointers to useful resources and any types of hints are welcome! :-)
>>> Best,
>>>  Niels
> --
> Niels Ott
> Computational Linguist (B.A.)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message