lucene-java-user mailing list archives

From Miles Barr <>
Subject Re: Collecting documents where only one field term matches
Date Mon, 04 Apr 2005 10:20:33 GMT
On Mon, 2005-04-04 at 09:17 +0000, mad Cow wrote:
> Could some more experienced users suggest a solution to my problem. I have 
> documents which contain multiple terms and phrases, and I wish to collect 
> documents which match only the term I query for.
> For example:
> Doc1 contains,
>    species:"homo sapien" Mammalia
> Doc2 contains,
>    species:"homo sapien"
> I wish to collect documents ONLY with "homo sapien" but a search for 
> species:"homo sapien" returns both documents as they both contain the 
> phrase.
> I have written code to cache every term for every field and I hoped that I 
> could do the search - species:"homo sapien" -species:Mammalia. Unfortunately 
> the terms homo and sapien seem to be separate.  So when I collect every term 
> to use with the "-" operator I end up with a query thus
> species:"homo sapien" -species:(homo Mammalia sapien)
> which isn't the same.
> Can anybody suggest another approach?

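If the species are fixed, I recommend using the Keyword field type and
adding each species as a separate field (Lucene can handle multiple
fields with the same name). Keyword values are indexed as a single
untokenized term, so "homo sapien" is no longer split into "homo" and
"sapien". Then the query 'species:"homo sapien" -species:Mammalia'
should work.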
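
To make the matching logic concrete, here is a minimal sketch in plain
Java. It does not call Lucene at all; it only models each document's
"species" field as a set of whole, untokenized values, which is roughly
what keyword fields give you. The class and method names
(KeywordFieldSketch, matchesOnly) are made up for illustration, and the
"known values" set stands in for the term cache described in the
question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of untokenized (keyword) field values; this is not Lucene code.
class KeywordFieldSketch {

    // Mirrors a query of the form:
    //   species:"wanted" -species:other1 -species:other2 ...
    // where the prohibited clauses come from every other known value.
    static boolean matchesOnly(Set<String> docValues, String wanted, Set<String> allKnownValues) {
        if (!docValues.contains(wanted)) {
            return false; // required clause not satisfied
        }
        for (String value : allKnownValues) {
            if (!value.equals(wanted) && docValues.contains(value)) {
                return false; // a prohibited value is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> doc1 = new HashSet<>(Arrays.asList("homo sapien", "Mammalia"));
        Set<String> doc2 = new HashSet<>(Collections.singletonList("homo sapien"));
        // Because values are kept whole, the cache holds "homo sapien", not "homo" and "sapien".
        Set<String> known = new HashSet<>(Arrays.asList("homo sapien", "Mammalia"));

        System.out.println("Doc1 matches: " + matchesOnly(doc1, "homo sapien", known)); // false
        System.out.println("Doc2 matches: " + matchesOnly(doc2, "homo sapien", known)); // true
    }
}
```

The point of the sketch is that the exclusion list is built from whole
values, so excluding "Mammalia" never accidentally excludes "homo" or
"sapien".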

But I think the real problem is that you have a category hierarchy that
you want to filter by, which is awkward to do with Lucene alone. When I
come across these situations I normally pair up Lucene with a database
that holds the categorization information and take one of two approaches:

1. Do the search in Lucene, then do the category filtering against the
database (which holds document/category information). Lucene holds no
category information in this case.

2. Take the query, look up the relevant category information in the
database and expand the query so it only picks up the categories you
want (you'd store each category a document is in as a separate Lucene
keyword field).
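
Both approaches can be sketched roughly as follows, again in plain Java
with no Lucene or JDBC calls. The map stands in for the database table
of document/category pairs, the list of hit ids stands in for Lucene
search results, and the names (CategoryPairingSketch, filterHits,
expandQuery, the "category" field) are hypothetical. The expanded query
string is only meant to show the shape of the query; with untokenized
keyword fields you would probably build the term clauses
programmatically rather than run this string through an analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch of pairing a search index with category data; not Lucene or JDBC code.
class CategoryPairingSketch {

    // Approach 1: search first, then keep only hits whose categories
    // (looked up in the database) include the wanted one.
    static List<String> filterHits(List<String> hitIds,
                                   Map<String, Set<String>> docCategories,
                                   String wanted) {
        List<String> kept = new ArrayList<>();
        for (String id : hitIds) {
            Set<String> cats = docCategories.getOrDefault(id, Collections.<String>emptySet());
            if (cats.contains(wanted)) {
                kept.add(id);
            }
        }
        return kept;
    }

    // Approach 2: look up the categories first, then add a required clause
    // so the search only picks up documents in those categories.
    static String expandQuery(String userQuery, List<String> categories) {
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(userQuery).append(") +(");
        for (int i = 0; i < categories.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("category:\"").append(categories.get(i)).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Stand-in for the document/category table.
        Map<String, Set<String>> docCategories = new HashMap<>();
        docCategories.put("doc1", new HashSet<>(Arrays.asList("homo sapien", "Mammalia")));
        docCategories.put("doc2", new HashSet<>(Arrays.asList("homo sapien")));

        // Stand-in for the ids returned by a Lucene search.
        List<String> hits = Arrays.asList("doc1", "doc2");

        System.out.println(filterHits(hits, docCategories, "Mammalia"));
        System.out.println(expandQuery("title:evolution", Arrays.asList("Mammalia", "homo sapien")));
    }
}
```

Approach 1 keeps the index free of category data at the cost of a
database lookup per search; approach 2 moves the lookup before the
search and lets Lucene do the filtering.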

Miles Barr <>
Runtime Collective Ltd.
