lucene-java-user mailing list archives

From TimF <>
Subject Lucene equivalent of SQL DISTINCT for a specific field's "stored values"
Date Fri, 27 Jul 2007 02:50:12 GMT

I have a field called "category".
Sample data for "category":
   Hello World
   Goodbye World
   Foo Bar
   Mad Mad Mad Mad World

It is tokenized and stored in the index. I tokenize the field because I may
want to search on one or more specific words in a category, not necessarily
the entire category.

However, I would also like to offer a select box in my web application that
gives the end user the distinct list of stored values for the category
field, from which they could choose one to search on.

I have tried what most people in this forum recommend: use
IndexReader.terms() on the "category" field and enumerate the terms.

However, this obviously returns the list of distinct terms:
   Hello , World , Goodbye , Foo , Bar , Mad

rather than the list of distinct stored values:
   Hello World , Goodbye World , Foo Bar , Mad Mad Mad Mad World
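A small plain-Java sketch of the mismatch described above. It does not use
Lucene at all: it assumes a simple whitespace tokenizer (a real Lucene
analyzer would typically also lowercase the tokens), and the class and method
names are hypothetical. Collecting the distinct tokens mimics enumerating a
tokenized field's terms, while collecting the whole values is what a SQL
DISTINCT on the column would return.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class DistinctTermsVsValues {
    // Sample "category" values from the post, one per document.
    static final List<String> CATEGORIES = Arrays.asList(
        "Hello World", "Goodbye World", "Foo Bar", "Mad Mad Mad Mad World");

    // Roughly what enumerating a tokenized field's terms yields: every distinct
    // token, in sorted order. Assumes a plain whitespace tokenizer (hypothetical).
    static SortedSet<String> distinctTerms(List<String> values) {
        SortedSet<String> terms = new TreeSet<>();
        for (String v : values) {
            terms.addAll(Arrays.asList(v.split("\\s+")));
        }
        return terms;
    }

    // What the select box needs: each whole value exactly once (the SQL DISTINCT analogue).
    static SortedSet<String> distinctValues(List<String> values) {
        return new TreeSet<>(values);
    }

    public static void main(String[] args) {
        // prints [Bar, Foo, Goodbye, Hello, Mad, World]
        System.out.println(distinctTerms(CATEGORIES));
        // prints [Foo Bar, Goodbye World, Hello World, Mad Mad Mad Mad World]
        System.out.println(distinctValues(CATEGORIES));
    }
}
```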

I could add another, untokenized field to the index and then enumerate the
terms of that new field, but this seems like a hack, and it would also
increase the size of the index, since the category data would be duplicated
for every document.
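A toy in-memory model of the second-field idea, again plain Java rather than
Lucene's API; the field name "category_exact" and every class and method here
are hypothetical. Each document is indexed twice: tokenized under "category"
and as one untokenized term under "category_exact". Enumerating the second
field's sorted terms gives exactly the select-box list, and a value shared by
several documents appears once in the term dictionary, with each document
contributing only a posting (its id).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of an inverted index; NOT Lucene's API.
public class ExactCategoryField {
    // field name -> (term -> ids of documents containing that term), terms kept sorted
    private final Map<String, SortedMap<String, List<Integer>>> fields = new HashMap<>();
    private int nextDocId = 0;

    private void addTerm(String field, String term, int docId) {
        List<Integer> postings = fields
            .computeIfAbsent(field, f -> new TreeMap<>())
            .computeIfAbsent(term, t -> new ArrayList<>());
        // A term repeated within one document (e.g. "Mad") is recorded once for that doc.
        if (postings.isEmpty() || postings.get(postings.size() - 1).intValue() != docId) {
            postings.add(docId);
        }
    }

    // Index one document's category twice: tokenized for word search,
    // and as a single untokenized term under the hypothetical "category_exact" field.
    public int addDocument(String category) {
        int docId = nextDocId++;
        for (String token : category.split("\\s+")) {
            addTerm("category", token, docId);
        }
        addTerm("category_exact", category, docId);
        return docId;
    }

    // Enumerate a field's distinct terms in sorted order, like walking a term enumeration.
    public List<String> terms(String field) {
        return new ArrayList<>(fields.getOrDefault(field, new TreeMap<>()).keySet());
    }

    // Ids of the documents that contain the given term in the given field.
    public List<Integer> docsWith(String field, String term) {
        return fields.getOrDefault(field, new TreeMap<>()).getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ExactCategoryField index = new ExactCategoryField();
        for (String c : new String[] {"Hello World", "Foo Bar", "Hello World", "Mad Mad Mad Mad World"}) {
            index.addDocument(c);
        }
        System.out.println(index.terms("category"));        // individual words
        System.out.println(index.terms("category_exact"));  // whole values, each once
        System.out.println(index.docsWith("category_exact", "Hello World"));
    }
}
```

In this model the per-document cost of the extra field is a posting entry,
not another copy of the string; how that translates to on-disk size in a real
Lucene index would depend on its file formats and on whether the extra field
is also stored.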

Any other ideas?
Sent from the Lucene - Java Users mailing list archive at
