lucene-java-user mailing list archives

From "Erick Erickson"
Subject Re: regaridng Reader.terms()
Date Wed, 23 May 2007 13:19:06 GMT
You may have to index things twice, once for searching and once
UN_TOKENIZED for display. Say you have a bunch of service names
you want to display
service one
service two
service three

If you use WhitespaceAnalyzer with TOKENIZED, you index the tokens
service  (note, there are three of these)
one
two
three

and re-assembling them is a pain. If you used UN_TOKENIZED, you'd index
service one
service two
service three

as exactly three terms.
Now, simply using TermEnum on that field will return your list. And you
won't get duplicates even if you indexed "service one" twice, because each
distinct term is listed only once.
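
To see why the two indexing modes give such different term lists, here is a rough plain-Java sketch. It does not use Lucene at all: the class TermListingSketch and its methods are made up for illustration, a TreeSet stands in for the sorted, duplicate-free list of terms that TermEnum walks, and splitting on whitespace stands in for WhitespaceAnalyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for one field's term list; this is not Lucene code.
class TermListingSketch {

    // Roughly what a whitespace-tokenized field contributes: one term per word.
    static SortedSet<String> tokenizedTerms(List<String> values) {
        SortedSet<String> terms = new TreeSet<String>();
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
        }
        return terms;
    }

    // Roughly what an untokenized field contributes: the whole value as one term.
    static SortedSet<String> untokenizedTerms(List<String> values) {
        return new TreeSet<String>(values);
    }

    public static void main(String[] args) {
        // "service one" is listed twice on purpose to show that duplicates collapse.
        List<String> values = Arrays.asList(
                "service one", "service two", "service three", "service one");
        System.out.println("tokenized:   " + tokenizedTerms(values));
        System.out.println("untokenized: " + untokenizedTerms(values));
    }
}
```

With the four input values above, the tokenized set holds four single-word terms, while the untokenized set holds exactly the three service names, which is the list you would want for a combo box.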

Another possibility is to index a special document. Remember that not
all documents need to have the same fields. So, say you have a list of all
services. You could index a document with "extra_special_meta_field" that
contained a comma-delimited list of services, and just get *that* document
to generate your list. A variant of this is to add the *same* field several
times to your extra-special doc, like
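Document doc = new Document();
// Field.Store.YES is an assumption here; the values must be stored for getValues() to return them
doc.add(new Field("special", "service one", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("special", "service two", Field.Store.YES, Field.Index.UN_TOKENIZED));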

Now, once you have the document (perhaps do this at startup time), the
call Document.getValues("special") will return a list like
"service one"
"service two".
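
For the comma-delimited variant, the stored value has to be split back into names once the special document has been fetched. A minimal sketch in plain Java, assuming the service names themselves never contain commas. The class and method names are invented for illustration, and the string literal in main stands in for whatever Document.get("extra_special_meta_field") would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Lucene.
class ServiceListSketch {

    // Splits a comma-delimited stored value into trimmed, de-duplicated names,
    // keeping the order in which they first appear.
    static List<String> parseServices(String stored) {
        List<String> services = new ArrayList<String>();
        if (stored == null) {
            return services;
        }
        for (String part : stored.split(",")) {
            String name = part.trim();
            if (!name.isEmpty() && !services.contains(name)) {
                services.add(name);
            }
        }
        return services;
    }

    public static void main(String[] args) {
        // Stand-in for the stored value of the special document's meta field.
        String stored = "service one, service two,service three, service one";
        for (String service : parseServices(stored)) {
            System.out.println(service);
        }
    }
}
```

The multi-valued "special" field avoids this parsing step entirely, which is one reason to prefer it if names might ever contain a comma.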

At least I'm pretty sure....

All this may be in left field, but I thought I'd mention it.


On 5/23/07, Mohammad Norouzi wrote:
> Hi Walter,
> let me explain my problem in detail.
> I have a web page that lets the user build a simple query of his own.
> For example, a user wants to locate a service with a specific value, but
> he/she doesn't know the exact name of the service, so I have to provide a
> list of the available services (say, in a combo box). After selecting the
> service, the user can enter the value for that service, and I will build
> the appropriate query and send it to Lucene.
> My problem is how to display all the services. How can I extract the
> services from the index?
> I was thinking of IndexReader.terms(), but I think that because I am using
> WhitespaceAnalyzer it returns terms word by word, whereas I need terms per
> document.
> In SQL terms, it would be: select distinct service_name from table_name
> Is there a way to tackle this problem?
> --
> Regards,
> Mohammad
