lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steinert, Fabian" <Fabian.Stein...@fokus.fraunhofer.de>
Subject RE: Exact field searches
Date Tue, 31 Jul 2007 10:02:02 GMT
Hi Vijay,

with a frequent usage pattern of searching (exactly) for a whole fields
value (e.g. the whole name) it may be worth to store that field (name:)
twice:

1) as field name_tokenized: with Field.Index.TOKENIZED for normal
"contains" querys and

2) as field name_untokenized: with Field.Index.UN_TOKENIZED for matching
complete content

>From the usage (user selects...) you will know which field to search.

Btw. as long as you want *exact* matches consider use of TermQuery
instead of PrefixQuery as a PrefixQuery for "Pink*" will of course yield
"Pinky" as well.

Best regards,
Fabian


-----Original Message-----
From: Vijay Santhanam [mailto:vijay@spectrumwired.com] 
Sent: Dienstag, 31. Juli 2007 10:23
To: java-user@lucene.apache.org
Subject: Exact field searches

Hi Guys,

Currently I construct a PrefixQuery to exact search through an index of
documents that represent Compact Discs, something like www.discogs.com.

On the search page, we offer a suggestion list as the user enters text,
like
google suggest.
When a user selects an item out of this list, we mark the search as
being an
"exact" search, because they know what they want.

An exact search wraps the name of the disc in a PrefixQuery and performs
the
search.

But, I'm getting some unwanted results and I'm not sure which solution
approach to use.

In our dataset, there are hundreds of CDs with single English word
titles.
Like, "Pink" and "Dust" and "Walk" etc.

If the user selects the "Pink" from the suggestion list, then CDs with
titles like "Pink Sunset", "A Pink lady", "Pink McPinkington", "Tomorrow
the
Pink" appear in the results (along with the CDs just titled "Pink").

Obviously, the PhraseQuery finds instances of that phrase in the title
field, but I need to somehow exclude those titles that have a different
number of tokens from the query.

How do I make search for a specific number of tokens in a field?

Thanks for your help,

Vijay Santhanam
B.Eng.(Soft.)
Spectrum Wired - Software Engineer

T: +61 2 4925 3266
F: +61 2 4925 3255
M: +61 407 525 087
W: www.spectrumwired.com 





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message