lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Some questions about index...
Date Sat, 05 Feb 2005 20:43:47 GMT

On Feb 5, 2005, at 10:04 AM, Karl Koch wrote:
> 1) Can I store all the information of the text file, but also apply a
> analyser. E.g. I use the StopAnalyzer. After finding the document, I 
> want to
> extract the original text also from the index. Does this require that I
> store the information twice in two different fields (one indexed and 
> one
> unindexed) ?

You should use a single stored, tokenized, and indexed field for this 
purpose.  Be cautious of how you construct the Field object to achieve 

> 2) I would like to extract information from the index which can found 
> in a
> boolean way. I know that Lucene is a VSM which provides Boolean 
> operators.
> This however does not change its functioning. For example, I have a 
> field
> with contains an ID number and I want to use the search like a database
> operatation (e.g. to find the document with id=1). I can solve the 
> problem
> by searching with query "id:1". However, this does not ensure that I 
> will
> only get one result. Usually the first result is the document I want. 
> But it
> could happen, that this sometimes does not work.

Why wouldn't it work?  For ID-type fields, use a Field.Keyword (stored, 
indexed, but not tokenized).  Search for a specific ID using a 
TermQuery (don't use QueryParser for this, please).  If the ID values 
are unique, you'll either get zero or one result.

>  What happens if I should
> get no results? I guess if I search for id=5 and 5 did not exist I 
> would
> probably get 50, 51, .. just because the contain 5. Did somebody work 
> with
> this and can suggest a stable solution?

No, this would not be the case, unless you're analyzing the ID field 
with some strange character-by-character analyzer or doing a wildcard 
"*5*" type query.

> A good solution for these two questions would help me avoiding a 
> database
> which would need to replicate most the data which I already have in my
> Lucene index...

You're on the right track and avoiding a database when it is overkill 
or duplicative is commendable :)


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message