lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject AW: Analyzers, Queries: three questions
Date Wed, 11 Jun 2003 10:04:32 GMT

Hi,

>>

1) How can I search untokenized fields? Do I have to pass my query 
through a "NullAnalyzer"?
>>

No, the contents of an untokenized (i.e., keyword) field
are stored as one lucene token. Hence, you must build such
a token from your query and build a
TermQuery for being able to search it.

In general, index those fields over which you search unless
you want to treat field contents as identifiers (e.g.,
unique document names or such).

>>

2) How can I pass the value of a field through an Analyzer before 
storing it?
>>

A text field is automatically analyzed and tokenized by the given
analyzer, you do not have to do it "manually". However, you
could preprocess your text in any way you want before that happens - 
simply apply your operations on the content you index, but make sure that
you use a compatible analyzer when searching.

>>

3) How can I fine-tune my query, e.g. by saying that for searching 
within the contents fields I want to pass the query through an Analyzer, 
for searching within the title field, however, I don't want the Analyzer 
pass. And I want a Hit if either field provides it.
>>

Unfortunately, you can not give different analyzers for different fields. 
You could process your query after parsing by traversing and manipulating
the query object; this method requires some programming, though; with
the power of Lucene's default query language, you might end in a lot
of work here.

Regards,

Karsten

-----Urspr√ľngliche Nachricht-----
Von: Ulrich Mayring [mailto:ulim@denic.de]
Gesendet: Mittwoch, 11. Juni 2003 11:50
An: lucene-user@jakarta.apache.org
Betreff: Analyzers, Queries: three questions


Hi folks,

I'm using the Snowball analyzer to index my documents. As an example I 
took the Tomcat documentation, which includes a document with the title 
"Workers HowTo". I put this string in a field called "title", within 
which I later do my query (of course again with the same SnowballAnalyzer).

At first I indexed the field as a Keyword (== not tokenized) and Lucene 
later couldn't find it, when I searched for "Workers HowTo". I found out 
that tokenization apparently includes application of the Analyzer, so if 
I put my query through an Analyzer, then the field to search must be 
tokenized. Hence my first question:

1) How can I search untokenized fields? Do I have to pass my query 
through a "NullAnalyzer"?

Next I made the title field a Text field, so it is tokenized. Now Lucene 
finds the document, but with a low score of 0.27. Sure enough, browsing 
the index showed me that the value of the title field is stored 
"unanalyzed", i.e. "Workers HowTo" - exactly as retrieved from the 
document. On the other hand, after parsing the query, the query is 
actually transformed to "(title:worker title:howto)". This does of 
course not give an exact match, therefore I guess the low score and my 
next questions:

2) How can I pass the value of a field through an Analyzer before 
storing it?

3) How can I fine-tune my query, e.g. by saying that for searching 
within the contents fields I want to pass the query through an Analyzer, 
for searching within the title field, however, I don't want the Analyzer 
pass. And I want a Hit if either field provides it.

Currently I'm using the MultiFieldQueryParser, but that only allows one 
Analyzer for all the fields.

Thank you very much in advance for any pointers,

Ulrich



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message