lucene-java-user mailing list archives

From Mufaddal Khumri
Subject Re: exact match ..
Date Mon, 20 Feb 2006 18:02:27 GMT
Hi Steve,

If I understand you correctly, I could use something like the 
KeywordAnalyzer to tokenize the entire stream as a single token and 
store that in the index. I could definitely use the KeywordAnalyzer 
while indexing this particular field, "categoryNames".

My question now is how to search and boost this field, since it is part 
of a bigger boolean query in my case.

My typical query actually looks like:

+(+content:digit +content:camera) +entity:product +(title:"digit 
camera"~2^40.0 ((title:digit title:camera)^10.0) content:"digit 
camera"~2^20.0 (content:digit content:camera) categoryNames:"digit 

As you can see, I was trying to do a phrase query on the categoryNames 
field and boost it by 80.0.
I am also using the Porter stemming filter to stem while searching (I 
do this while indexing as well). If I go with the KeywordAnalyzer 
approach, I can index the categoryNames field using this analyzer.

Would I use the QueryParser to create my query and pass it the 
KeywordAnalyzer when searching on categoryNames (and then make 
that query part of my global boolean query)?
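
To make the concern about matching search tokens concrete, here is a minimal plain-Java sketch of the two matching behaviours. It deliberately uses no Lucene classes: keywordTokens, analyzedTokens, toyStem, phraseMatch and termMatch are hypothetical helper names, and toyStem is only a crude stand-in for a real stemmer (it is not the Porter algorithm). It is meant to show why a stemmed, tokenized phrase can match "digital camera batteries", while a single whole-field token matches only when the search side supplies the identical, unstemmed string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-Java model (no Lucene classes) of keyword vs. tokenized matching. */
public class ExactMatchSketch {

    /** Toy stand-in for a stemmer: strips one trailing "s". NOT the Porter algorithm. */
    public static String toyStem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** Models a keyword-style analyzer: the whole field value becomes one token. */
    public static List<String> keywordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    /** Models a tokenizing analyzer: lowercase, split on whitespace, toy-stem each word. */
    public static List<String> analyzedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            tokens.add(toyStem(word));
        }
        return tokens;
    }

    /** Phrase match without slop: the query tokens occur consecutively in the field. */
    public static boolean phraseMatch(List<String> field, List<String> query) {
        return Collections.indexOfSubList(field, query) >= 0;
    }

    /** Term match: the query token is identical to an indexed token. */
    public static boolean termMatch(List<String> field, String queryToken) {
        return field.contains(queryToken);
    }

    public static void main(String[] args) {
        // Tokenized + stemmed field: the phrase also hits the longer category name.
        System.out.println(phraseMatch(
                analyzedTokens("digital camera batteries"),
                analyzedTokens("digital cameras")));                       // true

        // Whole-field token: the longer category name no longer matches.
        System.out.println(termMatch(
                keywordTokens("digital camera batteries"), "digital cameras")); // false

        // Whole-field token: only the identical string matches.
        System.out.println(termMatch(
                keywordTokens("digital cameras"), "digital cameras"));     // true

        // A stemmed query string does not match the unstemmed whole-field token.
        System.out.println(termMatch(
                keywordTokens("digital cameras"), "digital camera"));      // false
    }
}
```

The last case is the one I would need to watch for: if the categoryNames value is indexed as a single unstemmed token, the search text for that field must also bypass tokenizing and stemming, or it will not match.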


Steven Rowe wrote:

> Mufaddal Khumri wrote:
>> Let's say I do this while indexing:
>> doc.add(Field.Text("categoryNames", categoryNames));
>> Now while searching categoryNames, I do a search for "digital 
>> cameras". I only want to match the exact phrase digital cameras with 
>> documents that have exactly the phrase "digital cameras" in the 
>> categoryNames field. I do not want results that have "digital camera 
>> batteries" as part of the result.
>> What's the best way to accomplish this?
>
> Hi Mufaddal,
> One way to do this is to use the KeywordAnalyzer (in the Lucene 
> Subversion trunk, but not in v1.4.3; will be in forthcoming v1.9) for 
> the "categoryNames" field.  This analyzer does not tokenize field 
> contents, so "digital cameras" would be a single token, and the only 
> thing that would match it would be the exact same single token.  Be 
> careful when you search to construct the search tokens similarly.
> If you have other fields you want to search, and you want to tokenize 
> their contents when you index them, you could use the 
> PerFieldAnalyzerWrapper, so that the KeywordAnalyzer is only used for 
> the "categoryNames" field.
> Steve