lucene-java-user mailing list archives

From "Peter Hotm. Nørregaard" <nga...@hotmail.com>
Subject Re: Hungarian notation analyzer and phrase queries
Date Wed, 13 Apr 2005 08:04:59 GMT
Chris wrote:
>...As Erik points out in that thread, when dealing with a dictionary of
>"singleword" => ["multi" "word"], and ["multi" "word"] => "singleword"
>synonyms a very good/simple approach is to use an analyzer that always
>normalizes down to the single word version (as a single token)
>
>This allows you to leave the position increment alone, and get Span/Phrase
>queries to work just fine.

But be aware of the drawbacks of replacing multiple words with their
corresponding acronym/alias during analysis. Analysis at search time
cannot normalize to the single word if the remaining words in the search
string are not recognized:

- Prefix search: [cyber] [ca*] would not match [cybercafe]
- Wildcard search: [cyber] [ca?e] would not match [cybercafe]
- Fuzzy search: [cyber] [cage~] would not match [cybercafe]
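
To make the three cases above concrete, here is a minimal sketch in plain
Java. It is a toy model, not Lucene's Analyzer or TokenFilter API: the class
SynonymCollapseSketch, the COLLAPSE dictionary and the analyze method are all
hypothetical names invented for illustration. It only models the dictionary
lookup; whether a real query parser passes wildcard or fuzzy terms to the
analyzer at all is a separate matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Lucene's Analyzer/TokenFilter API.
class SynonymCollapseSketch {

    // Hypothetical dictionary: multi-word form -> single-token form.
    static final Map<List<String>, String> COLLAPSE =
            Map.of(List.of("cyber", "cafe"), "cybercafe");

    // Lower-cases the text, splits it on whitespace, and replaces each
    // known two-word pair with its single-token form.
    static List<String> analyze(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            if (i + 1 < words.length) {
                String single = COLLAPSE.get(List.of(words[i], words[i + 1]));
                if (single != null) {
                    out.add(single); // pair recognized: emit one token
                    i += 2;
                    continue;
                }
            }
            out.add(words[i]); // not part of a known pair: keep as-is
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Index side: the complete pair is recognized and collapsed.
        System.out.println(analyze("meet me at the cyber cafe"));
        // prints [meet, me, at, the, cybercafe]

        // Search side: "ca*", "ca?e" and "cage~" are not the literal word
        // "cafe", so the lookup fails and the tokens stay separate.
        System.out.println(analyze("cyber ca*"));   // prints [cyber, ca*]
        System.out.println(analyze("cyber ca?e"));  // prints [cyber, ca?e]
        System.out.println(analyze("cyber cage~")); // prints [cyber, cage~]
    }
}
```

In this model the index holds only the token [cybercafe], while each query
still consists of [cyber] plus a pattern beginning with "ca", so neither
query term can match the indexed token.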

I am not sure that this is correct, but perhaps someone on this list can
confirm.

/Peter
