lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Braun <mbr...@uni-hd.de>
Subject Indexing Dash concatenated words vs SynonymAnalyzer
Date Tue, 20 Jun 2006 13:03:48 GMT
Hello all,


german words are often dash-concatenated, e.g. West-Berlin or something
like "C*-algebras and W*-algebras".

I tend to write my own analyzer like the SynonymAnalyzer from the
LIA-Book. I want to Index these words like this:

West-Berlin => Westberlin | West | Berlin | "West Berlin"
C*-algebras => c| algebra | calgebra

The difference to the SynonymAnalyzer will be that one word will be
separated in to two words. So that it is not a Synonym like  quick <=>
fast, but something like quick <=> "lightning fast".

Is it possible to get two words as a synonym at the same increment
position during indexing? What will happen with a phrase search?

Does anybody know existing approaches for this?

thanks in advance,

Martin



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message