lucene-java-user mailing list archives

From bhecht <>
Subject stop words, synonyms... what's in it for me?
Date Mon, 21 May 2007 20:05:02 GMT

Hi there,

I started using Lucene recently, with plans to replace the SQL queries in my
application with it.
Before I was aware of Lucene, I had implemented some tools (filters) similar
to the ones Lucene includes.

For example, I have implemented a "stop word" tool.
In my case it has many more configuration options than Lucene's: it can
remove substrings in addition to complete tokens, and I can configure the
required position of the substring within the token, or even the position of
the token within the phrase.
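To make the idea concrete, here is a minimal sketch (my own illustration, not Lucene API) of substring removal that honors a configured position within the token; the class and method names are hypothetical:

```java
import java.util.Locale;

// Illustrative sketch (not Lucene API): remove a configured substring
// from a token, but only when it occurs at the requested position.
public class StopSubstringRemover {

    public enum Position { PREFIX, SUFFIX, ANYWHERE }

    // Strip `sub` from `token` when it occurs at the configured position;
    // otherwise return the token unchanged (lowercased for matching).
    public static String strip(String token, String sub, Position pos) {
        String t = token.toLowerCase(Locale.GERMAN);
        switch (pos) {
            case PREFIX:
                return t.startsWith(sub) ? t.substring(sub.length()) : t;
            case SUFFIX:
                return t.endsWith(sub) ? t.substring(0, t.length() - sub.length()) : t;
            default: // ANYWHERE
                return t.replace(sub, "");
        }
    }
}
```

For instance, `strip("muellergmbh", "gmbh", Position.SUFFIX)` removes the trailing substring, while a token without it passes through untouched.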

I have also implemented a synonym mechanism (a substitution mechanism) that
can likewise be configured according to position within a phrase. It can also
be configured to find synonyms while taking spelling mistakes into account,
although it doesn't expand terms but only transforms them into one fixed
replacement. It can find replacements for substrings as well, so I can use it
to separate words. For example, in German I have the rule "strasse" => " strasse"
(with a space in front), so a word like "mainstrasse" is split into "main" and
"strasse".
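A rough sketch of that substitution trick (again my own illustration, with hypothetical names, not Lucene API): a replacement that begins with a space splits a compound once the result is re-tokenized on whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not Lucene API): substitution rules whose
// replacement may start with a space, so a compound such as
// "mainstrasse" becomes "main strasse" and later tokenizes into
// two words.
public class Substituter {

    private final Map<String, String> rules = new LinkedHashMap<>();

    public void addRule(String from, String to) {
        rules.put(from, to);
    }

    // Apply every rule in insertion order; trim so a leading-space
    // replacement at the start of the phrase leaves no stray whitespace.
    public String apply(String phrase) {
        String result = phrase;
        for (Map.Entry<String, String> r : rules.entrySet()) {
            result = result.replace(r.getKey(), r.getValue());
        }
        return result.trim();
    }
}
```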

I am wondering: can I apply my "standardization" tools before calling the
Lucene indexing, without implementing any custom analyzers, and achieve more
or less the same results?

What do I "lose" if I go this way? The stemming filters are one thing I
really didn't have and will use.
Is there any point in creating custom analyzers with filters for stop words
and synonyms, and in implementing my own "substring" filter for separating
tokens into sub-words (like "mainstrasse" => "main", "strasse")?
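What I mean by pre-standardizing, as a sketch (plain Java, hypothetical names, not Lucene API): if the text is normalized before it reaches the IndexWriter, a plain whitespace-oriented analyzer already sees the split sub-words, so no custom TokenFilter seems strictly required.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not Lucene API): standardize the raw text first,
// then a whitespace split yields the same tokens a custom filter chain
// would have produced.
public class PreIndexStandardizer {

    // Hypothetical standardization step: insert a space before "strasse"
    // so compounds split; a real pipeline would chain all the rules here.
    public static String standardize(String text) {
        return text.toLowerCase().replace("strasse", " strasse").trim();
    }

    // The tokens as a whitespace tokenizer would then see them.
    public static List<String> tokens(String text) {
        return Arrays.asList(standardize(text).split("\\s+"));
    }
}
```

With this, `tokens("Mainstrasse")` yields the two sub-words, which could then be handed to Lucene as-is.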

Thanks in advance
