lucene-java-user mailing list archives

From Dave Seltzer <dselt...@tveyes.com>
Subject Pattern Analyzer
Date Thu, 12 Jul 2012 18:20:38 GMT
Hello,

I have a search project which uses the Lucene PatternAnalyzer for its
text/query analysis.

At the moment it's configured like so:
analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"),
true, null);

My goal here was to split words on whitespace and make matching
case-insensitive.

Thinking about this, however, I probably want to be a little more
sophisticated: I'd like to ignore punctuation that occurs at the beginning
or end of a word.

Is this simply a matter of writing a regex which treats those cases the
same as a space?

Would I use something like this:
analyzer = new PatternAnalyzer(Version.LUCENE_35,
Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}"), true, null);
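To check what a delimiter pattern like this would do, here is a minimal sketch using only java.util.regex. It assumes PatternAnalyzer splits text the way String.split does, as its Javadoc describes, and that lower-casing happens before splitting. The class name, the tokens helper, and the ALTERNATIVE pattern are hypothetical and for illustration only. Because \w is part of the match in the proposed pattern, the word character next to the punctuation is consumed along with it. The alternative strips punctuation only where it touches whitespace or the start or end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class PunctuationSplitSketch {
    // The delimiter proposed above. \w belongs to the match, so the letter
    // adjacent to the punctuation is swallowed by the delimiter.
    static final Pattern PROPOSED =
        Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}");

    // Hypothetical alternative: whitespace together with any punctuation
    // touching it, plus punctuation at the very start or end of the input.
    // Punctuation inside a word (such as an apostrophe) is left alone.
    static final Pattern ALTERNATIVE =
        Pattern.compile("\\p{Punct}*\\s+\\p{Punct}*|^\\p{Punct}+|\\p{Punct}+$");

    // Lower-cases the text, splits it on the delimiter, and drops empty pieces.
    static List<String> tokens(Pattern delimiter, String text) {
        List<String> out = new ArrayList<>();
        for (String piece : delimiter.split(text.toLowerCase(Locale.ROOT))) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello, \"world\"! It's Lucene.";
        System.out.println(tokens(PROPOSED, text));    // [hell, orl, !, i, s, lucen]
        System.out.println(tokens(ALTERNATIVE, text)); // [hello, world, it's, lucene]
    }
}
```

If this matches the analyzer's behaviour, the alternative pattern could be passed to the PatternAnalyzer constructor in place of the one above, though it is worth testing against real documents first.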

Thanks so much!

Dave
