lucene-java-user mailing list archives

From Dave Seltzer <>
Subject Pattern Analyzer
Date Thu, 12 Jul 2012 18:20:38 GMT

I have a search project which uses the Lucene PatternAnalyzer for its
text/query analysis.

At the moment it's configured like so:
analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"),
true, null);

My goal here was to split words on spaces and make things case-insensitive.
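
For illustration, here is a minimal plain-Java sketch of roughly what that configuration does. It uses only java.util.regex rather than Lucene itself, and the class and method names are hypothetical. The pattern matches the separators, the text between matches becomes the tokens, and each token is lowercased. The sketch assumes empty pieces (for example from leading whitespace) are discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WhitespaceTokenizeSketch {
    // Same delimiter pattern as in the analyzer configuration above.
    private static final Pattern DELIMITER = Pattern.compile("\\s+");

    // Approximates PatternAnalyzer with toLowerCase=true and no stop words:
    // the pattern matches the separators, and the text between matches becomes tokens.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : DELIMITER.split(text)) {
            // Skip the empty piece that split() returns for leading whitespace.
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [hello,, world!, lucene's, patternanalyzer]
        System.out.println(tokenize("  Hello, World! Lucene's PatternAnalyzer"));
    }
}
```

Punctuation stays attached to the neighbouring word ("hello," and "world!"), which is the behaviour the rest of this message is trying to fix.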

On reflection, however, I probably want to be a little more sophisticated:
I'd like to ignore punctuation that occurs at the beginning or end of a
word.

Is this simply a matter of writing a regex that treats those cases the
same as a space?

Would I use something like this?
analyzer = new PatternAnalyzer(Version.LUCENE_35,
Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}"), true, null);
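
A quick way to check the proposed pattern is to run it through `Pattern.split` outside Lucene. The sketch below uses hypothetical class and method names and plain java.util.regex, not the Lucene API. It suggests that the `\p{Punct}+\w` and `\w\p{Punct}` alternatives also swallow the adjacent letter, so "Hello, world!" comes out as `hell` and `worl`. The `EDGE_PUNCT` pattern is an assumed alternative, not something from the original message. It treats whitespace together with any punctuation touching it, plus punctuation runs at the very start or end of the text, as delimiters, so word-internal punctuation such as the apostrophe in `world's` survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class PunctuationSplitSketch {
    // The pattern proposed in the question. The second and third alternatives
    // each consume a word character along with the punctuation.
    public static final Pattern PROPOSED =
            Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}");

    // Assumed alternative (hypothetical): a delimiter is whitespace plus any
    // punctuation touching it, or a punctuation run at the start or end of the text.
    // Punctuation inside a word (e.g. "world's") is not matched.
    public static final Pattern EDGE_PUNCT =
            Pattern.compile("\\p{Punct}*\\s+\\p{Punct}*|^\\p{Punct}+|\\p{Punct}+$");

    // Split on the delimiter pattern, drop empty pieces, and lowercase the rest,
    // roughly mirroring PatternAnalyzer with toLowerCase=true and no stop words.
    public static List<String> tokenize(Pattern delimiter, String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : delimiter.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(PROPOSED, "Hello, world!"));           // [hell, worl]
        System.out.println(tokenize(EDGE_PUNCT, "(Hello), world's end!"));  // [hello, world's, end]
    }
}
```

Java's `\p{Punct}` covers ASCII punctuation only unless `Pattern.UNICODE_CHARACTER_CLASS` is set. Presumably the alternative pattern could be passed to the same `PatternAnalyzer` constructor, but how it behaves inside Lucene 3.5 (for example, whether empty tokens are dropped the same way) is an assumption worth testing.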

Thanks so much!

