lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Max Lynch <>
Subject Punctuation in Whitespace Analyzer
Date Fri, 03 Jul 2009 16:05:13 GMT
I am having an issue with analyzers.  Right now, when I do a search, I am
searching for a whole name.  For example, if I have a document like this:

"This is the document text.  John Smith is mentioned right here, he is in
the john.  Smith is his last name.  His full name is John Smith."

If I search this document for the phrase "John Smith" I want to get the hits
(I'm using highlighting) only for the full names without punctuation inside
of them.  For example, I don't want "john. Smith" to be highlighted.
However, I DO want to get the hit for "John Smith." with a period or comma
allowed after the *last name* only.

What is the best analyzer to use for this?  Or is there a different way to
approach this?  Right now my whitespace analyzer won't match on the "John
Smith." case, but maybe I just throw in a few more queries to handle
punctuation at the end of the last name?


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message