lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Stemming Problem
Date Wed, 19 May 2010 00:46:57 GMT
You can construct your own analyzer by building
it from a pre-existing Tokenizer
(e.g. WhitespaceTokenizer) and any number
of TokenFilters (e.g. PorterStemFilter). You can
chain as many TokenFilters together as you like
to get many different effects.
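
To make the chaining idea concrete, here is a minimal, library-free sketch in plain Java. It is not Lucene's API: FilterChainSketch, whitespaceTokenize, toyStem and analyze are hypothetical names made up for illustration, and toyStem is a deliberately crude suffix stripper, not the Porter algorithm. It only shows the shape of the approach: a tokenizer that splits on whitespace, followed by whatever per-token filters you choose to apply, in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class FilterChainSketch {

    // Stand-in for a tokenizer: splits on whitespace only,
    // so capitalization and attached punctuation are kept as-is.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy "stemmer" (an assumption for illustration, NOT a real stemming
    // algorithm): strips a trailing "ing" or "s" from longer tokens.
    static String toyStem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Tokenize, then run every token through each filter in the order given.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<String>... filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Erickson, Erick indexing Documents";

        // Stemming only: case and punctuation survive.
        // prints [Erickson,, Erick, index, Document]
        System.out.println(analyze(text, FilterChainSketch::toyStem));

        // Lowercasing added in front of the stemmer.
        // prints [erickson,, erick, index, document]
        System.out.println(analyze(text, String::toLowerCase, FilterChainSketch::toyStem));
    }
}
```

Note that in the first run the token "Erickson," keeps both its capital letter and its trailing comma, so it would not equal a query token "erickson". Adding or removing a filter in the chain is a one-argument change, which is the point of composing an analyzer this way.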

But I have to ask: why do you want to keep capitalization
and punctuation? Do you really want text indexed as
"Erickson, Erick" to fail to match the query
"erick erickson"? That's more often a source of
frustration than a benefit.

HTH
Erick

On Tue, May 18, 2010 at 2:05 PM, Larry Hendrix <lahendrix@wisc.edu> wrote:

> Hi,
>
> Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having
> problems with stemming. Does anyone have a recommendation for other text
> analyzers that handle stemming and also keep capitalization, stop words, and
> punctuation?
>
> Thanks,
> Larry
>
>
> Larry A. Hendrix, Graduate Student
> Computer Science Department
> University of Wisconsin-Madison
> 1300 University Ave Rm 6749
> Madison, WI 53711
> Office: (608) 263-7624
> lhendrix@cs.wisc.edu
> Grambling State University Alum
>
>
