lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Snowball Analyzer and apostrophes
Date Wed, 18 Jun 2008 13:46:02 GMT
This is tricky....

If you strip the apostrophe, you'd get interesting results from O'brien,
depending upon how you stripped it: either "closing up" the word (Obrien) or
substituting a space (O brien). We've generally had the fewest surprises by
closing up apostrophes (e.g. Obrien, Charlies).

Unfortunately, anything you do will be wrong in some case. You can either
do something simple like the above or, say, generate a dictionary of
exceptions: keep a record of all the exceptions to your simple rule and
use it to transform the input before feeding the analyzer.
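
A rough sketch of the dictionary idea, in plain Java with no Lucene
dependency. The class name, the transform() method, and the single exception
entry are all made up for illustration; a real list would come from whatever
surprises turn up in your own data. Words not in the dictionary fall back to
the simple close-up rule.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ApostropheExceptions {
    // Hypothetical exception list: words where simply closing up the
    // apostrophe is not what we want. The entry is illustrative only.
    private static final Map<String, String> EXCEPTIONS = new HashMap<>();
    static {
        EXCEPTIONS.put("rock'n'roll", "rock n roll");
    }

    /** Rewrites the input word by word before it is fed to the analyzer. */
    public static String transform(String input) {
        StringBuilder out = new StringBuilder();
        for (String word : input.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            // Look the word up case-insensitively in the exception list.
            String replacement = EXCEPTIONS.get(word.toLowerCase(Locale.ROOT));
            if (replacement == null) {
                // Simple rule: close up the apostrophe.
                replacement = word.replace("'", "");
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replacement);
        }
        return out.toString();
    }
}
```

This only handles the plain ASCII apostrophe and whitespace-separated words,
so treat it as a starting point rather than a finished pre-processor.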

Personally, though, I'd close up the apostrophe and feed the analyzer. Don't
forget to do the same for the query.
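
Something along these lines would do for the close-up step. It is only a
sketch: ApostropheNormalizer and closeUp() are names invented here, not part
of Lucene, and the choice to also strip the typographic apostrophe (U+2019)
is an assumption about what shows up in your data. The point is that the
same method runs on the field value at index time and on the user's query
string at search time.

```java
import java.util.regex.Pattern;

public class ApostropheNormalizer {
    // Matches the ASCII apostrophe and the typographic right single quote.
    private static final Pattern APOSTROPHES = Pattern.compile("['\u2019]");

    /** Removes apostrophes without inserting a space: "O'brien" -> "Obrien". */
    public static String closeUp(String text) {
        return APOSTROPHES.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Apply the same normalization on both sides, then hand the
        // result to the analyzer (not shown here).
        String indexed = closeUp("Charlie's Sandwich Shoppe");
        String query = closeUp("Charlies");
        System.out.println(indexed); // Charlies Sandwich Shoppe
        System.out.println(query);   // Charlies
    }
}
```

With both sides reduced to "Charlies" before analysis, the stemmer sees the
same token in the document and in the query, so whatever stem it produces
will match.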

Best
Erick

You know, my job would be a lot easier if English were regularized. Sign my
petition now!

On Tue, Jun 17, 2008 at 5:16 PM, Max Metral <max@artsalliancelabs.com>
wrote:

> So I'm using Snowball Analyzer on a field for business titles.  The
> value "Charlie's Sandwich Shoppe" becomes "charli sandwich shopp".  This
> happens partly because the StandardAnalyzer strips off the apostrophe-s
> entirely, and then the Snowballer takes off the e.  The problem is when
> someone comes in to search for Charlies, without the apostrophe, they
> get no match because in THAT case, Snowballer produces "charl" as the
> term.  Thoughts on best approach for solving this?  Do I expand it to
> become "{charl,charli} sandwich shop"?  Should I strip apostrophes
> before feeding the beast?
>
>
>
> Thanks
>
> --Max
