lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Inquiry on Lucene Stemming
Date Wed, 17 Dec 2008 00:35:14 GMT
I'd ask the client why stemming wouldn't work <G>. I've spent faaaar too
much
time in my life doing useless things "because the client asked". Really, ask
for
the use cases where that is really required and that stemming wouldn't
cover.

But you're right that Lucene doesn't have such a facility or API. How could
it?
Stemming is easy (yeah, right). But at least it's algorithmic. Going from
one word
to all the possible variants (or even worse, all those variants the client
thinks
should be there) usually requires some kind of "expansion map". Particularly
when you want to go from, say, Steve to Stephen.

Best
Erick

On Tue, Dec 16, 2008 at 6:42 PM, Jay Joel Malaluan <exst_jmalaluan@yahoo.com
> wrote:

> Hi Erick,
>
> Well some client inquiries if it's possible to expand such simple words and
> does Lucene have an API for this logic? Because all I read was the stemming
> logic for Lucene was the other way around which is, example "flashing" it
> will be trimmed to the root word "flash" when searched.
>
>
> Regards,
> Jay Malaluan
>
>
>
>
>
> ________________________________
> From: Erick Erickson <erickerickson@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, December 16, 2008 10:14:13 PM
> Subject: Re: Inquiry on Lucene Stemming
>
> Why do you want to do this? The reason I ask is that you're
> making each clause very complex.....
>
> For a single term, it's not very complex, but for something like
> ((A AND B) OR (C AND D)) NOT X
>
> expanding A, B, C, D and X to, possibly many terms is...er...ugly.
>
> You could think about ngrams, although I confess I've only seen
> this on the lists, haven't worked with it myself.
>
> If your goal is to be able to search exact match words (i.e. you
> need to find "flash" when exactly "flash" was indexed, not "flashing")
> there are better strategies....
>
> So a bit more explanation of the problem could perhaps generate more
> helpful responses.
>
> Best
> Erick
>
> On Tue, Dec 16, 2008 at 7:18 AM, Jay Joel Malaluan <
> exst_jmalaluan@yahoo.com
> > wrote:
>
> > Hi,
> >
> >
> > Can anyone comment if my understanding of the stemming process in Lucene
> is
> > correct. From my testing using the SnowballAnalyzer, if I passed this
> word
> > "flashing" it will be trimmed to a root word "flash" and this root word
> > ("flash") will be the one searched not the original word "flashing".
> >
> > Is there an API in Lucene or third-party APIs that can do the following,
> I
> > passed the word "flash" instead it will search for "flashing", "flashed",
> > "flashes" etc.?
> >
> >
> > Regards,
> > Jay Malaluan
> >
> >
> >
>
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message