lucene-dev mailing list archives

From "Boris Okner" <b.ok...@rogers.com>
Subject Re: RussianAnalyzer
Date Sun, 25 Aug 2002 03:53:09 GMT
While ICU is a great project,
1) AFAIK it has no stop-word filtering or stemming. Of
course, one might be able to write language-specific transliteration rules
covering these features for ICU (I have no idea how hard that is), but why
should Lucene rely on ICU (or on ICU contributors)?
Lucene users want working analyzers now, so why make them wait for ICU
to become practically usable?
2) ICU supports Unicode only, but in reality a vast amount of Cyrillic-based
software still uses (and, I dare say, will keep using) non-Unicode encodings.
While converting to Unicode for indexing and search is not a big problem,
converting back and forth introduces significant inefficiency.
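
To make the round trip in point 2 concrete, here is a minimal sketch in Java. It assumes KOI8-R as the legacy Cyrillic encoding (any other non-Unicode charset would work the same way); the class and method names are made up for illustration and are not part of Lucene or ICU.

```java
import java.nio.charset.Charset;

public class EncodingRoundTrip {
    // Assumption: KOI8-R stands in for whatever legacy Cyrillic encoding the application uses.
    static final Charset LEGACY = Charset.forName("KOI8-R");

    // Decode legacy bytes into a Java String (Unicode) so an analyzer can process them.
    public static String toUnicode(byte[] legacyBytes) {
        return new String(legacyBytes, LEGACY);
    }

    // Encode a String back into the legacy encoding for the calling application.
    public static byte[] toLegacy(String text) {
        return text.getBytes(LEGACY);
    }

    public static void main(String[] args) {
        // Unicode escapes spell a five-letter Russian word, avoiding source-encoding issues.
        byte[] stored = toLegacy("\u043f\u043e\u0438\u0441\u043a");
        String decoded = toUnicode(stored);      // conversion needed before indexing or search
        byte[] reencoded = toLegacy(decoded);    // conversion needed when handing results back
        System.out.println(stored.length + " bytes, " + decoded.length() + " chars, "
                + reencoded.length + " bytes again");
    }
}
```

Each document and each result passes through both conversions, which is where the extra cost comes from.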

> We can add a Snowball API to Lucene.
I'm not sure what that means. Every stemming algorithm in Snowball is described
in the Snowball language, but there is no universal stemming API for
all languages.
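
For illustration only, a language-neutral stemming API of the kind being discussed might look like the sketch below. Everything in it is hypothetical: the `Stemmer` interface, the `StemmerRegistry` class, and the toy English rule are invented for this example and are not taken from Lucene or Snowball.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class StemmerRegistry {
    /** Hypothetical language-neutral contract; not part of Lucene or Snowball. */
    public interface Stemmer {
        String stem(String word);
    }

    private final Map<String, Stemmer> byLanguage = new HashMap<>();

    // Associate a stemmer with a language code such as "en" or "ru".
    public void register(String language, Stemmer stemmer) {
        byLanguage.put(language.toLowerCase(Locale.ROOT), stemmer);
    }

    // Returns the word unchanged when no stemmer is registered for the language.
    public String stem(String language, String word) {
        Stemmer s = byLanguage.get(language.toLowerCase(Locale.ROOT));
        return s == null ? word : s.stem(word);
    }

    public static void main(String[] args) {
        StemmerRegistry registry = new StemmerRegistry();
        // Toy rule for demonstration only: strip a trailing "s". Not a real stemming algorithm.
        registry.register("en", w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        System.out.println(registry.stem("en", "analyzers"));
        System.out.println(registry.stem("ru", "analyzers"));
    }
}
```

The interface itself is the easy part; the per-language algorithms behind it still have to be written or generated one by one.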

Boris Okner

----- Original Message -----
From: "Mehran Mehr" <mehran@sharif.edu>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Saturday, August 24, 2002 2:20 PM
Subject: Re: RussianAnalyzer


> -1
>
> I think there is no need to add analyzers for all the languages in the world
> to the Lucene project. We can add a Snowball API to Lucene.
>
> I've written a universal analyzer (about 30 lines of code) using IBM's ICU.
> I suggest removing the German and English analyzers from Lucene :) and replacing
> them with this universal analyzer.
>
> On Wed, 21 Aug 2002, Doug Cutting wrote:
>
> > This looks great to me.
> >
> > Does anyone object to adding this to Lucene as the package
> > org.apache.lucene.analysis.ru?
> >
> > Doug
> >
> >
> > --
> > To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> >
>
>



