lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: snowball discussion on LUCENE-2285
Date Sat, 27 Feb 2010 21:29:31 GMT
I created LUCENE-2288 for handling the Object[] thingy in SnowballProgram
(and Class[] in Among).

Shai
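
For readers without the issue at hand, here is a minimal sketch of the kind of change the "Object[] thingy" could refer to. It assumes the concern is that a fresh empty array is allocated for every reflective call. The class ReflectiveCaller and its methods are hypothetical and are not the actual SnowballProgram or Among code; only the EMPTY_ARGS name comes from the thread.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not the real SnowballProgram/Among classes.
final class ReflectiveCaller {
    // Allocated once and shared, instead of "new Object[0]" per invocation.
    static final Object[] EMPTY_ARGS = new Object[0];
    // Same idea for the parameter-type array used when looking up a method.
    static final Class<?>[] EMPTY_PARAMS = new Class<?>[0];

    private ReflectiveCaller() {}

    // Looks up a public no-argument method by name on the target's class.
    static Method lookup(Object target, String name) throws NoSuchMethodException {
        return target.getClass().getMethod(name, EMPTY_PARAMS);
    }

    // Invokes a no-argument method that returns a boolean.
    static boolean call(Object target, Method method) throws ReflectiveOperationException {
        return (Boolean) method.invoke(target, EMPTY_ARGS);
    }
}
```

Sharing a zero-length array is safe because there are no elements to mutate. For example, ReflectiveCaller.lookup("", "isEmpty") followed by ReflectiveCaller.call("", method) invokes String.isEmpty() without allocating a new argument array per call.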

On Sat, Feb 27, 2010 at 8:48 PM, Robert Muir <rcmuir@gmail.com> wrote:

> Can you open an issue for the new Object[]?  It's sad about the Hungarian
> issue.  I'm inclined to think we should add Savoy's and default to it
> instead.  I don't see this as code duplication, as it's a different algorithm.
> Normally I just wouldn't spend a lot of effort on adding alternative
> stemmers, but here it makes sense.
>
> It sounds really exciting if you are able to merge in what you have done in
> the future!
>
> On Feb 27, 2010 1:16 PM, "Shai Erera" <serera@gmail.com> wrote:
>
> Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch
> the generated code, besides handling calling deprecated API.
>
> We've actually taken the same approach I think :). In my Analyzer, the user
> passes a Locale to create the proper Analyzer. The analyzer comes
> pre-configured w/ a whole bunch of filters, like those that handle email tokens
> produced by the tokenizer (or hosts, acronyms and more), character
> normalization, ngram/stemmer filters etc. The StemmerFilter creates the
> proper stemmer based on the language code, and for that I created a
> SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball
> ones. The wrapper is only needed for the stemmer filter instance ...
>
> I have on my TODO checking contrib/analyzers. Unfortunately our legal
> department is very suspicious of everything (guess they wouldn't make good
> legal folks otherwise ;)). If I want to use the contrib/analyzers,
> they'll need to scan the code and identify the owners of the various
> analyzers ... That's what's on my TODO - going through the process w/ them
> :).
>
> I personally think that the work you're doing on the analyzers is
> extraordinary, and since I don't have much time maintaining my own package,
> it has fallen a bit behind in terms of Unicode differences and such. I've
> come to appreciate the power of open source long ago - for me it'd be best
> to join forces on this analysis package. I'm sure that will happen one day
> :).
>
> About the Hungarian stemmer - Martin Porter told us that the original (12?)
> stemmers were written by him and so there are no IP issues. The rest were
> contributed by other people. All but the Hungarian contributor responded w/
> their rights to contribute the code. It's just the Hungarian one that never
> responded, even though we've sent a couple of emails. That is problematic.
> When someone contributes code to Lucene, he grants the ASF a license (forgot
> the wording that's used). That's very reassuring to lawyers, because it
> doesn't leave them too exposed. But there isn't any similar process in
> Snowball ... I can look up the correspondence we've had with Martin Porter to
> refresh my memory on the details.
>  Shai
>
> On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > i wanted to continue this...
>
>
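
The quoted message describes a stemmer filter that picks the proper stemmer from the language code, with a wrapper so that Arabic/Hebrew and Snowball stemmers can be instantiated the same way. A minimal sketch of that idea follows, assuming a simple registry keyed by language code. The Stemmer interface and StemmerRegistry class are hypothetical names invented for illustration; they are not Shai's SnowballWrapper and not part of Lucene or Snowball.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical common interface that hides which stemmer family is in use.
interface Stemmer {
    String stem(String term);
}

// Hypothetical registry: maps a language code to a factory for its stemmer.
final class StemmerRegistry {
    private final Map<String, Supplier<Stemmer>> byLanguage = new HashMap<>();

    // Registers a factory under a language code such as "en" or "hu".
    void register(String languageCode, Supplier<Stemmer> factory) {
        byLanguage.put(languageCode, factory);
    }

    // Creates a new stemmer for the locale's language, or fails loudly.
    Stemmer forLocale(Locale locale) {
        String code = locale.getLanguage();
        Supplier<Stemmer> factory = byLanguage.get(code);
        if (factory == null) {
            throw new IllegalArgumentException("No stemmer registered for language: " + code);
        }
        return factory.get();
    }
}
```

A caller would register one factory per language, for example registry.register("en", () -> new SomeEnglishStemmer()) with SomeEnglishStemmer standing in for whichever implementation is available, and then ask for registry.forLocale(locale) when building the filter. Using a factory rather than a shared instance reflects the assumption that stemmers hold per-term state and should not be shared across filters.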
