lucene-dev mailing list archives

From Robert Muir
Subject Re: Fix for Japanese SEN morphological analyzer, and moving into Contrib
Date Mon, 12 Oct 2009 18:10:07 GMT
Mark, does this mean Sen will be under the Apache license? (it is currently

On Mon, Oct 12, 2009 at 1:46 PM, Mark Bennett wrote:

> Hi folks,
> I've been working to fix the Japanese SEN morphological analyzer, which is
> currently hosted at:
> To review, Japanese doesn't use whitespace for word breaks.  The
> traditional approach to CJK (Chinese, Japanese, Korean) is to use bigram
> character pairs in the index.  While this works to a point, some believe
> that using proper word breaks provides better results.
> The "lucene-ja" glue layer between Lucene and the core SEN library broke in
> May of '09 when a fix was made in Lucene:
> Uwe S. had a very good insight for a quick fix, and I have been cleaning up
> some other issues with the code.  I have also spoken with the author Takashi
> Okamoto and he is fine to have this moved from to ASF; I think it
> will be easier for folks to find and use it if it's in ASF.
> I'm not quite ready to submit a patch, but the Wiki suggests emailing the
> list with the idea in advance.  I'll have some packaging questions, since
> there are actually quite a few parts.  Also, the wiki didn't quite spell
> out the process to get things into contrib, beyond emailing and submitting a
> patch.  I also plan to eventually submit a Solr-specific wrapper to the solr
> dev list, to allow for dynamic config changes to be made from Solr's
> schema.  But since the original code was Lucene based, and it provides the
> broadest reach, I think having it in core Lucene would be a good start.
> Any comments, suggestions, or mentor volunteers?  :-)
> Mark
> --
> Mark Bennett / New Idea Engineering, Inc. /
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

Robert Muir
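
For anyone unfamiliar with the bigram approach mentioned in the quoted message, here is a minimal sketch of the idea in plain Java. It is not the Lucene CJK analyzer or the SEN code; the class and method names (BigramSketch, bigrams) are made up for illustration, and it only shows how overlapping character pairs are produced from unsegmented text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Lucene, lucene-ja, or SEN.
public class BigramSketch {

    // Split text into overlapping two-character tokens.
    // Works on Unicode code points so surrogate pairs are not cut in half.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        // Assumption: a lone character is emitted as a single-character token.
        if (cps.length == 1) {
            out.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Produces the pairs 東京, 京都, 都庁.
        System.out.println(bigrams("東京都庁"));
    }
}
```

Note that the middle pair 京都 ("Kyoto") is a real word that the input text does not actually contain as a word, so a query for it could match this text. That kind of false match is what proper word breaks from a morphological analyzer are meant to avoid; the exact segmentation would depend on the analyzer and its dictionary.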
