lucenenet-user mailing list archives

From 小康 <xiaok...@cnblogs.com>
Subject Re: State / Future of the Lucene.Net Project
Date Thu, 21 Jun 2018 07:32:36 GMT
Hi,
First, I am interested in helping to make the word segmentation
functionality production-ready.
I think making ICU4N into a general library is meaningful and challenging
work.
I would like to try option #2 first and then, if possible, help make ICU4N
into a general library via option #3.

Thanks,
xiaokang (SilentCC)


2018-06-21 13:56 GMT+08:00 Shad Storhaug <shad@shadstorhaug.com>:

> Hi,
>
> Actually, there is already an optimized Chinese word segmentation tool in
> the Lucene.Net.ICU project
> (https://lucene.apache.org/core/4_8_0/analyzers-icu/index.html), which is
> still a work in progress.
> We have Lucene.Net.ICU 100% ported with all tests passing (see
> https://github.com/NightOwl888/lucenenet/tree/icu4n-migration), but we
> could definitely use some help getting the dependent ICU functionality
> finished.
>
> There are still many undecided issues regarding the ICU functionality. For
> example:
>
> 1. Should we use the newly ported ICU4N
> (https://github.com/NightOwl888/ICU4N) project or try to add the
> functionality to the already
> existing icu.net project (https://github.com/sillsdev/icu-dotnet)? Note
> the latter has been attempted, but there are several issues (missing
> functionality, incompatibilities, problems loading data) that make it very
> challenging to provide all of the Lucene.Net.ICU functionality - it was
> easier to get it working by porting from ICU4J, but will require
> maintaining the ICU4N project.
> 2. If we use ICU4N, should we make it into a general library that benefits
> all of the .NET ecosystem, or should we limit it to primarily support
> Lucene.NET?
> 3. If we use ICU4N, how should we best allow the user to load a customized
> version of the ICU data?
>
> If we make ICU4N into a general library, it would probably be best to
> contribute it back to the ICU project: http://site.icu-project.org/ so it
> is maintained and released on the same schedule and documented there, too.
> Do note that ICU releases very often to keep up with the changes to the
> Unicode standard - we have ported ICU4J from version 60.1 (released
> November 1, 2017) and they just released version 62.1 yesterday (June 20,
> 2018). So one of the first orders of business would be to upgrade the
> existing ICU4N features to version 62.1 if we go that route.
>
> Also note that we only have about 40% of ICU4J ported, which is just
> enough to support Lucene.Net.ICU. There are several APIs that still need to
> be refactored to fit into the .NET paradigm, as well as some gaps in
> functionality to work out before proceeding with any more porting work.
>
> My hope was to make ICU4N into a first rate .NET component to add complete
> Unicode support to the .NET framework with fully .NET like APIs, however we
> also have the option of limiting the scope of the project to just what is
> needed to support Lucene.Net.ICU in order to get the 4.8.0 release done
> quicker. Either way, there is still work to be done to make the APIs of the
> project consistent if we use ICU4N, and there is quite a bit of missing
> functionality to add to icu.net if we use that instead. Basically, there
> are 3 ways to complete this:
>
> 1. Add the required functionality to the icu.net project in order to
> support the Lucene.Net.ICU features, port the missing Lucene.Net.ICU
> features to the current master branch and abandon work on ICU4N.
> 2. Finish up the API and fix 19 failing tests to make ICU4N good enough to
> support Lucene.Net.ICU without making it into a first-rate component that
> supports all ICU features.
> 3. Contact the ICU team about contributing ICU4N to their repository and
> if they agree, allow them to lead the direction of the API and features
> (with the added possibility of their help and Unicode expertise).
>
> #1 would be the least maintenance long-term solution, but I have doubts we
> can get more than about 50% of the Lucene.Net.ICU features to function if
> we go that route. Failing that, the preference is to go with option #3 so
> the whole .NET ecosystem benefits (and contributes) and we will be able to
> release 100% of the Lucene.Net.ICU functionality. Would you be interested
> in helping out in order to make the word segmentation functionality
> production-ready, and if so, for which of these options?
>
> Let me know, and I will start putting together a prioritized list of items
> that are incomplete to get you started.
>
> Thanks,
> Shad Storhaug (NightOwl888)
>
>
> -----Original Message-----
> From: 小康 [mailto:xiaokang@cnblogs.com]
> Sent: Thursday, June 21, 2018 9:00 AM
> To: user@lucenenet.apache.org
> Cc: dev@lucenenet.apache.org
> Subject: Re: State / Future of the Lucene.Net Project
>
> I want to add a Chinese word segmentation tool with good performance to
> lucenenet.
>
> I think this will be helpful to Chinese developers.
>
> Can I do this job?
>
> 2018-06-21 5:19 GMT+08:00 Shad Storhaug <shad@shadstorhaug.com>:
>
> > Hello. Thanks for the heads up. For code optimizations, you will need to
> > locate the areas that need fixing, patch them, and then submit a separate
> > pull request on GitHub for each one. Please provide a small standalone
> > piece of code (a console app works great) we can run before and after the
> > patch to demonstrate exactly how the fix affects performance.
> >
> > We will definitely welcome the help.
> >
> > -----Original Message-----
> > From: 小康 [mailto:xiaokang@cnblogs.com]
> > Sent: Wednesday, June 20, 2018 8:03 PM
> > To: user@lucenenet.apache.org
> > Cc: dev@lucenenet.apache.org
> > Subject: Re: State / Future of the Lucene.Net Project
> >
> > I am willing to contribute to lucene.net because I am creating a vertical
> > search engine with lucene.net.
> >
> > I want to make lucene.net faster and better.
> >
> > I can contribute on weekends.
> >
> > Thank you.
> >
> > 2018-05-28 23:48 GMT+08:00 Stefan Bodewig <bodewig@apache.org>:
> >
> > > Hi all
> > >
> > > it is pretty difficult to write a message like this. I've been one of
> > > Lucene.Net's mentors during Apache incubation and even though I never
> > > contributed anything significant (at least code-wise) I really care for
> > > the project and its community.
> > >
> > > For more than a year Shad has been the only committer who actually
> > > committed to the code base but despite his herculean effort we haven't
> > > been able to attract new contributors.
> > >
> > > Of the project management committee most people seem to be absent by now
> > > and the project has rightfully raised concerns by the board [1][2]
> > >
> > > There really are only two options.
> > >
> > > * we create a credible plan how to get Lucene.Net back into a healthy
> > >   state with multiple contributors and a more active PMC and execute on
> > >   it
> > >
> > > * we start the process of sending the project to the Apache Attic
> > >   http://attic.apache.org/ (which is not a one-way road; projects can be
> > >   resurrected if a new community emerges).
> > >
> > > We probably should start with trying the first option. We have tried to
> > > find new contributors in the past but haven't been successful, let's give
> > > it one more try.
> > >
> > > What we need are people who are willing to contribute for more than a
> > > single pull request or two and who are willing to become members of the
> > > developer community here at Apache. If you think this description fits
> > > you, please raise your hand :-)
> > >
> > > Stefan
> > >
> > > [1] https://lists.apache.org/thread.html/c44ef94020271b3823fe356a255d693a76287c1214743dfc074621de@%3Cdev.lucenenet.apache.org%3E
> > > [2] https://lists.apache.org/thread.html/70a34c2cd3298afe02827c219e2dc2b66ae594aabcbaa33265301a44@%3Cdev.lucenenet.apache.org%3E
> > >
> >
>
