Date: Sun, 21 Jul 2019 09:35:48 -0700 (PDT)
From: Andi Vajda
To: Maciej Gawinecki
Cc: pylucene-dev@lucene.apache.org
Subject: Re: Stempel stemmer ported to Python

Hi Maciej,

On Fri, 19 Jul 2019, Maciej Gawinecki wrote:

> I have ported your Stempel stemmer [1] for the Polish language from Java
> to Python [2]. I know you also have a Python wrapper for Lucene
> (PyLucene), so I was curious whether you would be interested in a native
> implementation of a single stemmer.
>
> It has the same accuracy as the original version and only slightly better
> performance than the wrapped version (compared with pyjini), but it uses
> only one language (no need to switch between languages when debugging),
> which was quite important in my NLP project. I understand that it
> introduces the need to maintain two code bases, though.

PyLucene is not a port of Lucene to Python but a Python/C++ wrapper
library auto-generated via JCC:
  http://lucene.apache.org/pylucene/jcc/

Users of PyLucene in fact embed an actual, unchanged Apache Java Lucene
jar file and a JVM into their Python VM.

The Stempel stemmer is already part of PyLucene, since it is included in
the wrapper generation (look for stempel):
  https://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_7_7_1/Makefile

Your native port, which I'm sure is valid and useful, therefore does not
fit that auto-wrapper model. There is little to no maintenance done on
PyLucene proper, as all its useful code is in Java Lucene and JCC. Adding
native Python code to PyLucene would break that no-maintenance
convenience.

Thank you for thinking of PyLucene for hosting it, though!

Andi..

> Regards,
> Maciej Gawinecki
>
> [1]: https://github.com/apache/lucene-solr/tree/master/lucene/analysis/stempel/src/java/org
> [2]: https://github.com/dzieciou/pystempel/tree/feature/1