Date: Sun, 26 Sep 2010 18:59:17 -0700 (PDT)
From: Andi Vajda
To: pylucene-dev@lucene.apache.org
Subject: Re: Feature request: include collation
In-Reply-To: <4C9FF53A.9030605@cheimes.de>
References: <4C9FF53A.9030605@cheimes.de>

On Mon, 27 Sep 2010, Christian Heimes wrote:

> I'd like to request a new feature for the next version of PyLucene.
> Lucene already comes with a collation library, but PyLucene doesn't wrap it.
> Collation is required for language-dependent sorting of search results. [1]
>
> I've attached a working patch for the feature request.
>
> >>> from lucene import *
> >>> initVM()
>
> >>> collator = Collator.getInstance(Locale("de"))
> >>> keyanalyzer = CollationKeyAnalyzer(collator)
> >>> keyanalyzer
> org.apache.lucene.collation.CollationKeyAnalyzer@510dc6b5>
>
> Thanks
> Christian

Hi Christian,

In 3.x and trunk, I've been porting ICU-dependent Lucene contrib features to
use PyICU [1][2] (which depends on C++ ICU). I think that having PyLucene
depend both on C++ ICU and Java ICU is one ICU too many :-), though.

I'm not sure at this point which should remain. There are advantages to
both... I'm open to arguments in favor of either.

You can see examples in the 3.x tree [3].
(disclaimer: I'm the author of PyICU)

Andi..

[1] http://pypi.python.org/pypi/PyICU
[2] http://pyicu.osafoundation.org/
[3] http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x/python/

>
> [1]
> http://lucene.apache.org/java/3_0_1/api/contrib-collation/org/apache/lucene/collation/package-summary.html
>
>
> Index: Makefile
> ===================================================================
> --- Makefile	(Revision 1001535)
> +++ Makefile	(Arbeitskopie)
> @@ -136,7 +136,9 @@
>  REGEX_JAR=$(LUCENE)/build/contrib/regex/lucene-regex-$(LUCENE_VER).jar
>  QUERIES_JAR=$(LUCENE)/build/contrib/queries/lucene-queries-$(LUCENE_VER).jar
>  INSTANTIATED_JAR=$(LUCENE)/build/contrib/instantiated/lucene-instantiated-$(LUCENE_VER).jar
> +COLLATION_JAR=$(LUCENE)/build/contrib/collation/lucene-collation-$(LUCENE_VER).jar
>  EXTENSIONS_JAR=build/jar/extensions.jar
> +ICU4J_JAR=$(LUCENE)/contrib/collation/lib/icu4j-collation-4.0.jar
>
>
>  .PHONY: generate compile install default all clean realclean \
> @@ -185,19 +187,24 @@
>  $(INSTANTIATED_JAR): $(LUCENE_JAR)
>  	cd $(LUCENE)/contrib/instantiated; $(ANT) -Dversion=$(LUCENE_VER)
>
> +$(COLLATION_JAR): $(LUCENE_JAR)
> +	cd $(LUCENE)/contrib/collation; $(ANT) -Dversion=$(LUCENE_VER)
> +
>  $(EXTENSIONS_JAR): $(LUCENE_JAR)
>  	$(ANT) -f extensions.xml -Dlucene.dir=$(LUCENE)
>
>  JARS=$(LUCENE_JAR) $(SNOWBALL_JAR) $(ANALYZERS_JAR) \
>       $(REGEX_JAR) $(MEMORY_JAR) $(HIGHLIGHTER_JAR) \
> -     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(EXTENSIONS_JAR)
> +     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(COLLATION_JAR) \
> +     $(EXTENSIONS_JAR)
>
> -JCCFLAGS?=--no-generics
> +JCCFLAGS?=--no-generics --reserved IGNORE
>
>  jars: $(JARS)
>
>  GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
>             $(JCCFLAGS) \
> +           --include $(ICU4J_JAR) \
>             --package java.lang java.lang.System \
>                                 java.lang.Runtime \
>             --package java.util \
> @@ -206,6 +213,8 @@
>             --package java.io java.io.StringReader \
>                               java.io.InputStreamReader \
>                               java.io.FileInputStream \
> +           --package java.text \
> +                     java.text.Collator \
>             --exclude org.apache.lucene.queryParser.Token \
>             --exclude org.apache.lucene.queryParser.TokenMgrError \
>             --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
>