lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: Feature request: include collation
Date Mon, 27 Sep 2010 01:59:17 GMT

On Mon, 27 Sep 2010, Christian Heimes wrote:

> I'd like to request a new feature for the next version of PyLucene. Lucene
> already comes with a collation library, but PyLucene doesn't wrap it.
> Collation is required for language-dependent sorting of search results. [1]
>
> I've attached a working patch for the feature request.
>
>>>> from lucene import *
>>>> initVM()
> <jcc.JCCEnv object at 0x7f326926e0d8>
>>>> collator = Collator.getInstance(Locale("de"))
>>>> keyanalyzer = CollationKeyAnalyzer(collator)
>>>> keyanalyzer
> <CollationKeyAnalyzer: org.apache.lucene.collation.CollationKeyAnalyzer@510dc6b5>
>
> Thanks
> Christian

   Hi Christian,

In 3.x and trunk, I've been porting ICU-dependent Lucene contrib features to
use PyICU [1][2] (which depends on C++ ICU). I think that having PyLucene
depend on both C++ ICU and Java ICU is one ICU too many :-), though.

I'm not sure at this point which should remain. There are advantages to 
both... I'm open to arguments in favor of either. You can see examples in 
the 3.x tree [3].

(disclaimer: I'm the author of PyICU)
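
For readers new to the topic, here is a minimal sketch of why language-dependent sorting needs a sort key rather than a plain string comparison. It uses only Python's standard library and is neither PyICU nor Lucene's CollationKeyAnalyzer. The helper `naive_sort_key` is hypothetical and only a rough approximation of German collation (it drops combining marks and folds case), so treat it as an illustration of the idea, not as a substitute for ICU.

```python
import unicodedata


def naive_sort_key(text):
    """Crude stand-in for a collation key (hypothetical helper, not ICU).

    Decomposes the string, drops combining marks such as the umlaut dots,
    and folds case. The original text is kept as a tie-breaker.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


# Escapes are used so the precomposed code points are unambiguous:
# "\u00c4" is A-umlaut, "\u00d6" is O-umlaut.
words = ["Zebra", "\u00c4pfel", "Apfel", "\u00d6l", "Ofen"]

# Plain sorting compares code points, so umlauted words land after "Zebra".
by_code_point = sorted(words)

# Sorting by the key places umlauted words next to their base letters.
by_key = sorted(words, key=naive_sort_key)

print(by_code_point)
print(by_key)
```

A real collator (ICU via PyICU, or java.text.Collator through PyLucene) applies locale-specific rules that this sketch does not attempt to model.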

Andi..

[1] http://pypi.python.org/pypi/PyICU
[2] http://pyicu.osafoundation.org/
[3] http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x/python/

>
> [1]
> http://lucene.apache.org/java/3_0_1/api/contrib-collation/org/apache/lucene/collation/package-summary.html
>
>
> Index: Makefile
> ===================================================================
> --- Makefile    (revision 1001535)
> +++ Makefile    (working copy)
> @@ -136,7 +136,9 @@
> REGEX_JAR=$(LUCENE)/build/contrib/regex/lucene-regex-$(LUCENE_VER).jar
> QUERIES_JAR=$(LUCENE)/build/contrib/queries/lucene-queries-$(LUCENE_VER).jar
> INSTANTIATED_JAR=$(LUCENE)/build/contrib/instantiated/lucene-instantiated-$(LUCENE_VER).jar
> +COLLATION_JAR=$(LUCENE)/build/contrib/collation/lucene-collation-$(LUCENE_VER).jar
> EXTENSIONS_JAR=build/jar/extensions.jar
> +ICU4J_JAR=$(LUCENE)/contrib/collation/lib/icu4j-collation-4.0.jar
>
>
> .PHONY: generate compile install default all clean realclean \
> @@ -185,19 +187,24 @@
> $(INSTANTIATED_JAR): $(LUCENE_JAR)
>        cd $(LUCENE)/contrib/instantiated; $(ANT) -Dversion=$(LUCENE_VER)
>
> +$(COLLATION_JAR): $(LUCENE_JAR)
> +       cd $(LUCENE)/contrib/collation; $(ANT) -Dversion=$(LUCENE_VER)
> +
> $(EXTENSIONS_JAR): $(LUCENE_JAR)
>        $(ANT) -f extensions.xml -Dlucene.dir=$(LUCENE)
>
> JARS=$(LUCENE_JAR) $(SNOWBALL_JAR) $(ANALYZERS_JAR) \
>      $(REGEX_JAR) $(MEMORY_JAR) $(HIGHLIGHTER_JAR) \
> -     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(EXTENSIONS_JAR)
> +     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(COLLATION_JAR) \
> +     $(EXTENSIONS_JAR)
>
> -JCCFLAGS?=--no-generics
> +JCCFLAGS?=--no-generics --reserved IGNORE
>
> jars: $(JARS)
>
> GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
>            $(JCCFLAGS) \
> +           --include $(ICU4J_JAR) \
>            --package java.lang java.lang.System \
>                                java.lang.Runtime \
>            --package java.util \
> @@ -206,6 +213,8 @@
>            --package java.io java.io.StringReader \
>                              java.io.InputStreamReader \
>                              java.io.FileInputStream \
> +           --package java.text \
> +                     java.text.Collator \
>            --exclude org.apache.lucene.queryParser.Token \
>            --exclude org.apache.lucene.queryParser.TokenMgrError \
>            --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
>
