+ PyLucene is completely code-generated by JCC, whose sources are
+ included with the PyLucene sources.
+
+ Before building PyLucene, JCC must be built first. See
+ JCC's installation
+ instructions for building and installing it.
+
+ Once JCC is built and installed, PyLucene is built via a Makefile which
+ invokes JCC. See PyLucene's Makefile for configuration instructions.
+
+ There are limits to both how many files can fit on the command line and
+ how large a C++ file the C++ compiler can handle.
+ By default, JCC generates one large C++ file containing the source code
+ for all wrapper classes.
+
+ Using the --files command line argument, this behaviour can be tuned to
+ work around various limits, for example:
+
+ To build PyLucene, a Java Development Kit (JDK)
+ and Ant are required; use of the
+ resulting PyLucene binaries requires only a Java Runtime Environment
+ (JRE).
+
+ The setuptools
+ package is required to build and run PyLucene on Python 2.3.5. With
+ later versions of Python, setuptools is only required for shared
+ mode. See JCC's installation
+ instructions for more information.
+
+ PyLucene's Makefile is a GNU Makefile. Be sure to
+ use gmake instead of plain make.
+ Just as when building JCC, Python's distutils must be nudged a bit to
+ invoke the correct compiler. Sun Studio's C compiler is
+ called cc while its C++ compiler is called CC.
+ To build PyLucene, use the following shell command to ensure that
+ the C++ compiler is used: $ CC=CC gmake
+ PyLucene is a Python extension built with
+ JCC.
+
+ To build PyLucene, JCC needs to be built first. Sources for JCC are
+ included with the PyLucene sources. Instructions for building and
+ installing JCC are here.
+
+ Instructions for building PyLucene
+ are here.
+
+ PyLucene closely tracks Java Lucene releases. It intends to
+ support the entire Lucene API.
+
+ PyLucene also includes a number of Lucene contrib packages: the
+ Snowball analyzer and stemmers, the highlighter package, analyzers
+ for languages other than English, regular expression queries,
+ specialized queries such as 'more like this' and more.
+
+ This document only covers the pythonic extensions to Lucene offered
+ by PyLucene as well as some differences between the Java and Python
+ APIs. For the documentation on Java Lucene APIs,
+ see here.
+
+ To help with debugging and to support some Lucene APIs, PyLucene also
+ exposes some Java runtime APIs.
+
+ The best way to learn PyLucene is to look at the many samples
+ included with the PyLucene source release or on the web at:
+
+ A large number of samples are shipped with PyLucene. Most notably,
+ all the samples published in
+ the Lucene in
+ Action book that did not depend on a third party Java
+ library for which there was no obvious Python equivalent were
+ ported to Python and PyLucene.
+
+ Lucene in Action is a great companion to learning
+ Lucene. Having all the samples available in Python should make it
+ even easier for Python developers.
+
+ Lucene in Action was written by Erik Hatcher and Otis
+ Gospodnetic, both part of the Java Lucene development team, and is
+ available from
+ Manning Publications.
+
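+ Before PyLucene APIs can be used from a thread other than the main
+ thread that was not created by the Java Runtime, the
+ attachCurrentThread() method must be called on the JCCEnv object
+ returned by the initVM() or getVMEnv() functions.
+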
+ Java exceptions are caught at the language barrier and reported to
+ Python by raising a JavaError instance whose args tuple contains the
+ actual Java Exception instance.
+
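+ Java arrays are returned to Python in a JArray wrapper instance that
+ implements the Python sequence protocol. It is possible to change
+ array elements but not to change the array size.
+ A few Lucene APIs take array arguments and expect values to be
+ returned in them. To call such an API and be able to retrieve the
+ array values after the call, a Java array needs to be instantiated
+ first.
+ In addition to 'int', the 'JArray' function accepts 'object',
+ 'string', 'bool', 'byte', 'char', 'double', 'float', 'long' and
+ 'short' to create an array of the corresponding type. The
+ JArray('object') constructor takes a second argument denoting the
+ class of the object elements. This argument is optional and defaults
+ to Object.
+ To convert a char or byte array to a Python string use a
+ ''.join(array) construct.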
+
+ Instead of an integer denoting the size of the desired Java array,
+ a sequence of objects of the expected element type may be passed
+ in to the array constructor.
+ All methods that expect an array also accept a sequence of Python
+ objects of the expected element type. If no values are expected
+ from the array arguments after the call, it is hence not necessary
+ to instantiate a Java array to make such calls.
+
+ See JCC for more
+ information about handling arrays.
+
+ Java is a very verbose language. Python, on the other hand, offers
+ many syntactically attractive constructs for iteration, property
+ access, etc. As the Java Lucene samples from the Lucene in
+ Action book were ported to Python, PyLucene received a number
+ of pythonic extensions listed here:
+
+ Many areas of the Lucene API expect the programmer to provide
+ their own implementation or specialization of a feature where
+ the default is inappropriate. For example, text analyzers and
+ tokenizers are an area where many parameters and environmental
+ or cultural factors call for customization.
+
+ PyLucene enables this by providing Java extension points listed
+ below that serve as proxies for Java to call back into the
+ Python implementations of these customizations.
+
+ These extension points are simple Java classes for which JCC
+ generates the native C++ implementations. It is easy to add
+ more such extension classes into the 'java' directory of the
+ PyLucene source tree.
+
+ To learn more about this topic, please refer to the JCC
+ documentation.
+
+ Please refer to the classes in the 'java' tree for currently
+ available extension points. Examples of uses of these extension
+ points are to be found in PyLucene's unit tests and Lucene in
+ Action samples.
+
- PyLucene is a Python extension for accessing Java Lucene. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Java Lucene, version 2.4.0 as of October 19th, 2008.
+ PyLucene is a Python extension
+ for accessing
+ Java Lucene. Its goal
+ is to allow you to use Lucene's text indexing and searching
+ capabilities from Python. It is API compatible with the latest
+ version of Java Lucene, version 2.4.0 as of October 19th, 2008.
- PyLucene is built with JCC, a C++ code generator that makes it possible to call into Java classes from Python via Java's Native Invocation Interface (JNI). Sources for JCC are included with the PyLucene sources.
+ PyLucene is not a Lucene port but a Python wrapper around
+ Java Lucene. PyLucene embeds a Java VM with Lucene into a Python
+ process. The PyLucene Python extension, a Python module called
+ lucene, is machine-generated by JCC.
+
- See the README file for more information and documentation about PyLucene.
+ PyLucene is built with JCC, a C++
+ code generator that makes it possible to call into Java classes from
+ Python via Java's Native Invocation Interface (JNI). Sources for JCC
+ are included with the PyLucene sources.
+
+ See here for more
+ information and documentation about PyLucene.
+
+
+ --files 2
+
+ --files 10
+
+ --files separate
+ gmake instead of plain make.
+ cc while its C++ compiler is called CC.
+
+
+ $ CC=CC gmake
+
+ initVM(classpath, ...). More about this function in here.
+
+
+ attachCurrentThread() method must be called on the JCCEnv object
+ returned by the initVM() or getVMEnv() functions.
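The attach-before-use rule can be sketched as follows. This is only an illustration: run_attached and _FakeEnv are hypothetical names, and _FakeEnv is a stand-in so that the example runs without a Java VM. With PyLucene installed, the object returned by getVMEnv() would presumably be passed in its place.

```python
import threading


class _FakeEnv(object):
    """Stand-in (assumption) for the JCCEnv object returned by initVM()/getVMEnv()."""

    def __init__(self):
        self.attached = []

    def attachCurrentThread(self):
        # Record which thread attached itself; the real method attaches the
        # calling thread to the Java VM.
        self.attached.append(threading.current_thread().name)


def run_attached(env, work):
    """Attach the calling thread to the VM, then run work() and return its result."""
    env.attachCurrentThread()
    return work()


def demo():
    env = _FakeEnv()
    results = []
    worker = threading.Thread(
        target=lambda: results.append(run_attached(env, lambda: "searched")),
        name="worker-1",
    )
    worker.start()
    worker.join()
    return env.attached, results
```

The point of the helper is ordering: the attach call happens on the worker thread itself, before any wrapped class is used there.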
+ JArray wrapper instance that implements the Python sequence protocol. It
+ is possible to change array elements but not to change the array
+ size.
+
+ For example, accessing termDocs:
+ 'int', the 'JArray' function accepts 'object', 'string', 'bool',
+ 'byte', 'char', 'double', 'float', 'long' and 'short' to create an
+ array of the corresponding type. The JArray('object') constructor
+ takes a second argument denoting the class of the object elements.
+ This argument is optional and defaults to Object.
+ ''.join(array) construct.
+
+ For example:
+
+
+ import org.apache.lucene.index.IndexReader; corresponds to the
+ Python import statement from lucene import IndexReader
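The correspondence between the two import statements is mechanical, since the wrapped classes live in a flat namespace in the lucene module. A minimal sketch, assuming that only the last component of the Java class name matters (java_import_to_python is a hypothetical helper, not part of PyLucene):

```python
def java_import_to_python(java_import, module="lucene"):
    """Translate a single-class Java import statement into its Python equivalent."""
    statement = java_import.strip()
    if statement.startswith("import "):
        statement = statement[len("import "):]
    # Drop the trailing semicolon and keep only the simple class name.
    qualified = statement.rstrip(";").strip()
    class_name = qualified.rsplit(".", 1)[-1]
    return "from %s import %s" % (module, class_name)


print(java_import_to_python("import org.apache.lucene.index.IndexReader;"))
```

The sketch does not handle wildcard imports or two classes with the same simple name in different packages.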
+
+
+
+ The Java loop:
+
+ can be written in Python:
+
+ If hit.iterator()'s next() method were declared to return Hit
+ instead of Object, the above cast_() call would not be necessary.
+ The same Java loop can also be written:
+
+
+ The Java expressions:
+
+ are better written in Python:
+
+
+ The Java expression:
+
+ is better written in Python:
+
+
+ The Java loop:
+
+ is better written in Python:
+
+ Once JCC heeds Java 1.5 annotations and once Java Lucene
+ makes use of them, such casting should become unnecessary.
+ lucene, is machine-generated by JCC.
- PyLucene requires Python version 2.x (x >= 3.5) and Java version 1.x (x >= 4). Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene and a C++ compiler. Use of setuptools is recommended.
-
- See the JCC INSTALL file for more information about building JCC from sources.
-
- See the PyLucene INSTALL file for more information about building PyLucene from sources.
+ PyLucene requires Python version 2.x (x >= 3.5) and Java version 1.x
+ (x >= 4). Building PyLucene requires GNU Make, a recent version
+ of Ant capable of building Java Lucene and a C++ compiler. Use
+ of setuptools is recommended.
+
+ See the JCC installation
+ instructions for more information about building JCC from sources.
+
+ See the PyLucene installation
+ instructions for more information about building PyLucene from
+ sources.
- The Lucene PMC is pleased to announce the arrival of PyLucene as a Lucene subproject. PyLucene was previously hosted at the Open Source Applications Foundation since its inception in early 2004.
+ The Lucene PMC is pleased to announce the arrival of PyLucene as a
+ Lucene subproject. PyLucene was previously hosted at the Open
+ Source Applications Foundation since its inception in early 2004.
+ JCC is a Python extension written in Python and C++. It requires a
+ Java Runtime Environment to operate as it uses Java's reflection
+ APIs to do its work. It is built and installed
+ via distutils or setuptools.
+
+ setup.py and review that the values in INCLUDE, CFLAGS,
+ DEBUG_CFLAGS, LFLAGS and JAVAC are correct for your system. These
+ values are also going to be compiled into JCC's config.py file and
+ are going to be used by JCC when invoking distutils or setuptools
+ to compile the extensions it is generating code for.
+
+ JCC requires a Java Development Kit to be present. It uses the Java
+ Native Invocation Interface and expects <jni.h> and the Java
+ libraries to be present at build and runtime.
+
+ JCC requires a C++ compiler. A recent C++ compiler for your
+ platform is expected to work.
+
+ On Mac OS X, Java is installed by Apple's setup as a framework. The
+ values for INCLUDE and LFLAGS for darwin should be correct and
+ ready to use.
+
+ JCC has been built and tested on a variety of Linux distributions,
+ 32- and 64-bit. Getting the java configuration correct is important
+ and is done differently for every distribution.
+ For example:
+
+ java-config utility should be used to locate, and possibly change,
+ the default Java installation. The sample flags for Linux in JCC's
+ setup.py should be changed to reflect the root of the Java
+ installation which may be obtained via:
+
+ See earlier section about Shared Mode for
+ Linux support.
+
+ At this time, JCC has been built and tested only on Solaris 11 with Sun
+ Studio C++ 12, Java 1.6 and Python 2.4.
+
+ Because JCC is written in C++, Python's distutils must be nudged a
+ bit to invoke the correct compiler. Sun Studio's C compiler is
+ called cc while its C++ compiler is called CC. To build JCC, use the
+ following shell command to ensure that the C++ compiler is used:
+
+ Shared mode is not currently implemented for Solaris; setuptools
+ needs to be taught how to build plain shared libraries on Solaris
+ first.
+
+ At this time, JCC has been built and tested on Win2k and WinXP with
+ a variety of Python and Java versions.
+
+ PATH is a must.
+ javac.exe to PATH is required for shared mode (enabled by default
+ if setuptools >= 0.6c7 is found to be installed).
+
+ To use JCC with Python 2.3, setuptools is required
+
+ setuptools egg file to use python2.3 instead of python2.4.
+ initVM(classpath, ...). More about this function in here.
+
+ JCC is a Python extension written in Python and C++. It requires a
+ Java Runtime Environment (JRE) to operate as it uses Java's
+ reflection APIs to do its work. It is built and installed
+ via distutils or setuptools.
+
+ See here for more
+ information and operating system specific notes.
+
+ JCC started as a C++ code generator for hiding the gory details of
+ accessing methods and fields on Java classes via
+ Java's Native Invocation Interface.
+ These C++ wrappers make it possible to access a Java object as if it
+ were a regular C++ object, very much like GCJ's CNI interface.
+
+ It then became apparent that JCC could also generate the C++
+ wrappers for making these classes available to Python. Every class
+ that gets thus wrapped becomes a CPython type.
+
+ JCC generates wrappers for all public classes that are requested by
+ name on the command line or via the --jar command line
+ argument. It generates wrapper methods for all public methods and
+ fields on these classes whose types are found in one of the
+ following ways:
+
+ --package command line argument
+
+ JCC does not generate wrappers for methods or fields which don't
+ satisfy these requirements. Thus, JCC can avoid generating code for
+ runaway transitive closures of type dependencies.
+
+ JCC generates property accessors for a property called field when
+ it finds Java methods named setField(value), getField() or
+ isField().
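The naming rule can be sketched in a few lines of Python. This is not JCC's implementation: property_name is a hypothetical helper, and lowercasing only the first letter of the remainder is an assumption about how the property name is derived.

```python
def property_name(method_name):
    """Return the property name implied by a getX/setX/isX method, or None."""
    for prefix in ("get", "set", "is"):
        rest = method_name[len(prefix):]
        # Require an upper-case letter after the prefix so that a name such
        # as "issue" is not mistaken for an "is" accessor.
        if method_name.startswith(prefix) and rest[:1].isupper():
            return rest[0].lower() + rest[1:]
    return None


for name in ("getField", "setField", "isField", "issue"):
    print(name, "->", property_name(name))
```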
+
+ The C++ wrappers are declared in a C++ namespace structure that
+ mirrors the Java classes' Java packages. The Python types are
+ declared in a flat namespace at the top level of the resulting
+ Python extension module.
+
+ JCC's command-line arguments are best illustrated via the PyLucene
+ example:
+
+ There are limits to both how many files can fit on the command line
+ and how large a C++ file the C++ compiler can handle. By default,
+ JCC generates one large C++ file containing the source code for all
+ wrapper classes.
+
+ Using the --files command line argument, this behaviour can be
+ tuned to work around various limits, for example:
+
+ --files 2
+ --files 10
+ --files separate
+
+ The --prefix and --root arguments are passed through to
+ distutils' setup().
+
+ When generating wrappers for Python, the JAR files passed to JCC
+ via --jar are copied into the resulting Python extension as
+ resources and added to the extension's CLASSPATH variable. Classes
+ or JAR files that are required by the classes contained in the
+ argument JAR files need to be made findable via JCC's --classpath
+ command line argument. At runtime, these need to be appended to the
+ extension's CLASSPATH variable before starting the VM with
+ initVM(CLASSPATH).
+
+ To have more JAR files automatically copied into the resulting Python
+ extension and added to the classpath at build and runtime, use
+ the --include option. This option works like the --jar option
+ except that no wrappers are generated for the public classes
+ contained in them unless they're explicitly named on the command
+ line.
+
+ distutils vs setuptools
+ By default, when building a Python extension, if setuptools is
+ found to be installed, it is used over distutils. If you want to
+ force the use of distutils over setuptools, use the --use-distutils
+ command line argument.
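The selection rule described above amounts to a simple preference order. The sketch below is an illustration only, not JCC's code: choose_build_backend is a hypothetical function, and its use_distutils parameter merely mirrors the effect of the --use-distutils argument.

```python
import importlib.util


def choose_build_backend(use_distutils=False):
    """Prefer setuptools when it is installed, unless distutils is forced."""
    if not use_distutils and importlib.util.find_spec("setuptools") is not None:
        return "setuptools"
    return "distutils"


print(choose_build_backend())
print(choose_build_backend(use_distutils=True))
```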
+
+ The --bdist option can be used to ask JCC to invoke distutils
+ with bdist or setuptools with bdist_egg. If setuptools is used,
+ the resulting egg has to be installed with the easy_install
+ installer which is normally part of a Python installation that
+ includes setuptools.
+
+ JCC includes a small runtime component that is compiled into any
+ Python extension it produces.
+
+ This runtime component makes it possible to manage the Java VM from
+ Python. Because a Java VM can be configured with a myriad of
+ options, it is not automatically started when the resulting Python
+ extension module is loaded into the Python interpreter.
+
+ Instead, the initVM() function must be called from the
+ main thread before using any of the wrapped classes. It takes the
+ following keyword arguments:
+
+ classpath
+ CLASSPATH variable that is hardcoded to the jar files that it was
+ produced from. A copy of each jar file is installed as a resource
+ file along with the extension when JCC is invoked with the
+ --install command line argument. For example:
+
+ initialheap
+ -Xms java command line argument. For example:
+
+ maxheap
+ -Xmx java command line argument.
+ maxstack
+ -Xss java command line argument.
+ vmargs
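The relationship between the heap and stack keyword arguments and the java command line can be sketched as a small translation function. equivalent_java_flags is a hypothetical helper written for illustration; it does not call initVM() and the size strings such as '32m' are example values.

```python
def equivalent_java_flags(initialheap=None, maxheap=None, maxstack=None):
    """Return the java command line flags matching the given keyword arguments."""
    flags = []
    if initialheap:
        flags.append("-Xms" + initialheap)  # initial heap size
    if maxheap:
        flags.append("-Xmx" + maxheap)      # maximum heap size
    if maxstack:
        flags.append("-Xss" + maxstack)     # thread stack size
    return flags


print(equivalent_java_flags(initialheap="32m", maxheap="64m"))
```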
+ The initVM() and getVMEnv() functions
+ return a JCCEnv object that has a few utility methods on it:
+
+ attachCurrentThread(name, asDaemon)
+ detachCurrentThread()
+ The opposite of attachCurrentThread(). This method should be used
+ with extreme caution as Python's and the Java VM's garbage
+ collectors may use a thread detached too early causing a system
+ crash. The utility of this method seems dubious at the moment.
+
+ There are several differences between JNI's findClass()
+ and Java's Class.forName():
+
+ findClass() may find classes that Class.forName() won't.
+
+ For example:
+ +
+ Many Java APIs are declared to return types that are less specific
+ than the types actually returned. In Java 1.5, this is worked around
+ with annotations. JCC does not heed annotations at the moment. A
+ Java API declared to return Object will wrap objects as such.
+
+ In C++, casting the object into its actual type is supported via the
+ regular C casting operator.
+
+ In Python each wrapped class has a class method called cast_ that
+ implements the same functionality.
+
+ Similarly, each wrapped class has a class method called instance_
+ that tests whether the wrapped Java instance is of the given type.
+ For example:
+
+ Java arrays are wrapped with a C++ JArray template. The [] is
+ available for read access. This template, JArray<T>, accommodates
+ all Java primitive types, jstring, jobject and wrapper class arrays.
+
+ Java arrays are returned to Python in a JArray wrapper instance that
+ implements the Python sequence protocol. It is possible to change an
+ array's elements but not to change an array's size.
+
+ To convert a char or byte array to a Python string use
+ a ''.join(array) construct.
+
+ Any Java method expecting an array can be called with the corresponding
+ sequence object from Python.
+
+ To instantiate a Java array from Python, use one of the following
+ forms:
+ +
+ Instead of 'int', you may also use one of 'object', 'string',
+ 'bool', 'byte', 'char', 'double', 'float', 'long' and 'short'
+ to create an array of the corresponding type.
+
+ Because there is only one wrapper class for object arrays,
+ the JArray('object') type's constructor takes a second
+ argument denoting the class of the object elements. This argument is
+ optional and defaults to Object.
+
+ As with the Object types, the JArray types also include a cast_
+ method. This method becomes useful when the array returned to Python
+ is wrapped as a plain Object. This is the case, for example, with
+ nested arrays since there is no distinct Python type for every
+ different Java object array class: all Java object arrays are
+ wrapped by JArray('object'). For example:
+
+ In both cases, the Java type of obj must be compatible with the
+ array type it is being cast to.
+
+ To verify that a Java object is of a given array type, use
+ the instance_() method available on the array type. This is not the
+ same as verifying that it is assignable with elements of a given
+ type. For example, using the arrays created above:
+
+ Exceptions that occur in the Java VM and that escape to C++ are
+ reported as a javaError C++ exception. Failure to
+ handle the exception causes the process to crash.
+
+ Exceptions that occur in the Java VM and that escape to the Python
+ VM are reported with a JavaError Python exception object. The
+ getJavaException() method can be called on JavaError objects to
+ obtain the original Java exception object wrapped as any other Java
+ object. This Java object can be used to obtain a Java stack trace
+ for the error, for example.
+
+ Exceptions that occur in the Python VM and that escape to the Java
+ VM, as can happen for example in Python extensions (see topic below),
+ are reported to the Java VM as a RuntimeException or as
+ a PythonException when using shared mode. See installation
+ instructions for more information about shared mode.
+
+ JCC makes it relatively easy to extend a Java class from
+ Python. This is done via an intermediary class written in Java that
+ implements a special method called pythonExtension()
+ and that declares a number of native methods that are to be
+ implemented by the actual Python extension.
+
+ When JCC sees these special extension Java classes it generates the
+ C++ code implementing the native methods they declare. These native
+ methods call the corresponding Python method implementations passing
+ in parameters and returning the result to the Java VM caller.
+
+ For example, to implement a Lucene analyzer in Python, one would
+ first implement such an extension class in Java:
+ +
+ The pythonExtension() methods are what make this class
+ recognized as an extension class by JCC. They should be included
+ verbatim as above along with the declaration of
+ the pythonObject instance variable.
+
+ The implementation of the native pythonDecRef() method
+ is generated by JCC and is necessary because it seems
+ that finalize() cannot itself be native. Since an
+ extension class wraps the Python instance object it's going to be
+ calling methods on, its ref count needs to be decremented when this
+ Java wrapper class disappears. A declaration
+ for pythonDecRef() and a finalize()
+ implementation should always be included verbatim as above.
+
+ Really, the only non-boilerplate user input is the constructor of the
+ class and the other native methods, tokenStream() in
+ the example above.
+
+ The corresponding Python class(es) are implemented as follows:
+ +
+ When an __init__() is declared, super() must be called or else the
+ Java wrapper class will not know about the Python instance it needs
+ to invoke.
+
+ When a Java extension class declares native methods for which there
+ are public or protected equivalents available on the parent class,
+ JCC generates code that makes it possible to call super() on these
+ methods from Python as well.
+
+ There are a number of extension examples available in PyLucene's test
+ suite and samples.
+
+ When generating wrappers for Python, JCC attempts to detect which
+ classes can be made iterable:
+ iterator() with no arguments returning a type compatible
+ with java.util.Iterator, this class is made iterable from Python.
+ next() with no arguments returning an object type, this class is
+ made iterable. Its next() method is assumed to terminate
+ iteration by returning null.
+
+ JCC generates a Python mapping get method for a class when requested
+ to do so via the --mapping command line option which
+ takes two arguments, the class to generate the mapping get for and
+ the Java method to use. The method is specified with its name
+ followed by ':' and its Java signature.
+
+ For example, System.getProperties()['java.class.path'] is
+ made possible by:
+
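To illustrate the 'name:signature' form and what a mapping get amounts to on the Python side, here is a plain-Python sketch. parse_method_spec, MappingView and _FakeProperties are hypothetical names, and the spec string is an illustrative value in that format (a method named getProperty taking and returning a java.lang.String), not a quotation of any particular build command.

```python
def parse_method_spec(spec):
    """Split 'name:signature' into the method name and its Java signature."""
    name, signature = spec.split(":", 1)
    return name, signature


class MappingView(object):
    """Route obj[key] to the method named in a 'name:signature' spec."""

    def __init__(self, target, spec):
        method_name, _signature = parse_method_spec(spec)
        self._getter = getattr(target, method_name)

    def __getitem__(self, key):
        return self._getter(key)


class _FakeProperties(object):
    """Stand-in (assumption) for a wrapped java.util.Properties instance."""

    def __init__(self, values):
        self._values = dict(values)

    def getProperty(self, key):
        return self._values.get(key)


SPEC = "getProperty:(Ljava/lang/String;)Ljava/lang/String;"
props = MappingView(_FakeProperties({"java.class.path": "lucene.jar"}), SPEC)
print(props["java.class.path"])
```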
+ JCC generates Python sequence length and get methods for a class
+ when requested to do so via the --sequence command line
+ option which takes three arguments, the class to generate the
+ sequence length and get for and the two Java methods to use. The
+ methods are specified with their name followed by ':' and their Java
+ signature. For example:
+
+ is made possible by: +
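As a separate, plain-Python illustration of what sequence length and get methods provide, the sketch below routes len() and indexing to two named methods. SequenceView and _FakeHits are hypothetical names; _FakeHits is a stand-in with a length() and a doc(i) method so that the example runs without a Java VM.

```python
class SequenceView(object):
    """Route len(obj) and obj[i] to two named methods on a wrapped object."""

    def __init__(self, target, length_method, get_method):
        self._length = getattr(target, length_method)
        self._get = getattr(target, get_method)

    def __len__(self):
        return self._length()

    def __getitem__(self, index):
        # Raising IndexError past the end lets a for loop stop naturally.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._get(index)


class _FakeHits(object):
    """Stand-in (assumption) for a wrapped search result object."""

    def __init__(self, docs):
        self._docs = list(docs)

    def length(self):
        return len(self._docs)

    def doc(self, i):
        return self._docs[i]


hits = SequenceView(_FakeHits(["doc0", "doc1"]), "length", "doc")
print(len(hits), list(hits))
```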
+
+ JCC is a C++ code generator that produces a C++ object interface
+ wrapping a Java library via Java's Native Interface (JNI). JCC
+ also generates C++ wrappers that conform to Python's C type system
+ making the instances of Java classes directly available to a Python
+ interpreter.
+
+ When generating Python wrappers, JCC produces a complete Python
+ extension via the distutils or setuptools packages.
+
+ See here for more
+ information and documentation about JCC.
+
+ JCC is supported on Mac OS X, Linux, Solaris and Windows.
+
+ JCC requires Python version 2.x (x >= 3.5) and Java version 1.x
+ (x >= 4). Building JCC requires a C++ compiler. Use of
+ setuptools is recommended.
+
+ See the installation
+ instructions for more information about building JCC from sources.
+
+ The source code to JCC is part of PyLucene's and can be obtained with
+ a subversion client from here.
+
+ If you'd like to contribute to JCC or are having issues or questions
+ with JCC, please subscribe to the PyLucene developer mailing list.
+