lucene-pylucene-commits mailing list archives

Subject svn commit: r935465 - /lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml
Date Mon, 19 Apr 2010 06:46:29 GMT
Author: vajda
Date: Mon Apr 19 06:46:28 2010
New Revision: 935465

   - added section about embedding a Python VM in a Java VM


Modified: lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml
--- lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml (original)
+++ lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml Mon Apr 19 06:46:28 2010
@@ -292,9 +292,11 @@
           Java VM to search for classes. Every Python extension produced by
           JCC exports a <code>CLASSPATH</code> variable that is hardcoded to
           the jar files that it was produced from. A copy of each jar file
-          is installed as a resources files along with the extension when
-          JCC is invoked with the <code>--install</code> command line
-          argument. For example: 
+          is installed as a resource file with the extension when JCC is
+          invoked with the <code>--install</code> command line argument. 
+	  This parameter is optional and defaults to the
+	  <code>CLASSPATH</code> string exported by the module
+          <code>initVM</code> is imported from.
             >>> import lucene
             >>> lucene.initVM(classpath=lucene.CLASSPATH)
@@ -304,10 +306,10 @@
           The initial amount of Java heap to start the Java VM with. This
           argument is a string that follows the same syntax as the
-          similar <code>-Xms</code> java command line argument. For example:

+          similar <code>-Xms</code> java command line argument.
             >>> import lucene
-            >>> lucene.initVM(lucene.CLASSPATH, initialheap='32m')
+            >>> lucene.initVM(initialheap='32m')
             >>> lucene.Runtime.getRuntime().totalMemory()
@@ -330,8 +332,7 @@
           startup routine. These are passed through as-is. For example:
             >>> import lucene
-            >>> lucene.initVM(lucene.CLASSPATH,
-                              vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
+            >>> lucene.initVM(vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
@@ -421,10 +422,10 @@
 	Java 1.5 added support for parameterized types. JCC generates code
 	to heed type parameters unless the <code>--no-generics</code>
 	command line parameter is used. Java type parameterization is a
-	runtime feature. There is only one class used for all its
+	runtime feature. The same class is used for all its
 	parameterizations. Similarly, JCC wrapper objects all use the same
 	class but store type parameterizations on instances and make them
-	accessible as a tuple via the <code>parameters_</code> variable.
+	accessible as a tuple via the <code>parameters_</code> property.
 	For example, an <code>ArrayList&lt;Document&gt;</code> instance,
@@ -435,9 +436,9 @@
 	To allocate an instance of a generic Java class with specific type
 	parameters use the <code>of_()</code> method. This method accepts
-	one or more Python classes to use as type parameters. For
+	one or more Python wrapper classes to use as type parameters. For
 	example, <code>java.util.ArrayList&lt;E&gt;</code> is declared to
-	accept one type parameter. Its wrapper's <code>of_()</code>
+	accept one type parameter. Its wrapper's <code>of_()</code> method
 	hence accepts one parameter, a Python class, to use as type
 	parameter for the return type of its <code>get()</code> method, among
@@ -641,7 +642,7 @@
 	implement first such an extension class in Java:
-    package org.osafoundation.lucene.analysis;
+    package org.apache.pylucene.analysis;
     import org.apache.lucene.analysis.Analyzer;
     import org.apache.lucene.analysis.TokenStream;
@@ -699,23 +700,29 @@
         class _analyzer(PythonAnalyzer):
-            def tokenStream(self, fieldName, reader):
+            def tokenStream(_self, fieldName, reader):
                 class _tokenStream(PythonTokenStream):
-                    def __init__(self):
-                        super(_tokenStream, self).__init__()
-                        self.TOKENS = ["1", "2", "3", "4", "5"]
-                        self.INCREMENTS = [1, 2, 1, 0, 1]
-                        self.i = 0
-                    def next(self):
-                        if self.i == len(self.TOKENS):
-                            return None
-                        t = Token(self.TOKENS[self.i], self.i, self.i)
-                        t.setPositionIncrement(self.INCREMENTS[self.i])
-                        self.i += 1
-                        return t
-                    def reset(self):
+                    def __init__(self_):
+                        super(_tokenStream, self_).__init__()
+                        self_.TOKENS = ["1", "2", "3", "4", "5"]
+                        self_.INCREMENTS = [1, 2, 1, 0, 1]
+                        self_.i = 0
+                        self_.posIncrAtt = self_.addAttribute(PositionIncrementAttribute.class_)
+                        self_.termAtt = self_.addAttribute(TermAttribute.class_)
+                        self_.offsetAtt = self_.addAttribute(OffsetAttribute.class_)
+                    def incrementToken(self_):
+                        if self_.i == len(self_.TOKENS):
+                            return False
+                        self_.termAtt.setTermBuffer(self_.TOKENS[self_.i])
+                        self_.offsetAtt.setOffset(self_.i, self_.i)
+                        self_.posIncrAtt.setPositionIncrement(self_.INCREMENTS[self_.i])
+                        self_.i += 1
+                        return True
+                    def end(self_):
-                    def close(self):
+                    def reset(self_):
+                        pass
+                    def close(self_):
                 return _tokenStream()
@@ -736,6 +743,78 @@
 	and <a href="site:documentation/readme">samples</a>.
+    <section id="embedding">
+      <title>Embedding a Python VM in a Java VM</title>
+      <p>
+	Using the same techniques used when writing a Python extension of a
+	Java class, JCC may also be used to embed a Python VM in a Java VM.
+	The following steps and constraints apply:
+      </p>
+      <ul>
+	<li>
+	  JCC must be built in shared mode.
+	  See <a href="site:jcc/documentation/install">installation
+	    instructions</a> for more information about shared mode.
+	</li>
+	<li>
+	  As described in the previous section, define one or more Java
+	  classes to be "extended" from Python to provide the
+	  implementations of the native methods declared on them. Instances
+	  of these classes implement the bridges into the Python VM from
+	  Java.
+	</li>
+	<li>
+	  The <code>org.apache.jcc.PythonVM</code> Java class is going to be
+	  used from the Java VM's main thread to initialize the embedded
+	  Python VM. This class is installed inside the JCC egg under the
+	  <code>jcc/classes</code> directory and the full path to this
+	  directory must be on the Java <code>CLASSPATH</code>.
+	</li>
+	<li>
+	  The JCC egg directory contains the JCC shared runtime library - not
+	  the JCC Python extension shared library - but a library
+	  called <code>libjcc.dylib</code> on Mac OS X, 
+          <code></code> on Linux or <code>jcc.dll</code> on Windows.
+	  This directory must be added to the Java VM's shared library path
+	  via the <code>-Djava.library.path</code> command line parameter.
+	</li>
+	<li>
+	  In the Java VM's main thread, initialize the Python VM by calling
+	  its static <code>start()</code> method passing it a Python program
+	  name string and optional start-up arguments in a string array that
+	  will be made accessible in Python via <code>sys.argv</code>.
+	  This method returns the singleton PythonVM instance to be used in
+	  this Java VM. This instance may be retrieved at any later time via
+	  the static <code>get()</code> method defined on the
+          <code>org.apache.jcc.PythonVM</code> class. 
+	</li>
+	<li>
+	  Any Java VM thread that is going to be calling into the Python VM
+	  should start with acquiring a reference to the Python thread state
+	  object by calling the <code>acquireThreadState()</code> method on the
+	  Python VM instance. It should then release the Python thread state
+	  before terminating by calling <code>releaseThreadState()</code>. 
+          Calling these methods is optional but strongly recommended as it
+	  ensures that Python is not creating and throwing away a thread
+	  state every time the Python VM is entered and exited from a given
+	  Java VM thread.
+	</li>
+	<li>
+	  Any Java VM thread may instantiate a Python object for which an
+	  extension class was defined in Java as described in the previous
+	  section by calling the <code>instantiate()</code> method on the 
+	  PythonVM instance. This method takes two string parameters, the
+	  name of the Python module and the name of the Python class to
+	  import and instantiate from it. The <code>__init__()</code>
+	  constructor on this class must be callable without any parameters
+	  and, if defined, must call <code>super()</code> in order to
+	  initialize the Java side. The <code>instantiate()</code> method is
+	  declared to return <code>java.lang.Object</code> but the return
+	  value is actually an instance of the Java extension class used and
+	  must be downcast to it.
+	</li>
+      </ul>
+    </section>
     <section id="python">
       <title>Pythonic protocols</title>
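
The per-thread calling pattern described in the new embedding section can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not code from the commit: the real `org.apache.jcc.PythonVM` class needs the JCC native library, so the sketch uses a hypothetical stand-in interface `Vm` that mirrors only the method names given in the text (`acquireThreadState()`, `releaseThreadState()`, `instantiate()`). The return types of the first two methods, the `Greeter` bridge type, and the `greeters` / `PyGreeter` module and class names are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the PythonVM methods named in the text.
// Return types are assumed; the real class is org.apache.jcc.PythonVM.
interface Vm {
    void acquireThreadState();
    void releaseThreadState();
    Object instantiate(String moduleName, String className);
}

// Hypothetical bridge type. In real use this would be a Java extension
// class whose native methods are implemented in Python.
interface Greeter {
    String greet(String name);
}

// Fake VM that records calls so the sketch runs without JCC installed.
class RecordingVm implements Vm {
    final List<String> calls = new ArrayList<>();

    public void acquireThreadState() { calls.add("acquire"); }

    public void releaseThreadState() { calls.add("release"); }

    public Object instantiate(String moduleName, String className) {
        calls.add("instantiate " + moduleName + "." + className);
        Greeter greeter = name -> "hello " + name;
        return greeter;
    }
}

class EmbeddingSketch {
    // Acquire the thread state once, do the work, and release the state
    // in a finally block so it is released even if the call fails.
    static String callIntoPython(Vm vm, String who) {
        vm.acquireThreadState();
        try {
            // instantiate() is declared to return Object, so the result
            // is downcast to the bridge type.
            Greeter greeter = (Greeter) vm.instantiate("greeters", "PyGreeter");
            return greeter.greet(who);
        } finally {
            vm.releaseThreadState();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // With JCC available, the VM would instead be obtained in the main
        // thread via the static start() method described in the text.
        RecordingVm vm = new RecordingVm();
        Thread worker = new Thread(
            () -> System.out.println(callIntoPython(vm, "world")));
        worker.start();
        worker.join();
        System.out.println(vm.calls);
    }
}
```

Pairing the acquire and release calls in `try`/`finally` is a design choice of this sketch; the text only requires that the release happen before the thread terminates.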
