lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: call python from java - what strategy do you use?
Date Tue, 11 Jan 2011 19:13:06 GMT

  Hi Roman,

On Tue, 11 Jan 2011, Roman Chyla wrote:

> I have recently wrapped solr inside jetty with JCC (we need to access
> very big result sets quickly, via JNI, but also keep solr running as
> normal) and was wondering what strategies do you guys use to speak
> *from inside* Java towards the Python end.
>
> So far, I was able to think about these:
>
> - raise exceptions in java and catch in python (I think I have seen
> this in some posts from Bill Jansen)
> - communicate via sockets
> - wait passively - call some java method and wait for its return
> - monitor actively - in python check in loop some java object
>
> Is there something else?

I'm not sure I completely understand your questions, but if what you're
asking is how to run Python code from inside a Java servlet container, I've
done that with Tomcat and Lucene.

Basically, instead of embedding a JVM inside a Python VM - as is done for 
PyLucene - you do the opposite, you embed a Python VM inside a JVM.

For that purpose, see the org.apache.jcc.PythonVM class available in JCC's 
java tree. This class must be instantiated from the main thread at Java
servlet engine startup time. In Tomcat, I patched some startup code in
Bootstrap.java (see patches below) for this purpose.

Then, to make some Python code accessible from Java, use the usual way of
writing "extensions", the so-called JCC in reverse trick. Define a Java class
with some native methods implemented in Python; define a Python class that
"extends" it; build the Java class into a JAR; include it into a JCC-built
egg; install the egg into Python's env (site-packages, PYTHONPATH, whatever).
Then, write servlet code in Java that imports your Java class and calls it.

As you can see, this sounds simple but the devil is in the details. Of course,
bending Jetty for this may have different requirements but the code snippets
below should give you a good idea about what's required.

This approach has been in production, running freebase.com's search server,
for over two years now.

If you have questions, of course, please ask.
Good luck !

Andi..

----------------------
Patch to Bootstrap.java to use JCC's PythonVM (which initializes the embedded
Python VM)

--- apache-tomcat-6.0.29-src/java/org/apache/catalina/startup/Bootstrap.java	2010-07-19 06:02:32.000000000 -0700
+++ apache-tomcat-6.0.29-src/java/org/apache/catalina/startup/Bootstrap.java.patched	2010-08-04 08:49:05.000000000 -0700
@@ -30,16 +30,18 @@
  import javax.management.MBeanServer;
  import javax.management.MBeanServerFactory;
  import javax.management.ObjectName;

  import org.apache.catalina.security.SecurityClassLoad;
  import org.apache.juli.logging.Log;
  import org.apache.juli.logging.LogFactory;

+import org.apache.jcc.PythonVM;
+

  /**
   * Boostrap loader for Catalina.  This application constructs a class loader
   * for use in loading the Catalina internal classes (by accumulating all of the
   * JAR files found in the "server" directory under "catalina.home"), and
   * starts the regular execution of the container.  The purpose of this
   * roundabout approach is to keep the Catalina internal classes (and any
   * other classes they depend on, such as an XML parser) out of the system
@@ -398,22 +400,24 @@
          try {
              String command = "start";
              if (args.length > 0) {
                  command = args[args.length - 1];
              }

              if (command.equals("startd")) {
                  args[args.length - 1] = "start";
+                PythonVM.start("mql");
                  daemon.load(args);
                  daemon.start();
              } else if (command.equals("stopd")) {
                  args[args.length - 1] = "stop";
                  daemon.stop();
              } else if (command.equals("start")) {
+                PythonVM.start("mql");
                  daemon.setAwait(true);
                  daemon.load(args);
                  daemon.start();
              } else if (command.equals("stop")) {
                  daemon.stopServer(args);
              } else {
                  log.warn("Bootstrap: command \"" + command + "\" does not exist.");
              }

-----------------------------------------
Define a Java class:

package ....

public class EMQL {

     private long pythonObject;

     public EMQL()
     {
     }

     public void pythonExtension(long pythonObject)
     {
         this.pythonObject = pythonObject;
     }
     public long pythonExtension()
     {
         return this.pythonObject;
     }

     public void finalize()
         throws Throwable
     {
         pythonDecRef();
     }

     public native void pythonDecRef();

     // the methods implemented in python
     public native String init(ME me);
     public native String emql_refresh(String tid, String type);
     public native String emql_status();

     etc .......... etc

------------------------------------
The corresponding Python class

import ......

from jemql import initVM, CLASSPATH, EMQL

initVM(CLASSPATH)

class emql(EMQL):

     def __init__(self):
         super(emql, self).__init__()

     def init(self, me):
      ...........
     def emql_refresh(self, tid, type):
      ...........
     def emql_status(self):
      ...........
        return "some status"

     etc ...... etc

------------------------------------
Makefile rules to build this via JCC (the jemql.egg file is just an empty
target file for the Makefile; it's not used for anything else):

default: jemql.egg

jemql.jar: java/org/blah/blah/EMQL.java
	mkdir -p classes
	javac -classpath $(CLASSPATH):$(MORE_CLASSPATH):$(etc..etc) -d classes $(JAVAC_FLAGS) $<
	jar -cvf $@ -C classes .

jemql.egg: jemql.jar $(JMQL_JAR) emql.py
	$(JCC) --version 1.0 --jar $< \
                --classpath $(CLASSPATH):$(JME_JAR):$(JMQL_JAR) \
                org.blah.blah.me.ME \
                --package java.lang \
                --python jemql --build $(DBG_FLAGS) \
                --install \
                --module emql
	touch $@
------------------------------------
Patch to Tomcat's build.xml ANT script to add JCC's classes (like PythonVM) to
the build classpath.

--- apache-tomcat-6.0.29-src/build.xml	2010-07-19 06:02:31.000000000 -0700
+++ apache-tomcat-6.0.29-src/build.xml.patched	2010-08-04 09:30:24.000000000 -0700
@@ -95,16 +95,17 @@
    <property name="jasper-jdt.jar" value="${jasper-jdt.home}/jasper-jdt.jar"/>
    <available property="tomcat-dbcp.present" file="${tomcat-dbcp.jar}" />
    <available property="jdk16.present" classname="javax.sql.StatementEvent" />

    <!-- Classpath -->
    <path id="tomcat.classpath">
      <pathelement location="${ant.jar}"/>
      <pathelement location="${jdt.jar}"/>
+    <pathelement location="${jcc.egg}/jcc/classes"/>
    </path>

    <!-- Version info filter set -->
    <tstamp>
      <format property="TODAY" pattern="MMM d yyyy" locale="en"/>
      <format property="TSTAMP" pattern="hh:mm:ss"/>
    </tstamp>
    <filterset id="version.filters">
@@ -148,16 +149,25 @@
             excludes="**/CVS/**,**/.svn/**"
             encoding="ISO-8859-1">
  <!-- Comment this in to show unchecked warnings:
        <compilerarg value="-Xlint:unchecked"/>
   -->
        <classpath refid="tomcat.classpath" />
        <exclude name="org/apache/naming/factory/webservices/**" />
      </javac>
+    <javac srcdir="${extras.path}" destdir="${tomcat.classes}"
+           debug="${compile.debug}"
+           deprecation="${compile.deprecation}"
+           source="${compile.source}"
+           optimize="${compile.optimize}"
+           excludes="**/CVS/**,**/.svn/**">
+<!-- Comment this in to show unchecked warnings:     <compilerarg value="-Xlint:unchecked"/> -->
+      <classpath refid="tomcat.classpath" />
+    </javac>
      <!-- Copy static resource files -->
      <copy todir="${tomcat.classes}" encoding="ISO-8859-1">
        <filterset refid="version.filters"/>
        <fileset dir="java">
          <include name="**/*.properties"/>
          <include name="**/*.dtd"/>
          <include name="**/*.tasks"/>
          <include name="**/*.xsd"/>

-----------------------------------------------
Patch to catalina.sh, the Tomcat startup script to add JCC to LIBPATH and
CLASSPATH

--- apache-tomcat-6.0.29-src/output/build/bin/catalina.sh	2010-08-04 09:57:27.000000000 -0700
+++ apache-tomcat-6.0.29-src/output/build/bin/catalina.sh.patched	2010-08-04 09:57:47.000000000 -0700
@@ -162,16 +162,30 @@
      exit 1
    fi
  fi

  if [ -z "$CATALINA_BASE" ] ; then
    CATALINA_BASE="$CATALINA_HOME"
  fi

+if [ -n "$JCC_EGG" ]; then
+  CLASSPATH="$CLASSPATH":"$JCC_EGG"/jcc/classes
+  JAVA_LIB_PATH=$JCC_EGG
+fi
+if [ -n "$TOMCAT_APR_LIB_PATH" ]; then
+  JAVA_LIB_PATH=$JAVA_LIB_PATH:$TOMCAT_APR_LIB_PATH
+fi
+if [ -n "$JAVA_LIB_PATH" ]; then
+  JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$JAVA_LIB_PATH"
+fi
+if [ -n "$EXTRA_CLASSPATH" ]; then
+  CLASSPATH="$CLASSPATH":"$EXTRA_CLASSPATH"
+fi
+
  # Add tomcat-juli.jar and bootstrap.jar to classpath
  # tomcat-juli.jar can be over-ridden per instance
  if [ ! -z "$CLASSPATH" ] ; then
    CLASSPATH="$CLASSPATH":
  fi
  if [ "$CATALINA_BASE" != "$CATALINA_HOME" ] && [ -r "$CATALINA_BASE/bin/tomcat-juli.jar" ] ; then
    CLASSPATH="$CLASSPATH""$CATALINA_BASE"/bin/tomcat-juli.jar:"$CATALINA_HOME"/bin/bootstrap.jar
  else

These EGG paths are long, complicated and OS-specific; the trick below
generates them programmatically (from inside a Makefile):

JCC_EGG:=$(shell $(PYTHON) -c "import os, jcc; print(os.path.dirname(os.path.dirname(jcc.__file__)))")
JEMQL_EGG:=$(shell $(PYTHON) -c "import os, jemql; print(os.path.dirname(os.path.dirname(jemql.__file__)))")

Then, the CLASSPATH addition during _build_ time:
   CLASSPATH = $(CLASSPATH):$(JEMQL_EGG)/jemql/jemql.jar
and so on...
At runtime, JCC takes care of adding your eggs to the startup CLASSPATH.
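
For experimenting with the path trick outside of make, here is a minimal
Python sketch of the same computation. The helper names package_parent_dir
and build_classpath are made up for illustration, and "json" is used only
because it is importable everywhere; on a machine with the eggs installed
you would pass "jcc" or "jemql" instead.

```python
import importlib
import os


def package_parent_dir(module_name):
    """Return the directory that contains the package directory of module_name.

    For a package installed as an egg directory, this is the egg directory
    itself, which is what the Makefile one-liners above print.
    """
    module = importlib.import_module(module_name)
    return os.path.dirname(os.path.dirname(os.path.abspath(module.__file__)))


def build_classpath(*entries):
    """Join the non-empty entries with the platform's path separator."""
    return os.pathsep.join(entry for entry in entries if entry)


if __name__ == "__main__":
    # "json" is a stand-in; replace it with "jcc" or "jemql" where installed.
    print(package_parent_dir("json"))
```

Using os.pathsep rather than a hard-coded ":" keeps the classpath helper
usable on platforms where the separator differs.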

----------------------------------------------
Last but not least, a caveat if you use Python's thread-local storage in your
threads. When embedded inside a JVM, Python threads are 'dummy' threads: while
they're backed by the actual Java thread (a pthread), the Python VM is not
managing them, so a thread state object is created each and every time a
Python thread is entered and released when it exits back to the JVM. This
causes two problems:
  1. it's a bit wasteful
  2. Python thread-local storage gets lost
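
To make the second problem concrete, here is a plain-Python analogy that
runs without any JVM. It does not reproduce the embedded case exactly: a
brand new thread stands in for the fresh thread state described above, and
the names local_data, handle_request and run_in_new_thread are made up for
the illustration.

```python
import threading

# Storage whose attributes are private to the thread that set them.
local_data = threading.local()


def handle_request(tag, results):
    # Report what this thread left behind earlier, then store a new value.
    results.append(getattr(local_data, "cached", None))
    local_data.cached = tag


def run_in_new_thread(tag, results):
    # Each call starts from a fresh thread, hence from empty local storage.
    worker = threading.Thread(target=handle_request, args=(tag, results))
    worker.start()
    worker.join()


if __name__ == "__main__":
    seen = []
    run_in_new_thread("first", seen)
    run_in_new_thread("second", seen)
    print(seen)  # [None, None]: the second call does not see "first"
```

Anything cached in thread-local storage (connections, parsed config and so
on) would be rebuilt on every call in this situation.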

The Java class below works around this by incrementing the refcount that
controls it:

package org.apache.catalina.core;

import org.apache.jcc.PythonVM;

public class TerminatingThread extends Thread {
     protected Runnable runnable;

     public TerminatingThread(ThreadGroup group, Runnable runnable, String name)
     {
         super(group, name);
         this.runnable = runnable;
     }

     public void run()
     {
         PythonVM vm = PythonVM.get();

         try {
             vm.acquireThreadState();
             runnable.run();
         } finally {
             vm.releaseThreadState();
         }
     }
}
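
The effect of holding that extra reference for the life of the thread can be
modelled in a few lines of Python. This is a toy model based only on the
description above, not JCC's actual implementation; ToyThreadState and
call_into_python are hypothetical names.

```python
class ToyThreadState(object):
    """Toy per-thread state that is kept alive by a reference count."""

    def __init__(self):
        self.refcount = 0
        self.storage = None  # stands in for Python thread-local storage
        self.created = 0     # how many times a fresh state was built

    def acquire(self):
        if self.refcount == 0:
            self.storage = {}
            self.created += 1
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.storage = None


def call_into_python(state, key):
    """Model one Java-to-Python call: enter, touch the storage, leave."""
    state.acquire()
    try:
        hit = key in state.storage
        state.storage[key] = True
        return hit
    finally:
        state.release()


if __name__ == "__main__":
    plain = ToyThreadState()
    print([call_into_python(plain, "k") for _ in range(3)], plain.created)
    # [False, False, False] 3

    pinned = ToyThreadState()
    pinned.acquire()  # the role played by vm.acquireThreadState() above
    try:
        print([call_into_python(pinned, "k") for _ in range(3)], pinned.created)
        # [False, True, True] 1
    finally:
        pinned.release()
```

Without the outer acquire, every call builds and discards a state; with it,
the state is built once and the stored value survives between calls, which
is what TerminatingThread arranges around runnable.run().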

Then, there is some trickery to get Tomcat to use this class for its threads
instead of the default one:

--- apache-tomcat-6.0.29-src/java/org/apache/catalina/core/StandardThreadExecutor.java	2010-07-19 06:02:32.000000000 -0700
+++ apache-tomcat-6.0.29-src/java/org/apache/catalina/core/StandardThreadExecutor.java.patched	2010-08-04 08:56:02.000000000 -0700
@@ -44,17 +44,17 @@
      protected int minSpareThreads = 25;

      protected int maxIdleTime = 60000;

      protected ThreadPoolExecutor executor = null;

      protected String name;

-    private LifecycleSupport lifecycle = new LifecycleSupport(this);
+    protected LifecycleSupport lifecycle = new LifecycleSupport(this);
      // ---------------------------------------------- Constructors
      public StandardThreadExecutor() {
          //empty constructor for the digester
      }



      // ---------------------------------------------- Public Methods


In Tomcat's server.xml, use this executor (the code for it follows):
     <Executor name="relThreadPool"
               className="org.apache.catalina.core.TerminatingThreadExecutor"
               namePrefix="rel-exec-"
               maxIdleTime="3600000"
               minSpareThreads="2"
               maxThreads="2" />


package org.apache.catalina.core;

import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.catalina.LifecycleException;


public class TerminatingThreadExecutor extends StandardThreadExecutor {

     public void start()
         throws LifecycleException
     {
         lifecycle.fireLifecycleEvent(BEFORE_START_EVENT, null);

         TaskQueue taskqueue = new TaskQueue();
         TaskThreadFactory tf = new TerminatingTaskThreadFactory(namePrefix);

         lifecycle.fireLifecycleEvent(START_EVENT, null);
         executor = new ThreadPoolExecutor(getMinSpareThreads(), getMaxThreads(),
                                           maxIdleTime, TimeUnit.MILLISECONDS,
                                           taskqueue, tf);
         taskqueue.setParent(executor);
         lifecycle.fireLifecycleEvent(AFTER_START_EVENT, null);
     }

     protected class TerminatingTaskThreadFactory
         extends StandardThreadExecutor.TaskThreadFactory {

         protected TerminatingTaskThreadFactory(String namePrefix)
         {
             super(namePrefix);
         }

         public Thread newThread(Runnable runnable)
         {
             Thread t = new TerminatingThread(group, runnable, namePrefix + threadNumber.getAndIncrement());

             t.setDaemon(daemon);
             t.setPriority(getThreadPriority());

             return t;
         }
     }
}
