lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1336) Distributed Lucene using Hadoop RPC based RMI with dynamic classloading
Date Sat, 26 Jul 2008 15:14:31 GMT


Jason Rutherglen commented on LUCENE-1336:

The classloading mechanism described above also turned out to be unsuitable, because it would
require a scan of all of the classes each time.  Because of inheritance, it is impossible
to accurately obtain all of the classes without a scan on each serialization, and that costs
too much performance.

While working on this problem I found what I think is a design flaw in Java: the compiler does
not automatically compile a serialVersionUID into classes that do not define one.  If it did,
many of these issues would go away.  The current design creates inconsistencies during
deserialization in ObjectInputStream.resolveClass(ObjectStreamClass desc), where the
ObjectStreamClass parameter reports a computed default serialVersionUID that is not consistent
across VM implementations.  Also, because this serialVersionUID is only available from the
ObjectStreamClass, building a map of classes and class versions is difficult.
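
To make the serialVersionUID point concrete, here is a minimal sketch (not part of the patch) that
reads the value through ObjectStreamClass for one class that declares the field and one that does
not.  The class names SerialVersionDemo, Declared and Undeclared and the helper uidOf are made up
for illustration; ObjectStreamClass.lookup and getSerialVersionUID are the standard JDK calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {
    // Declares an explicit serialVersionUID, so the value is fixed by the source code.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Declares none, so the serialization runtime computes a default from the class structure.
    static class Undeclared implements Serializable {
        int value;
    }

    // Returns the serialVersionUID the serialization runtime associates with a serializable class.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Declared:   " + uidOf(Declared.class));
        System.out.println("Undeclared: " + uidOf(Undeclared.class));
    }
}
```

The declared value is stable by construction; the computed one depends on details of the compiled
class, which is why it is a poor key for a map of class versions.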

The easiest, most reliable and most efficient solution is a session based classloading
mechanism, where the session is between a client and the server.  The client generates a
unique session id every time the VM (or, in J2EE, the webapp) is loaded.  This mostly
guarantees that the classes on the client will be consistent (if the client is dynamically
loading classes, it is the client's responsibility to restart the RMI object, which generates
a new session id).  The server maintains a SessionClassLoader per client session that is used by
the deserialization code to dynamically load classes from the client.  The only limitation of
this solution is the number of SessionClassLoaders a server can support.  In most systems
it will not be a factor.  The SessionClassLoaders on the server will simply expire from the
map after a period of not being used, rather than use remote referencing, which would increase
network traffic unnecessarily.
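
The session scheme above can be sketched roughly as follows.  This is an illustration under
assumptions, not the code in the patch: SessionSketch, ClientSession, Registry, loaderFor and
expireIdle are hypothetical names, the SessionClassLoader here is an empty stand-in that does not
yet fetch classes from the client, and time is passed in explicitly to keep the example simple.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionSketch {
    // Client side: a static field is initialized once per class load, so a restarted
    // VM or a reloaded webapp gets a fresh session id.
    static final class ClientSession {
        static final String ID = UUID.randomUUID().toString();
    }

    // Server side: stand-in for the per-session loader; a real one would load
    // class bytes from the client that owns the session.
    static final class SessionClassLoader extends ClassLoader {
        final String sessionId;
        volatile long lastUsedMillis;

        SessionClassLoader(String sessionId, ClassLoader parent, long now) {
            super(parent);
            this.sessionId = sessionId;
            this.lastUsedMillis = now;
        }
    }

    // Server side: map of session id to loader, with idle expiry instead of remote references.
    static final class Registry {
        private final Map<String, SessionClassLoader> loaders = new ConcurrentHashMap<>();
        private final long idleTimeoutMillis;

        Registry(long idleTimeoutMillis) {
            this.idleTimeoutMillis = idleTimeoutMillis;
        }

        // Returns the loader for a session, creating it on first use and refreshing its timestamp.
        SessionClassLoader loaderFor(String sessionId, long now) {
            SessionClassLoader loader = loaders.computeIfAbsent(sessionId,
                id -> new SessionClassLoader(id, SessionSketch.class.getClassLoader(), now));
            loader.lastUsedMillis = now;
            return loader;
        }

        // Removes loaders idle for longer than the timeout and returns how many were removed.
        int expireIdle(long now) {
            int removed = 0;
            for (Iterator<SessionClassLoader> it = loaders.values().iterator(); it.hasNext();) {
                if (now - it.next().lastUsedMillis > idleTimeoutMillis) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        }

        int size() {
            return loaders.size();
        }
    }
}
```

A server would call loaderFor with the session id carried by each request and run expireIdle
periodically, for example from a scheduled task.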

> Distributed Lucene using Hadoop RPC based RMI with dynamic classloading
> -----------------------------------------------------------------------
>                 Key: LUCENE-1336
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene-1336.patch, lucene-1336.patch, lucene-1336.patch
> Hadoop RPC based RMI system for use with Lucene Searchable.  Keeps the application logic
on the client side, removing the need to deploy it to the Lucene servers.  Also removes the
need to provision new code to potentially hundreds of servers for every application
logic change.
> The use case is any deployment requiring Lucene on many servers.  This system provides
the added advantage of allowing custom Query and Filter classes (or other classes) to be defined
on, for example, a development machine and executed on the server without deploying the custom
classes to the servers first.  This can save a lot of time and effort in provisioning and
restarting processes.  In the future this patch will include an IndexWriterService interface
which will enable document indexing.  This will allow subclasses of Analyzer to be dynamically
loaded onto a server as documents are added by the client.
> Hadoop RPC is more scalable than Sun's RMI implementation because it uses non-blocking
sockets.  Hadoop RPC is also far easier to understand and customize if needed, as it is embodied
in two main classes, org.apache.hadoop.ipc.Client and org.apache.hadoop.ipc.Server.
> Features include automatic dynamic classloading.  The dynamic classloading enables newly
compiled client classes inheriting core objects such as Query or Filter to be used to query
the server without first deploying the code to the server.  
> RMI dynamic classloading is rarely used in practice because it is hard to set up.  It requires
placing the new code in JAR files on a web server on the client, and it also requires custom
system properties and a Java security manager configuration.
> The dynamic classloading in Hadoop RMI for Lucene uses RMI to load the classes.  Custom
serialization and deserialization manages the classes and the class versions on the server
and client side.  New class files are automatically detected and loaded using ClassLoader.getResourceAsStream,
so this system does not require creating a JAR file.  The same networking system that carries
the remote method invocations is also used to load classes over the network.  This
removes the need for a separate web server dedicated to the task and reduces deployment to
a few lines of code.
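
As a rough illustration of the ClassLoader.getResourceAsStream step mentioned in the description
above, the sketch below reads the compiled bytes of a class from the classpath and shows a loader
that can define a class from such bytes.  ClassBytes, read, BytesClassLoader and define are
hypothetical names, and the transport of the bytes over Hadoop RPC is left out entirely; this is
not the code in the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassBytes {
    // Client side: reads the compiled bytes of a class via the given loader,
    // or returns null if the class file cannot be found. No JAR file is needed.
    static byte[] read(ClassLoader loader, String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side: defines a class from bytes, standing in for bytes received from the client.
    static final class BytesClassLoader extends ClassLoader {
        BytesClassLoader(ClassLoader parent) {
            super(parent);
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }
}
```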

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
