From: Peter Firmstone
Organization: Zeus Project Services
Date: Mon, 07 Dec 2009 10:33:39 +1000
To: river-dev@incubator.apache.org
Subject: Distributed ExecutorService
Message-ID: <4B1C4D63.1050002@zeus.net.au>
I've had a few thoughts about the whole "move the code to the data" concept (or "move the code to the service node") for some time. Considering it a low priority, I kept quiet about it until the topic came up during a recent email discussion.

Current practice for River applications is to move code and data around together in the form of marshalled objects. Two particular groups of objects are of interest: code-intensive objects, whose methods do significant processing to create returned results, and data-intensive objects, where there is little processing to be done beyond minor copies or transformations of existing state.

I think the River platform addresses both groups quite effectively when the processing is known at compile time or when the service requirements are clear. However, there are occasions when it would be less network-intensive, or simpler, to submit the distributed equivalent of a ScheduledTask or Runnable to consume an existing data-intensive service at the origin of that service, and to make the desired result available via a temporary service or some other mechanism or protocol.

There are also cases where the class files and libraries required to perform the processing are available at the service node but unavailable at the client, due to a legacy Java environment, no ability to load remote class files, or a constrained memory environment that cannot provide enough space for the processing required. There the result of the uploaded Runnable class file can be transformed into a locally available or compatible class file.

The Runnable code might be uploaded to the service node by the client or by a third-party mediator. Any suggestions for what the mechanism should be would also be useful.
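To make the idea concrete, here is a minimal in-process sketch of what an uploaded task might look like. All names here (DistributedTask, ServiceNode, and so on) are hypothetical, chosen for illustration only; in the real design the task would be marshalled across the network rather than invoked directly, and its class bytes would travel via a codebase annotation or an uploaded bundle.

```java
import java.io.Serializable;

// Hypothetical sketch only -- these are not River APIs. A DistributedTask
// is the unit of client code that would be uploaded; it is Serializable so
// its state can be marshalled to the service node, where it runs against
// data that never leaves that node.
public class Sketch {

    interface DistributedTask<T> extends Serializable {
        T call(int[] localData) throws Exception; // handed the node-local data
    }

    // Stand-in for the service node: it holds the data-intensive state and
    // executes uploaded tasks next to it, returning only the small result.
    static class ServiceNode {
        private final int[] largeDataSet; // stays at the node

        ServiceNode(int[] data) { this.largeDataSet = data; }

        <T> T execute(DistributedTask<T> task) throws Exception {
            return task.call(largeDataSet);
        }
    }

    public static void main(String[] args) throws Exception {
        ServiceNode node = new ServiceNode(new int[] {3, 1, 4, 1, 5, 9});
        // Only this small piece of code would cross the network in the real
        // design; here it runs in-process for illustration.
        int sum = node.execute(data -> {
            int s = 0;
            for (int v : data) s += v;
            return s;
        });
        System.out.println("sum=" + sum); // prints sum=23
    }
}
```

In a real service, execute would be a remote method, and the uploaded task would be permission-checked before running, which is where the security mechanisms mentioned below come in.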
I'm thinking that a signed OSGi bundle containing a set of permissions would be a good model to start from, considering that OSGi already has many of the security mechanisms that would make such a thing possible.

In essence, the DistributedScheduledTask is a remote piece of client code that is executed on the service node. I'm wondering just what a DistributedExecutorService should provide, if anyone else has had thoughts similar to mine.

For instance, a reporting node in a cluster might send the same DistributedScheduledTask to all available services of a particular type, to perform some intensive data processing or filtering remotely at each node, and then retrieve the results from each after processing. The reporting node might have changing reporting requirements, similar to performing queries.

Cheers,

Peter.
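As a postscript, the reporting-node fan-out above could be sketched as follows: the same task is pushed to every matching service, each one reduces its own data locally, and only the small results come back. The names and the local thread pool are stand-ins for illustration; a real implementation would dispatch to remote service proxies discovered via lookup, not local threads.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a reporting node, not a River API. Each int[]
// stands in for one service node's local data set; the thread pool stands
// in for dispatching the task to each remote node.
public class FanOut {

    interface NodeTask<T> extends Serializable { T run(int[] localData); }

    static List<Integer> reportAcross(List<int[]> nodeData, NodeTask<Integer> task)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nodeData.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] data : nodeData) {
                // The same task runs "at" every node, against that node's data.
                futures.add(pool.submit(() -> task.run(data)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) results.add(f.get());
            return results; // only the reduced results travel back
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[]> nodes = List.of(new int[] {1, 2, 3}, new int[] {10, 20});
        // A changing "query": count values above a threshold, pushed to all nodes.
        List<Integer> counts = reportAcross(nodes, d -> {
            int c = 0;
            for (int v : d) if (v > 2) c++;
            return c;
        });
        System.out.println(counts); // prints [1, 2]
    }
}
```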