river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: Maven repository Entry was Re: Codebase service?
Date Tue, 25 May 2010 06:05:39 GMT
Hi Dennis,

Reasoning, and hopefully the whys, below.

Dennis Reedy wrote:
> Hi Peter,
>
> I was hoping to take a step back for a second, perhaps it's just me that
> seems to have my head spinning of late on this list. I may have missed some
> things, but we've discussed many issues over the past week:
>
> - How to advertise the DL jar(s) a service vends, allowing a client to
>   download requisite jars that allow the jars to be loaded from a local
>   (trusted) location
>   
Yes, we can use an Entry, or as Chris pointed out, if we annotate 
MarshalledInstances using a new Maven URL scheme we can extract that 
info and make it available via MarshalledServiceItem (an abstract class 
that extends ServiceItem).
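
As a rough illustration of what a Maven-style annotation would have to carry, the sketch below parses the groupId:artifactId:version:classifier form quoted further down this thread and derives the jar's path under the standard Maven repository layout. MavenCoordinate and its methods are hypothetical names, not River or Maven API, and no URL scheme has actually been agreed on the list.

```java
/** Hypothetical holder for a groupId:artifactId:version[:classifier] string. */
final class MavenCoordinate {
    final String groupId;
    final String artifactId;
    final String version;
    final String classifier; // null when the coordinate has no classifier

    private MavenCoordinate(String groupId, String artifactId,
                            String version, String classifier) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
        this.classifier = classifier;
    }

    /** Parses three or four colon-separated, non-empty fields. */
    static MavenCoordinate parse(String text) {
        String[] parts = text.split(":", -1); // -1 keeps trailing empty fields
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException("Expected 3 or 4 fields: " + text);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty field in: " + text);
            }
        }
        return new MavenCoordinate(parts[0], parts[1], parts[2],
                parts.length == 4 ? parts[3] : null);
    }

    /** Relative jar path under the standard Maven repository layout. */
    String repositoryPath() {
        String file = artifactId + "-" + version
                + (classifier == null ? "" : "-" + classifier) + ".jar";
        return groupId.replace('.', '/') + "/" + artifactId + "/"
                + version + "/" + file;
    }
}
```

For example, a made-up coordinate such as org.example:service-dl:1.0:dl would map to org/example/service-dl/1.0/service-dl-1.0-dl.jar relative to the repository root.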

> - Given the capability above, the need for a codebase service may not be required
>   
Agreed
> - Conventions on how to develop River services, as it relates to jar naming,
>   packaging and what dependencies are between the various artifacts
> - How to possibly move forward with utilizing Maven repositories and the
>   implied capabilities of published artifacts
> - The development of a maven archetype to allow a developer to easily create
>   a working project in seconds
>   
Yes to all above.
> Your attention to detail and the documentation of how class loader
> interactions with regards to security is great. I'd like to understand the
> requirements of what you have documented below, the urge to refactor
> MarshalledInstance, and why the new class loader hierarchy needs to be added
> to River.
>   

The urge to refactor MarshalledInstance is to allow the URL annotation 
to be requested directly and passed via StreamServiceRegistrar. Combined 
with delayed unmarshalling of proxies via MarshalledServiceItem, this 
allows the client to provision and provide an alternate CodeSource if 
need be.

StreamServiceRegistrar returns a ResultStream<ServiceItem>, so you have 
to check with instanceof MarshalledServiceItem.
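
A minimal sketch of that instanceof check, under loud assumptions: the ServiceItem and MarshalledServiceItem classes below are empty stand-ins written for this example (only the two method signatures are taken from the proposal quoted later in this message), and a plain Iterator stands in for ResultStream, whose real interface is not shown in this thread.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Stand-in for the lookup result type; the real class carries more state. */
class ServiceItem { }

/** Stand-in for the proposed abstract class with delayed unmarshalling. */
abstract class MarshalledServiceItem extends ServiceItem {
    public abstract URL[] getCodeSourceAnnotation();
    public abstract Object getService(CodeSource[] cs);
}

final class LookupResults {
    /**
     * Collects the items that are still marshalled, so the caller can
     * provision code for them before asking for the service proxy.
     */
    static List<MarshalledServiceItem> stillMarshalled(Iterator<ServiceItem> results) {
        List<MarshalledServiceItem> pending = new ArrayList<MarshalledServiceItem>();
        while (results.hasNext()) {
            ServiceItem item = results.next();
            if (item instanceof MarshalledServiceItem) {
                pending.add((MarshalledServiceItem) item);
            }
            // Plain ServiceItems are already unmarshalled and need no extra step.
        }
        return pending;
    }
}
```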

The new packaging scheme can be applied to distributed objects also, 
provided we create an implementation of CodebaseAccessClassLoader 
(contributed by Gregg to replace RMIClassLoaderSpi) that performs or 
requests local Maven archive provisioning.

The new ClassLoader hierarchy is needed to solve the class identity 
(fully qualified runtime classname = class + ClassLoader), class 
visibility, isolation and versioning problems that PreferredClassProvider 
only partially solves.
> Perhaps I'm just missing some fundamental issues, but maybe we need to take
> some time and determine the whys before the hows? Is this direction
> fundamental to the OSGi direction that you're taking? If so, how does this
> impact non-OSGi based systems?
>   
The changes are OSGi agnostic. OSGi will live in the application space, 
so while the changes benefit OSGi, they are independent of it: the same 
benefits will apply to other software, and OSGi isn't required.

I realised that fundamentally OSGi uses ClassLoaders for isolating 
software into components, so implementation classes aren't exposed 
outside of their module. OSGi does this very well, and it also manages 
security concerns very well. Something else I realised: OSGi's use of 
ClassLoaders is not optimal for distributed systems, because it is 
difficult to determine the correct ClassLoader during deserialization. 
OSGi wasn't designed with Serialization in mind. Distributed computing 
introduces another dimension, like going from 2D to 3D. In OSGi, you 
only have one bundle version combination loaded (you can have many 
bundles of different versions, but I believe typically only one of each 
unique bundle instance; you can have the same package version exported 
by differently versioned bundles). So how do you determine the correct 
ClassLoader during unmarshalling? In River we may have many proxies 
using the same jar version, however we don't want the proxy's 
implementation to get all tied up in the local application bundles. We'd 
be allowing the smart proxy to pollute the local application space, and 
some parts of the local application could see the proxy implementation.

In our new ClassLoader tree, a smart proxy can have its own personal 
ClassLoader, because the ContextClassLoader will be that of the proxy 
during deserialization of returned objects, since the proxy initiated the 
communication with the remote Service host. A client's parameter 
implementation, by contrast, cannot have its own ClassLoader and must 
share one with other clients that use the same codebase and version. The 
parameter classes didn't initiate the communication, so they have no link 
to a ClassLoader at the remote Service host, with only the Codebase and 
Version to go by. There could otherwise be many ClassLoaders containing 
that codebase version and not enough information to find the right one. 
The last thing I want to do is require the client to have an identity or 
location to deal with deserialization of parameters at the Service node.
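
To make the sharing rule concrete, here is a sketch of a registry that hands out one ClassLoader per (codebase, version) pair, so every client using the same parameter jar lands in the same loader. ParameterLoaderRegistry and its Key are hypothetical names invented for this example; nothing like it exists in River today, and a real version would probably also key on Principals.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ParameterLoaderRegistry {
    /** Codebase plus version: all the service node has to go by. */
    private static final class Key {
        final String codebase; // kept as a String to avoid URL.equals() host lookups
        final String version;

        Key(String codebase, String version) {
            this.codebase = Objects.requireNonNull(codebase);
            this.version = Objects.requireNonNull(version);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return codebase.equals(other.codebase) && version.equals(other.version);
        }

        @Override public int hashCode() {
            return Objects.hash(codebase, version);
        }
    }

    private final ConcurrentMap<Key, ClassLoader> loaders =
            new ConcurrentHashMap<Key, ClassLoader>();
    private final ClassLoader apiParent; // the loader holding the shared API classes

    ParameterLoaderRegistry(ClassLoader apiParent) {
        this.apiParent = apiParent;
    }

    /** Returns the shared loader for this codebase and version, creating it once. */
    ClassLoader loaderFor(String codebase, String version) {
        return loaders.computeIfAbsent(new Key(codebase, version),
                k -> new URLClassLoader(new URL[] { toUrl(k.codebase) }, apiParent));
    }

    private static URL toUrl(String codebase) {
        try {
            return URI.create(codebase).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad codebase: " + codebase, e);
        }
    }
}
```

Two requests with the same codebase and version get the identical loader back, while a different version gets a fresh one; both are children of the API loader.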

Rather than take, "how you use OSGi" and apply it to River, I decided to 
understand why they solved their problems the way they did and learn 
from it.  It is a very good solution to the problem they've solved.  
However with our solution we can solve the deserialization issue for 
distributed applications utilising OSGi.

Currently River uses Permission grants based on ClassLoader (so does 
OSGi). What I realised was that I needed a finer grained Permission grant, 
and having many ProtectionDomains inside one ClassLoader is about as fine 
as you can get. Only one ClassLoader is used for the API space, for 
class identity reasons: it allows maximum sharing of API classes, because 
you just can't control and coordinate someone else's JVM's ClassLoader 
visibility without overcoming some serious trust issues (simpler is 
better, I don't even want to attempt to solve them!). There is however 
one compromise with my approach.
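
For a feel of what "many ProtectionDomains inside one ClassLoader" means, the sketch below builds one ProtectionDomain per jar, each with its own fixed permission set. PerJarDomains is a hypothetical helper and the jar URLs in the note are made up; only the java.security classes are real. A loader would then pass the matching domain to ClassLoader.defineClass for each class it defines from that jar.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.security.CodeSource;
import java.security.Permission;
import java.security.Permissions;
import java.security.ProtectionDomain;
import java.security.cert.Certificate;

final class PerJarDomains {
    /**
     * Builds a domain for one jar with a static permission set.
     * The two-argument ProtectionDomain constructor does not consult
     * the installed Policy, so only the listed permissions are implied.
     */
    static ProtectionDomain domainFor(String jarUrl, Permission... granted) {
        Permissions perms = new Permissions();
        for (Permission p : granted) {
            perms.add(p);
        }
        try {
            CodeSource source =
                    new CodeSource(URI.create(jarUrl).toURL(), (Certificate[]) null);
            return new ProtectionDomain(source, perms);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar URL: " + jarUrl, e);
        }
    }
}
```

With this, a domain built for file:/lib/foo-api.jar with a PropertyPermission on user.home implies that permission, while a domain built for file:/lib/bar-api.jar with no grants does not, even though both sets of classes could live in the same ClassLoader.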

By loading all API classes into the same ClassLoader, we cannot have 
duplicate classes, so we must always load the latest API version, which 
must not break backward compatibility. If the backward compatibility 
constraints are hampering your design, it's simply better to deprecate a 
package and append a number to change the package name (or create a 
completely new API jar).

org.some.thing
org.some.thing2

The reason we version packages is so we don't have to rename them when 
they break backward compatibility. This makes sense for implementations, 
but not for API. If you're going to have long lived persistent objects, 
they belong in the API space. If you don't need to persist your objects, 
why not have an interface and throwaway class implementations? This 
solves the problems of Serialization exposing class internal state and 
of evolution. Extend the interface if you want new methods.

If a JVM has been running a long time, a new API version may have been 
released. Clients using only the old API functionality won't be able to 
see or utilise the new functionality until we restart the JVM. That is 
the compromise. But I figure it's not too bad a compromise once APIs 
have stabilised and go into longer development cycles. I can handle 
having to restart my JVM once every 6 months.

I think Michael Warres got to the crux of the problem with his 
publication on ClassLoader issues. My interpretation of what he said is 
that perhaps Java should tear apart the multiple ClassLoader concerns of 
Security, Isolation and Identity and start again. I've chosen what 
appears to me to be the best compromise based on Java ClassLoaders today.

So this new ClassLoader hierarchy should play nice with Maven, OSGi and 
other stuff too, because now the API is visible to everything below in 
the ClassLoader hierarchy, while the implementations below, don't expose 
themselves, instead, everything cooperates through the API.
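
A small sketch of that shape, assuming a single parent loader that holds the API: LoaderTree is a hypothetical class for illustration, and the empty URLClassLoaders stand in for loaders that would really be given application and proxy jars. Because both children delegate to the same parent, they resolve an API class name to the same Class object, while remaining distinct loaders for their own implementation classes.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderTree {
    final ClassLoader platform;    // shared parent, would hold the *-api.jar classes
    final ClassLoader application; // apps and service implementations
    final ClassLoader proxy;       // one smart proxy's implementation classes

    LoaderTree(ClassLoader platform) {
        this.platform = platform;
        // No URLs here: in this sketch the children only delegate to the parent.
        this.application = new URLClassLoader(new URL[0], platform);
        this.proxy = new URLClassLoader(new URL[0], platform);
    }

    /** True when both children resolve the name to the same Class object. */
    boolean sharesApiClass(String apiClassName) {
        try {
            return application.loadClass(apiClassName) == proxy.loadClass(apiClassName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Not visible via parent: " + apiClassName, e);
        }
    }
}
```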

OSGi can be used to synchronize ClassLoader visibility between two 
separate JVMs, however that still requires the implementer to deal with 
deserialization issues. With our solution, we won't have to worry much 
about ClassLoader issues. With Maven, we won't have to worry about lost 
codebases either.

Yep, it has been a bit of a head spin, needed your help to work out the 
details before I forgot them.

There is one more detail I'd like to include in the jar archive: a list 
of permissions the jar needs. I'd like to use the same format OSGi 
uses, because it's been done before, so why be different? This is to solve 
the "what grants does it need?" problem, so we can minimise permission 
grants.
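
As I understand the OSGi convention, a bundle lists its local permissions one per line in the form (type "name" "actions"), with name and actions optional and lines starting with # or // treated as comments. The sketch below reads lines of that shape; PermissionLines and Entry are hypothetical names, and this simplified pattern does not handle escaped quotes, so treat it as an illustration of the idea rather than a conforming parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PermissionLines {
    /** One requested permission; name and actions may be null. */
    static final class Entry {
        final String type;
        final String name;
        final String actions;

        Entry(String type, String name, String actions) {
            this.type = type;
            this.name = name;
            this.actions = actions;
        }
    }

    // ( type "name" "actions" ) with the two quoted parts optional.
    private static final Pattern LINE = Pattern.compile(
            "^\\(\\s*([\\w.$]+)(?:\\s+\"([^\"]*)\")?(?:\\s+\"([^\"]*)\")?\\s*\\)$");

    /** Parses permission lines, skipping blanks and comment lines. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<Entry>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("//")) {
                continue;
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unparseable permission line: " + line);
            }
            entries.add(new Entry(m.group(1), m.group(2), m.group(3)));
        }
        return entries;
    }
}
```

A deployer could then compare the parsed entries against what it is willing to grant before defining the jar's ProtectionDomain.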

One more step towards the net...
> Thanks
>
> Dennis
>
> On May 24, 2010, at 10:34 PM, Peter Firmstone wrote:
>
>   
>> Thanks Chris,
>>
>> Sounds like it's time for some MarshalledInstance refactoring?
>>
>> Perhaps a Maven (generic if possible) URL scheme (with message digest
>> support). We need an annotation (or naming convention) that indicates
>> whether proxies can share ClassLoader & ProtectionDomain space, dictated by
>> static variables and common Principals.
>>
>> A new constructor for MarshalledInstance that accepts an alternate URL too.
>>
>> ... and two new methods in MarshalledInstance:
>> Object get(ClassLoader cl, CodeSource[] cs, boolean verifyCodeBaseIntegrity);
>> URL[] getCodeSourceAnnotation();
>>
>> Then MarshalledServiceItem could include new methods:
>>
>> public URL[] getCodeSourceAnnotation();
>> public Object getService( CodeSource[] cs );
>> //If cs == null || cs missing a CodeSource use default URL.
>>
>> Note here that while unmarshalling has been delayed, I haven't relinquished
>> control of ClassLoaders or ProtectionDomains, e.g. the client can use OSGi
>> without dictating that the Service must also. None of the serialized
>> instances from method returns will need to be deserialized by OSGi, avoiding
>> the OSGi deserialization issue altogether.
>> The client application doesn't have to deal with these concerns directly; we
>> could write multiple ResultStreamFilters that can be chained. The filter that
>> matches the URL scheme will unmarshal the service, and the filter sequence
>> will dictate the preferred unmarshalling. The filter responsible for
>> successful unmarshalling would construct a new ServiceItem, that isn't
>> unmarshalled, the next unmarshalling filter would ignore it, allowing it to
>> pass through. After it is unmarshalled another filter will check method
>> constraints.
>>
>> Method parameters that originate from client ClassLoaders will be
>> unmarshalled in the Application ClassLoader space on the Service
>> implementation node; this is where things get hairy if the Service API
>> method parameters are non-final, abstract or interfaces. Any class that
>> belongs to a Service API jar will be safely loaded into the Jini Platform
>> ClassLoader space in its own ProtectionDomain. Client returned parameter
>> classes however will need their own ClassLoaders.
>>
>> If the Service API is loaded into a Parent ClassLoader (Jini Platform
>> ClassLoader) at the Service implementation node and API parameters are
>> extended, the client classes will need their own ClassLoader space at the
>> Service implementation end. Since a service may serve many clients, these
>> ClassLoaders must be shared, based on identical CodeSource and Principals.
>> The client classes will only be accessible via the Service API interfaces
>> or classes (they are abstracted).
>>
>> ANY CLIENT THAT IMPLEMENTS AN API Interface or extends an API parameter
>> will need to make its implementation package jar publicly available. Like
>> the proxy implementation, it is free to change, however it should be
>> versioned appropriately, like the proxy, and have its own jar. (This is
>> where the Java Package Version Spec comes in handy, we can annotate classes
>> with Package version and local CodeSource.) The CodeSource might contain a
>> file URL, however it will contain the jar archive name (which is why Dennis
>> wants to name packages with their versions, which can't hurt!) and given
>> the Package Version Spec, it will work for OSGi bundles as well as Maven. A
>> client using an OSGi bundle must remember that all of the implementing
>> classes should be in the same bundle and that the Service node may not be
>> utilising OSGi, so it shouldn't attempt to use any OSGi services in Service
>> API parameter implementations.
>>
>> The version spec will identify compatibility of classes; the closest
>> compatible local CodeSource may be used, otherwise a new ClassLoader will
>> be used. Each client will either share all compatible CodeSource and
>> Principals or have their own ClassLoader space.
>>
>> Gregg, do you think we could use your service-client.jar for client
>> parameter implementations or would this cause confusion?
>>
>> Perhaps we should use:
>>
>> service-param.jar
>>
>> So to really round it off:
>>
>> Service Implementers must produce versioned manifest jar archives of:
>>
>>   Smart Proxy:
>>
>>   Implementation jar: service.jar (depends on service-api.jar)
>>   API jar:            service-api.jar
>>   Smart proxy jar:    service-proxy.jar (depends on service-api.jar)
>>   Selfish Smart proxy jar:  service-iproxy.jar (depends on
>>   service-api.jar)
>>
>>   Dumb Proxy:
>>
>>   Implementation jar: service.jar (depends on service-api.jar)
>>   API jar:            service-api.jar
>>
>>
>> Client Implementers must produce versioned manifest jar archives of:
>>
>>   Client Parameter extensions:   service-param.jar
>>
>> If you didn't guess correctly, the Selfish Smart proxy jar is the one that
>> proxies cannot share in the same ClassLoader and ProtectionDomain.
>>
>>
>> ClassLoader Structure (in addition to all your helpful comments on
>> river-dev, thanks also to Jim, Tim & Mike, planting the seed):
>>
>>              System ClassLoader
>>                      |
>>             Extension ClassLoader (incl jsk-policy.jar)
>>                      |
>>             Jini Platform ClassLoader (incl jsk-platform.jar, *-api.jar)
>>                      |
>>       _______________|__________________________________
>>      |                            |                     |
>> Application ClassLoader    Proxy ClassLoader's    Parameter Impl ClassLoader's
>> (Apps & Service Impl)      (Smart Proxy's)        (Remote client parameter classes)
>>
>>
>> Advice History:
>>
>> Jim:   Use common Interfaces and classes in Parent ClassLoaders
>> Tim:   Thanks for research on Dependency Trees and ClassLoader Trees and guidance.
>> Mike:  Research paper on ClassLoader issues.
>>
>> Thanks & Praise worth mentioning:
>>
>> Bob Scheifler and others for Jini's strong Security foundation.
>> Bill Venners for the ServiceUI, it is truly innovative
>>
>> (hint: come back)
>>
>>
>> Christopher Dolan wrote:
>>     
>>> Isn't List<URL> already present in the MarshalledInstance?  Why repeat
>>> this as an Entry?  Wouldn't it be easier to just add a public accessor
>>> to deserialize the list of URLs from MarshalledInstance.locBytes?
>>>
>>> I apologize if this was already explained, but there's been a LOT of
>>> email to read on this list lately.
>>>
>>> Chris
>>>
>>> -----Original Message-----
>>> From: Dennis Reedy [mailto:dennis.reedy@gmail.com]
>>> Sent: Saturday, May 22, 2010 9:29 AM
>>> To: river-dev@incubator.apache.org
>>> Subject: Re: Maven repository Entry was Re: Codebase service?
>>>
>>> [CJD] ... <snip> ...
>>>
>>> I would just go with a 
>>> List<String> dlJars;
>>>
>>> With this you could provide support for retrieving the DL jar(s) for
>>> non-maven systems as well. If the dlJars property contains 1 element and
>>> is of the form groupId:artifactId:version:classifier, then maven
>>> resolution gets used. Otherwise the DL jars can be obtained using the
>>> codebase of the advertising service.
>>>
>>> For maven resolution, I think you'll also want to either provide support
>>> for parsing your maven settings.xml or include the repositories to go
>>> find the artifact if it's not present. If the artifact is retrieved from
>>> the repository it will have a message digest along side of it (with
>>> either a .sha1 or .md5 extension). That can be used to compare a locally
>>> computed digest HttpmdUtil.computeDigest() for updates. But that
>>> comparison really only needs to take place for snapshots, since by
>>> definition releases are considered immutable.
>>>
>>> IMO supporting transitive deps is a must have, without that we really
>>> don't get that far. A DL artifact may depend on another DL artifact, and
>>> that DL artifact may have deps as well. 
>>>
>>>
>>>
>>>  
>>>       
>
>
>   



