river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: Moving River into the Semantic Web with Codebase Services & Bytecode Analysis services.
Date Wed, 16 Sep 2009 23:12:27 GMT
Some Implementation design thoughts on Security:

Security by namespace visibility and trust within package class loaders?

If each package is segregated into its own class loader, and all
dependencies required by that package have been determined by code base
analysis, then visibility should be limited to the classes and methods
discovered by the codebase server analysis and enforced at class loading
time.
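
As a rough illustration of the class-level half of this idea, the
restriction could be sketched as a per-package class loader that refuses
any name not listed in the analysis.  The class name
VisibilityClassLoader and the visibleClasses set are assumptions made up
for this sketch; method-level filtering would need bytecode inspection
and is not shown.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a per-package class loader that only resolves
// class names reported by the codebase server's dependency analysis.
class VisibilityClassLoader extends ClassLoader {
    // Names discovered by the analysis (assumed to be supplied by the caller).
    private final Set<String> visibleClasses;

    VisibilityClassLoader(ClassLoader parent, Set<String> visibleClasses) {
        super(parent);
        this.visibleClasses = new HashSet<>(visibleClasses);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Anything outside the analysed dependency set is simply not visible.
        if (!visibleClasses.contains(name)) {
            throw new ClassNotFoundException(
                    name + " is not visible to this package");
        }
        return super.loadClass(name, resolve);
    }
}
```

A loader that actually defines classes would also have to list their
superclasses (java.lang.Object included), because those are resolved
through the same loadClass call.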

A local namespace visibility policy (more fine-grained than Java
security policies) might contain a list of allowable system methods
for code originating from untrusted entities (even though the code base
is trusted and the code has been analysed).  Any method signature in
the downloaded code that didn't appear in the list as allowable would
not be granted visibility.  A default working set could be created for
distribution with River, with all disallowed methods commented out.
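
A minimal sketch of such a policy list, assuming one method signature
per line and '#' as the comment marker.  Neither the file syntax nor
the class name NamespaceVisibilityPolicy comes from River; both are
invented for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a local allow list of method signatures.
class NamespaceVisibilityPolicy {
    private final Set<String> allowed = new HashSet<>();

    // Each line holds one signature; lines starting with '#' are treated
    // as commented out, i.e. disallowed (assumed syntax).
    NamespaceVisibilityPolicy(List<String> lines) {
        for (String line : lines) {
            String entry = line.trim();
            if (!entry.isEmpty() && !entry.startsWith("#")) {
                allowed.add(entry);
            }
        }
    }

    // A signature is visible only if it appears uncommented in the list.
    boolean isVisible(String methodSignature) {
        return allowed.contains(methodSignature);
    }
}
```

With a list containing "java.lang.Math.abs(I)I" and
"# java.lang.System.exit(I)V", the first signature is visible and the
second is not.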

Then in the worst case of trust, where neither the code base nor the
origin of the code is trusted, the required dependencies and methods
declared by the code base analysis are only allowed if they are also
allowed locally.  So if a code base were to submit code with
undisclosed methods, those methods would not be accessible to the
untrusted code.  The dependency analysis information provided by the
code base forms a contract between untrusted parties.
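
Put another way, the methods granted to untrusted code are the
intersection of what the code base declared and what the local policy
allows.  A small sketch, with VisibilityContract and grantedMethods as
made-up names:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: granted visibility = declared AND locally allowed.
class VisibilityContract {
    static Set<String> grantedMethods(Set<String> declaredByCodebase,
                                      Set<String> allowedLocally) {
        Set<String> granted = new TreeSet<>(declaredByCodebase);
        // Drop anything the local policy does not allow; methods the code
        // base never declared are not in the set to begin with.
        granted.retainAll(allowedLocally);
        return granted;
    }
}
```

An undisclosed method never reaches the granted set, even if the local
policy would have allowed it.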

Consider the following:

   1. Code base A is trusted and has obtained its code from another
      trusted entity (whoever uploaded the code to the code base server
      in the first place).
   2. Code base B is untrusted.
   3. Code base A is trusted and has obtained some code from Code base B
      which is untrusted.
   4. Trusted and Untrusted code will be loaded into separate class
      loaders by a client JVM.

Note: my references to methods mean those with protected or public
visibility; the same applies to fields that are public or protected.

Code base A could bundle and sign the trusted code, and bundle without
signing the untrusted code after analysis (where bundle means splitting
an existing jar into multiple jars after analysis, one for each package).
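
The splitting step could start from something like the sketch below,
which groups jar entry names by package.  PackageBundler and
groupByPackage are hypothetical names; writing and signing the
resulting jars is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: decide which class files go into which per-package jar.
class PackageBundler {
    // Groups entry names such as "org/example/a/Foo.class" by package name.
    static Map<String, List<String>> groupByPackage(List<String> entryNames) {
        Map<String, List<String>> bundles = new TreeMap<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class")) {
                continue; // manifests and resources are ignored in this sketch
            }
            int slash = name.lastIndexOf('/');
            String pkg = slash < 0
                    ? ""
                    : name.substring(0, slash).replace('/', '.');
            bundles.computeIfAbsent(pkg, k -> new ArrayList<>()).add(name);
        }
        return bundles;
    }
}
```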

The client would receive a dependency analysis report from Code base A
and would restrict the visibility of the untrusted code to the subset
of declared methods that are allowed.

Code base A might later receive trusted code that is API compatible
with the untrusted code; this would be discovered by analysis.
From then on, Code base A would be able to provide trusted code to
its trusting clients when required.

This could lead to the desirable situation where, when a client is
receiving a marshalled object stream from an untrusted service or vice
versa, both entities could obtain trusted bytecode for unmarshalling
from their own preferred trusted code bases, regardless of the source
of the marshalled object stream.

In the worst case, code could be obtained from an untrusted code base.
However, that bytecode would not be able to access any methods that had
not been declared as required dependencies by the code base, and the
declared methods would also be vetted against the local security
policy.  The code would then be available with degraded functionality,
but would not violate the local security and namespace visibility
policy: unpermitted methods would not be visible in the untrusted
package's class loader.

However I've deliberately left out a scenario:

Interoperability between trusted and untrusted code?

What about untrusted application code interacting with trusted
application code?  How does one restrict access for untrusted code?  Who
is responsible for determining which methods should be accessible by
default for application packages?  The package might not exist in the
local JVM at load time; it may be downloaded later.

The onus in this case would have to be placed upon the trusted
application package distributor (as trusted by the code base), who may
at their discretion change which methods untrusted code can safely
access.  Hence there will need to be a means for the code base to
provide namespace visibility policies for application code as well.
Determining trust is left to the client.  An unknown third party
may become trusted by a client if that party is trusted by a trusted
code base.  A friend of a friend, so to speak.
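
The friend-of-a-friend rule could be sketched as a one-hop lookup.  The
names TrustResolver and vouchedBy are assumptions for illustration, and
limiting the chain to a single hop is a choice made for this sketch,
not something settled above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a party is trusted if the client trusts it directly
// or if one of the client's trusted code bases vouches for it.
class TrustResolver {
    private final Set<String> trustedCodebases;
    private final Map<String, Set<String>> vouchedBy; // code base -> parties it trusts

    TrustResolver(Set<String> trustedCodebases,
                  Map<String, Set<String>> vouchedBy) {
        this.trustedCodebases = new HashSet<>(trustedCodebases);
        this.vouchedBy = new HashMap<>(vouchedBy);
    }

    boolean isTrusted(String party) {
        if (trustedCodebases.contains(party)) {
            return true;
        }
        // One hop only: vouching by an untrusted code base counts for nothing.
        for (String codebase : trustedCodebases) {
            Set<String> friends =
                    vouchedBy.getOrDefault(codebase, Collections.<String>emptySet());
            if (friends.contains(party)) {
                return true;
            }
        }
        return false;
    }
}
```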

Perhaps trusted code should also be limited to the codebase's declared
visibility requirements as an additional precaution, which would help
identify analysis bugs too.  Perhaps different namespace visibility
policies could be developed for different trusted codebase
entities/identities.  I'm not sure this is an essential requirement,
but the implementation could be made extensible so as not to exclude
the possibility.

One other point:

Class load time delays caused by bytecode verification: perhaps bytecode
verification could be performed by the trusted code base, eliminating
the need to verify remote code and improving load time response.  Local
code is not verified at load time by default.  In this case an
administrator would trust their code bases and would not under any
circumstances allow bytecode from untrusted sources to be used.  But
then, with the new verifier in Java SE 6 resulting from JSR 202, perhaps
verification time has been mitigated somewhat?

Anyone have any input or implementation suggestions?



Peter Firmstone wrote:
> Look forward to it mate,
> N.B. this line should read:
>   * Codebase surrogates, for objects originating from periodically
>     disconnected services for clients to obtain their bytecode (they
>     also require Refreshable References and Xuid's)
> Cheers,
> Peter.
> Gregg Wonderly wrote:
>> Peter, I want to write up some questions and thoughts about this 
>> post, but can't do that right now, hopefully I can in a day or so.
>> Gregg Wonderly
>> Peter Firmstone wrote:
>>> I've had some more thoughts on Codebase services after spending time 
>>> researching & reflecting.
>>> Issues I'd like to see addressed or simplified using Codebase services:
>>>    * Codebase loss
>>>    * Codebase replication
>>>    * Codebase upgrades
>>>    * Codebase configuration
>>>    * Codebase surrogates, for objects originating from periodically
>>>      disconnected clients (they also require Refreshable References and
>>>      Xuid's)
>>>    * Bytecode Dependency Analysis & API signature identification, for
>>>      Package & Class Binary Compatibility & ClassLoader Isolation
>>>    * Bytecode Static Security Analysis, repackaging & code signing.
>>> On the last issue I've had some thoughts about Code bases being able 
>>> to act as a trust mediator to receive, analyse, repackage, sign and 
>>> forward bytecode on behalf of clients.  The last two items above fit 
>>> into the category of Bytecode Analysis service responsibilities for 
>>> codebases.  Prior to loading class files, a client can have a trust 
>>> relationship with one or more preferred codebase providers.  A code 
>>> base provider also provides bytecode static analysis services for 
>>> security and binary compatibility purposes.
>>> I got thinking about this solution after reading about service proxy 
>>> circular code verification issues for disconnected clients that 
>>> project neuromancer exposed.  A surrogate security verifier as well 
>>> as a codebase surrogate.
>>> All this would be implemented with minimal changes to services and 
>>> clients configurations and no change to third party library code, 
>>> unlike my evolving objects framework proposals.
>>> After receiving a tip off from Michael Warres, Tim Blackman was 
>>> gracious enough to share learnings from his research on class loader 
>>> trees.  Tim built a prototype system using message digests and was 
>>> considering implementing textual Class API signatures for 
>>> identifying compatibility between different class bytecodes.  Tim 
>>> considered the textual API signatures when he found independent 
>>> vendor compiler optimisations produced different bytecode, hence 
>>> different SHA-1 signatures, although they have identical and 
>>> compatible class API.  I thought about this further and realised 
>>> that Binary Compatibility for class files and package change is far 
>>> more flexible than source code compatibility.  While Tim 
>>> concentrated on API compatibility for ensuring objects that should 
>>> be shared, could be, he found that groups of class files, based on 
>>> dependency analysis (this is where the replacement ClassDep code 
>>> came from), required their own ClassLoader's, hence there are a 
>>> significant number of class loader instances required for maximum 
>>> compatibility (without going into more detail).
>>> In essence, the solution I'm striving for, is to solve the problem 
>>> in a distributed world that OSGi solves in the JVM; segregation and 
>>> isolation of incompatibility while allowing compatible 
>>> implementations to cooperate.  However I want an implementation 
>>> without commitment to any particular container or module technology, 
>>> so as not to force container implementation choices on projects that 
>>> already have their specific container implementations.
>>> Rather than reinventing another container technology,  all jar files 
>>> a service's client requires, could be uploaded to codebase services, 
>>> just prior to service registration.  The codebase service could 
>>> analyse, repackage and sign the jar files into compatible bundles, 
>>> dynamic containers if you wish, one for each ClassLoader, where each 
>>> class loader represents a Package API group signature.
>>> Using the uploaded jar files, the codebase services could generate 
>>> and propagate analysis reports amongst themselves in a p2p fashion, 
>>> such that between them, they could determine the latest binary 
>>> compatible version of a package, such that the latest compatible 
>>> version would always be preferred.  Once the latest version is 
>>> identified, a codebase service can verify, with its own analysis, 
>>> in order to confirm and report malicious or malfunctioning codebase 
>>> servers.  Newer versions of a Package, found to have broken Binary 
>>> Backward compatibility, would be kept in a separate ClassLoader as 
>>> determined by their API signature, thus incompatibility is 
>>> isolated.  There may be subgroups within a package, that could also 
>>> be shared between incompatible package versions to provide improved 
>>> class file and object sharing.
>>> Hence a client receiving bytecode, could choose to channel it 
>>> through one or more codebase servers that it has trust relationships 
>>> with.  A bytecode trust surrogate, the preferred codebase server 
>>> could retrieve required bytecode that it doesn't already possess via 
>>> lookup services of other codebase service locations.  The bytecode 
>>> recipient would retrieve analysis information detailing bytecode 
>>> implementation security concerns prior to loading any bytecode.  The 
>>> codebase server would not execute any untrusted bytecode itself, 
>>> only perform analysis using the ASM library, the aim would be that a 
>>> codebase server was as secure as possible, such that it can be 
>>> considered trustworthy and as impervious to attack as 
>>> possible (existing denial of service attack strategies require 
>>> consideration).  One could even perform tests on codebases, by 
>>> uploading deliberately malicious code and checking resulting 
>>> analysis reports, or by occasionally confirming the analysis reports 
>>> with other codebases or using a local codebase analysis processes.  
>>> Separation of concerns.
>>> Codebase Services would only be required to maintain a copy of the 
>>> evolution bloodline for the latest binary backward compatible 
>>> package.  A package fork or breaking of backward compatibility would 
>>> mean storing a copy of both of the latest divergent compatibility 
>>> signatures, again some unchanged class subgroups may be shared 
>>> between them.  Java Bytecode versions (compiler specific) would also 
>>> dictate which package version could be used safely in local JVM's.
>>> Clients of services will have to accept a certain amount of 
>>> downtime, once a particular instance of a package's classes are 
>>> loaded into a classloader, no other compatible implementations of 
>>> that package will be able to be loaded, this is only a problem for 
>>> long lived service client processes.  Object state will need to be 
>>> persisted while the JVM restarts and reloads new bytecode 
>>> (Serializable is also part of class API). This is due to the 
>>> inability of an existing ClassLoader to reload classes (java debug 
>>> excluded). Backward Binary compatibility doesn't necessarily imply 
>>> forward compatibility, classes and interfaces can add methods 
>>> without breaking compatibility with pre existing binaries, 
>>> visibility can become more visible, abstract methods can become non 
>>> abstract, even though some of these changes break source code 
>>> compatibility, old clients aren't aware of the new methods and don't 
>>> execute them.  For specifics see Chapter 13, Binary Compatibility of 
>>> the Java Language Specification, 3rd Edition, this is what I plan to 
>>> base the compatibility analysis upon.
>>> It would also be possible for services to utilise codebase servers 
>>> in their classpath.
>>> These issues I propose tackling are not simple obstacles, nor will 
>>> they be easy to implement, some issues may even be intractable, but 
>>> what the hell, who's with me?  That's why we got into this in the 
>>> first place isn't it?  The challenge!  Project Neuromancer 
>>> highlighted areas for improvement, if we address some of these, I 
>>> believe that River can become the much vaunted and dreamt of 
>>> semantic web.
>>> I want problems identified so solutions can be devised, let's see 
>>> objections & supporting logic or better ideas.
>>> Cheers,
>>> Peter.
