river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Power and flexibility of Serialization is under exploited
Date Thu, 13 Oct 2011 01:57:36 GMT
Serialization has an undeserved bad reputation, perhaps because too 
many developers just add implements Serializable, accept the default 
serialized form in public API, then turn around and say they won't 
support backward compatible Serialization.

In the implementation discussed below, all objects are created through 
just one public API class with static factory methods, to keep it simple 
for user developers.

I've been adding serialization to reference collections: just a bunch of 
wrapper classes that encapsulate any collection framework interface and 
perform the boilerplate of retrieving referents, wrapping them in 
references and removing enqueued references from those collections. 
They allow the choice of Weak, Soft or Strong references with identity, 
equals or comparable semantics.

All the following package private wrapper classes share a single 
serialized form at present:

ReferenceCollection
ReferenceList
ReferenceSet
ReferenceSortedSet
ReferenceNavigableSet
ReferenceQueue
ReferenceDeque
ReferenceBlockingQueue
ReferenceBlockingDeque

The serial form is generated using writeReplace, and it recreates the 
correct collection using readResolve.

Because each wrapper class is only publicly visible to the client as a 
Java Collections Framework (JCF) interface, the serialized form (also 
called a serialization proxy) rebuilds it during de-serialization using 
the standard public API factory class, based on the JCF interface it 
implemented.  So the remote end is free to use another implementation.
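
As a rough illustration of that proxy arrangement, here is a minimal 
sketch.  The names RefCollections, WrappedList and ListProxy are 
hypothetical stand-ins (the message doesn't name the factory class), and 
the wrapper here holds plain elements rather than references:

```java
import java.io.*;
import java.util.*;

// Hypothetical public factory: the only class a client would see.
final class RefCollections {
    private RefCollections() {}
    static <T> List<T> list(List<T> base) { return new WrappedList<T>(base); }
}

// Package-private wrapper, visible to clients only as java.util.List.
class WrappedList<T> extends AbstractList<T> implements Serializable {
    private final transient List<T> base;
    WrappedList(List<T> base) { this.base = base; }
    @Override public T get(int i) { return base.get(i); }
    @Override public int size() { return base.size(); }

    // Substitute the serialization proxy whenever this object is written.
    private Object writeReplace() {
        return new ListProxy<T>(new ArrayList<T>(base));
    }
}

// The serialized form: carries only the elements.
class ListProxy<T> implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayList<T> elements;
    ListProxy(ArrayList<T> elements) { this.elements = elements; }

    // Rebuild through the public factory, so the receiving end is free
    // to return a different implementation of the same interface.
    private Object readResolve() { return RefCollections.list(elements); }
}

class ProxyDemo {
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Passing a list from RefCollections.list(...) through 
ProxyDemo.roundTrip(...) writes a ListProxy to the stream and hands back 
whatever the factory builds on the reading side.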

There's a readResolve bug worth mentioning here, with regard to 
circular references.  writeReplace replaces all original object 
instances with your serialized form object, but readResolve doesn't 
replace circularly referenced objects during de-serialization.  So if 
you're utilising readResolve to replace your serialized form, you'll end 
up with a mix of the serialized form object and your freshly constructed 
implementation object.  You'll get ClassCastExceptions etc...
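
The effect can be reproduced with a small cycle.  This is only a sketch 
with hypothetical classes (Parent, ParentProxy, Child); the back 
reference is typed Object so the stray proxy can be observed rather than 
triggering the ClassCastException a Parent-typed field would produce:

```java
import java.io.*;

class Parent implements Serializable {
    final Child child;
    Parent(Child child) { this.child = child; }
    private Object writeReplace() { return new ParentProxy(child); }
}

class ParentProxy implements Serializable {
    private final Child child;
    ParentProxy(Child child) { this.child = child; }
    private Object readResolve() { return new Parent(child); }
}

class Child implements Serializable {
    // Typed Object on purpose, so the unresolved proxy is visible.
    Object parent;
}

class CircularDemo {
    static Parent roundTrip() throws IOException, ClassNotFoundException {
        Child c = new Child();
        Parent p = new Parent(c);
        c.parent = p;                       // cycle: parent -> child -> parent

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);             // written as a ParentProxy
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            // The top-level reference is resolved to a Parent, but the
            // child's back reference was wired up before readResolve ran.
            return (Parent) in.readObject();
        }
    }
}
```

After CircularDemo.roundTrip(), the returned Parent's child.parent field 
refers to the ParentProxy, not to the Parent that was returned.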

Bob Lee (that's Crazy Bob from JSR 330 and Google Guice) came up with 
the idea of having the serialization proxy and original objects share 
the same interface, then having all methods redirected to the newly 
built object upon de-serialization.
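
A minimal sketch of that idea follows.  The Named interface and the 
RealNode, NodeProxy and Link classes are hypothetical; the real proxy 
described further down forwards the JCF interfaces instead:

```java
import java.io.*;

interface Named { String name(); }

class RealNode implements Named, Serializable {
    private final String name;
    final Link link;
    RealNode(String name, Link link) { this.name = name; this.link = link; }
    public String name() { return name; }
    private Object writeReplace() { return new NodeProxy(name, link); }
}

// The proxy shares the Named interface with the object it stands in for.
class NodeProxy implements Named, Serializable {
    private final String name;
    private final Link link;
    private transient Named built;      // null until readResolve has run

    NodeProxy(String name, Link link) { this.name = name; this.link = link; }

    private Object readResolve() {
        built = new RealNode(name, link);
        return built;
    }

    // References left pointing at the proxy are redirected to the real object.
    public String name() { return built.name(); }
}

class Link implements Serializable {
    Named owner;                        // typed by the shared interface
}

class ShareDemo {
    static RealNode roundTrip() throws IOException, ClassNotFoundException {
        Link l = new Link();
        RealNode n = new RealNode("n", l);
        l.owner = n;                    // circular reference

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(n);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RealNode) in.readObject();
        }
    }
}
```

The circular reference still ends up holding a NodeProxy, but because 
the field is typed by the shared interface nothing fails to cast, and 
calls made after de-serialization completes reach the rebuilt object.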

So to implement that, I've got an inheritance hierarchy for the 
serialization proxy, to separate each function:

SerializationOfReferenceCollection
                 |
ReadResolveFixCollectionCircularReferences
                 |
ReferenceCollectionRefreshAfterSerialization
                 |
ReferenceCollectionSerialData

Right about now, you're probably saying 4 classes in an inheritance 
hierarchy is a bit heavy for serialization?

Well no, not when you consider that they serialize 9 classes, and of all 
those classes only one, ReferenceCollection, has to implement a final 
writeReplace method, while all have to implement a readObject method 
that throws an exception to prevent direct de-serialization.
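
Roughly, that split could look like the sketch below.  GuardedCollection, 
GuardedList and GuardProxy are hypothetical names, and the proxy records 
only which kind of collection to rebuild:

```java
import java.io.*;

class GuardedCollection implements Serializable {
    // final and non-private: inherited by every subclass in the package,
    // so only the base class has to declare it.
    final Object writeReplace() {
        return new GuardProxy(this instanceof GuardedList);
    }

    // Reached only if a stream names this class directly, which a stream
    // produced through writeReplace never does.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Serialization proxy required");
    }
}

class GuardedList extends GuardedCollection {
    // readObject is private and per class, so each subclass repeats the guard.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Serialization proxy required");
    }
}

class GuardProxy implements Serializable {
    private final boolean list;
    GuardProxy(boolean list) { this.list = list; }

    private Object readResolve() {
        return list ? new GuardedList() : new GuardedCollection();
    }
}

class GuardDemo {
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

A normal round trip never touches the guards, because the stream only 
ever contains a GuardProxy; they exist for streams crafted to name the 
wrapper classes directly.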

So all 9 classes are freed from the implementation of Serialization; 
it's now the responsibility of the 4 classes in the serialization proxy 
(Serialization builder pattern) inheritance hierarchy.

Function of each class in the inheritance hierarchy:

SerializationOfReferenceCollection is an abstract class with a static 
factory method.

ReadResolveFixCollectionCircularReferences implements all the JCF 
collection based interfaces and redirects their calls to the 
ReferenceCollection implementation built during de-serialization.

ReferenceCollectionRefreshAfterSerialization updates all the References 
contained by the collection so they belong to the same garbage 
collection ReferenceQueue, and creates new References for all referents.

ReferenceCollectionSerialData contains the fields transferred during 
serialization and implements abstract methods for the super classes to 
"get" these fields.
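
To make the layering concrete, here is a much reduced sketch.  All names 
are hypothetical, Iterable stands in for the full set of JCF interfaces, 
and only weak references are shown; it is not the actual River code:

```java
import java.io.*;
import java.lang.ref.*;
import java.util.*;

// Layer 1: abstract root with a static factory for the serial form.
abstract class SerialRoot<T> implements Serializable {
    static <T> SerialRoot<T> of(Collection<T> referents) {
        return new SerialData<T>(new ArrayList<T>(referents));
    }
    abstract List<T> referents();       // supplied by the data layer
}

// Layer 2: forwards interface calls to the collection built on arrival.
abstract class ForwardingForm<T> extends SerialRoot<T> implements Iterable<T> {
    transient Iterable<T> built;
    public Iterator<T> iterator() { return built.iterator(); }
}

// Layer 3: re-wraps every referent in a new Reference on one shared queue.
abstract class RefreshingForm<T> extends ForwardingForm<T> {
    final Object readResolve() {
        ReferenceQueue<T> queue = new ReferenceQueue<T>();
        List<Reference<T>> refs = new ArrayList<Reference<T>>();
        for (T t : referents()) refs.add(new WeakReference<T>(t, queue));
        built = new WeakBag<T>(refs, queue);
        return built;
    }
}

// Layer 4: the only class with serialized fields.
final class SerialData<T> extends RefreshingForm<T> {
    private static final long serialVersionUID = 1L;
    private final ArrayList<T> referents;
    SerialData(ArrayList<T> referents) { this.referents = referents; }
    List<T> referents() { return referents; }
}

// Stand-in for the rebuilt reference collection.
final class WeakBag<T> implements Iterable<T> {
    private final List<Reference<T>> refs;
    private final ReferenceQueue<T> queue;   // kept so cleared refs could be polled
    WeakBag(List<Reference<T>> refs, ReferenceQueue<T> queue) {
        this.refs = refs;
        this.queue = queue;
    }
    public Iterator<T> iterator() {
        List<T> live = new ArrayList<T>();
        for (Reference<T> r : refs) {
            T t = r.get();
            if (t != null) live.add(t);      // skip referents already collected
        }
        return live.iterator();
    }
}

class LayerDemo {
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Each layer only talks to the one below it through abstract methods, so 
the data layer can be swapped without touching the forwarding or refresh 
logic.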

Now the interesting part is, I'm considering having three different 
serialized forms, each with a different purpose, that the client can 
choose from:

1. A non-serializable class that prevents serialization, where a 
developer wants to prevent access to serialized state.

2. The default serial data.

3. Defensive copying of serial data, to prevent stolen references to 
internal state during de-serialization.

The choice between the three serial forms can be left until runtime. 
The recipient of these objects when serialized doesn't have a choice of 
which serial form is used; only the creator of the original object does.
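
One way the three options could be selected at runtime is sketched 
below.  SerialChoice, ChoosyBag, PlainForm and CopyingForm are 
hypothetical names; only the default form exists in the implementation 
described here:

```java
import java.io.*;
import java.util.*;

enum SerialChoice { NONE, DEFAULT, DEFENSIVE_COPY }

class ChoosyBag implements Serializable {
    private final transient ArrayList<String> items;
    private final transient SerialChoice choice;   // fixed by the creator

    ChoosyBag(ArrayList<String> items, SerialChoice choice) {
        this.items = items;
        this.choice = choice;
    }

    List<String> items() { return Collections.unmodifiableList(items); }

    private Object writeReplace() throws ObjectStreamException {
        switch (choice) {
            case NONE:                  // item 1: refuse to serialize at all
                throw new NotSerializableException(ChoosyBag.class.getName());
            case DEFENSIVE_COPY:        // item 3
                return new CopyingForm(items);
            default:                    // item 2
                return new PlainForm(items);
        }
    }
}

// Item 2: hands the de-serialized list straight to the new object.
class PlainForm implements Serializable {
    private final ArrayList<String> items;
    PlainForm(ArrayList<String> items) { this.items = items; }
    private Object readResolve() {
        return new ChoosyBag(items, SerialChoice.DEFAULT);
    }
}

// Item 3: copies the list, so a reference smuggled out of the stream
// cannot reach the new object's internal state.
class CopyingForm implements Serializable {
    private final ArrayList<String> items;
    CopyingForm(ArrayList<String> items) { this.items = items; }
    private Object readResolve() {
        return new ChoosyBag(new ArrayList<String>(items), SerialChoice.DEFENSIVE_COPY);
    }
}

class ChoiceDemo {
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

The form is fixed by whichever proxy class writeReplace puts in the 
stream, which is why the recipient has no say in it.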

Items 1 and 3 would only be used in a local sense, where a client 
program might try to use serialization to gain access to internal 
implementation state.

Item 2 would be used in a genuine distributed environment, over a secure 
connection, where there is no point using defensive copies.

I've only implemented Item 2, of course.  While it is possible to do 1 
and 3 as well, to demonstrate just how flexible serialization can be, I 
decided it wasn't warranted on that basis alone.  It will be possible to 
do this at some point in future, or to change the serial form in a 
non-compatible manner by adding a new serial form class while retaining 
the original, so that both the old and new serial forms can be 
de-serialized.
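
A sketch of how two serial form classes might coexist, under the 
assumption that the old form stored milliseconds and the new one stores 
nanoseconds (Reading, ReadingFormV1 and ReadingFormV2 are hypothetical):

```java
import java.io.*;

class Reading implements Serializable {
    final long nanos;
    Reading(long nanos) { this.nanos = nanos; }

    // New senders always write the newer form.
    private Object writeReplace() { return new ReadingFormV2(nanos); }

    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Serialization proxy required");
    }
}

// Original serial form, retained unchanged so old streams still resolve.
class ReadingFormV1 implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int millis;
    ReadingFormV1(int millis) { this.millis = millis; }
    private Object readResolve() { return new Reading(millis * 1000000L); }
}

// Incompatible replacement form, added alongside the original.
class ReadingFormV2 implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long nanos;
    ReadingFormV2(long nanos) { this.nanos = nanos; }
    private Object readResolve() { return new Reading(nanos); }
}

class VersionDemo {
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Writing a ReadingFormV1 directly simulates a stream from an older 
sender; both it and a freshly written Reading come back as a Reading.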

When you apply object design principles of responsibility, even 
serialization can be flexible.

Serialized form lock-in is the same as inappropriate use of public 
fields or other poor programming practices.  Note that there are times 
where standard rules don't apply, like the use of public fields in 
Entries, which is totally appropriate, just as accepting the standard 
serial form in package private classes is appropriate too.

Cheers,

Peter.

