geode-dev mailing list archives

From John Blum <jb...@pivotal.io>
Subject Re: Proposal: GEODE-4367 - Return PDXInstance when Domain Object can't be found
Date Thu, 25 Jan 2018 19:07:19 GMT
I have always wondered: why not just always store the data in serialized
form?  There are several reasons to do so...

1. Whenever data is transferred between client & server, between peers,
over the WAN, overflowed to disk or persisted to disk, it must be
serialized.
2. Naturally it follows that if the data is always stored in serialized
form, it cuts down on de/serialization overhead.
3. Additionally, it eliminates, or at least reduces, the flags and other
configuration settings needed to configure serialization, making it simpler
to understand and simpler to use.
4. When using PDX, Apache Geode is immediately interoperable between
multiple language clients, primarily Java and .NET/C++, but even other
language clients, e.g. JavaScript, Ruby, etc, where JSON is serialized to
PDX.
5. PDX is queryable without deserialization. This is HUGE and maybe the
most important reason!



The last 2 points suggest that the default serialization format should be
PDX, and truthfully, I am not really opposed to that.  Although, there are
some problems with this.

A. PDX does not handle cyclic dependencies unlike Java Serialization.
However, Java Serialization has massive overhead and is not interoperable
with native language and other language clients (e.g. JavaScript).

B. PDX does not handle Deltas unlike DataSerialization.  However, even when
using Deltas with DataSerialization, you must deserialize the data to apply
the delta.  Quite frankly and ironically, PDX seems better suited to
handling Deltas than DataSerialization, and without deserializing.
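
To make point B a bit more concrete, here is a minimal sketch of a
field-level delta applied without ever building a domain object.  It is only
an illustration: a plain Map of field name to value stands in for the
PDX-serialized data, and FieldDeltaSketch and applyDelta are hypothetical
names, not Geode APIs and not Geode's Delta interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map of field name -> value stands in for PDX data.
final class FieldDeltaSketch {

  // Returns a copy of the stored fields with the changed fields overwritten.
  // No domain object is constructed; only the named fields are touched.
  static Map<String, Object> applyDelta(Map<String, Object> stored,
                                        Map<String, Object> changedFields) {
    Map<String, Object> updated = new LinkedHashMap<>(stored);
    for (Map.Entry<String, Object> change : changedFields.entrySet()) {
      if (!stored.containsKey(change.getKey())) {
        // Assumption: a delta may only modify fields the stored type already has.
        throw new IllegalArgumentException("Unknown field: " + change.getKey());
      }
      updated.put(change.getKey(), change.getValue());
    }
    return updated;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("id", 42);
    stored.put("status", "NEW");

    Map<String, Object> delta = new LinkedHashMap<>();
    delta.put("status", "SHIPPED");

    System.out.println(applyDelta(stored, delta)); // prints {id=42, status=SHIPPED}
  }
}
```

Rejecting unknown fields is just one possible policy; a real design might
instead treat a new field as a type change.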

So, I would double down on PDX and forget DataSerialization and Java
Serialization.  And by "forget", I mean that Apache Geode never "stores"
DataSerialized or Java Serialized bytes; only PDX!

Therefore, solve the cyclic dependency problem and introduce proper Delta
handling without deserialization.  Then, optimize it!  Make PDX the best
serialization option for Java, and specifically for Apache Geode.  With 1
serialization format to worry about there is less to maintain, less data to
convert if the user needs to switch.  Flexibility is not always a good
thing.  It is easier to build up than to build down if you know what I mean.

I have made PDX a first class citizen in *Spring Data for Apache Geode*, in
multiple key functional areas of the framework (e.g. Repositories), and made
it dead simple to use/enable (i.e. @EnablePdx).



Regarding .NET/C++.  Truthfully, I don't really buy the argument that
.NET/C++ users shouldn't have to write Java types.  If the data is always
kept serialized, then technically they shouldn't have to, but they are
already writing their Functions in Java.  Besides, it is not like every
type needs a Java type, only types that need to be deserialized, if at all.

If the application consists of both Java and .NET/C++ clients, and the Java
devs want to work with high-level Java types, then they don't really have a
choice.  However, we can keep the de/serialization overhead at the point of
access (e.g. in the Function, executed on a particular node, at the time of
access), to a minimum.

A simple API like...

JavaType object = pdxInstance.getObject(JavaType.class);  // parameter type: Class<?>

... would do the trick.

The type argument does not need to be the original type that the PDX type
meta-data was created from, either.  It could be a "projection".  The only
concern Apache Geode has is mapping PDX fields to an instance of
"JavaType", where PDX fields are mapped to writable "JavaType" properties
(perhaps using Reflection here).

If the JavaType does not contain a property matching a PDX field, no big
deal.  This is the basis for our versioned type handling anyhow
(adding/removing a field/property).  However, the inverse is a more
interesting problem: the JavaType has a field/property that is not
currently stored in PDX.  Perhaps throw an error, or provide a default
value, or whatever.  That could be configurable.
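
As a rough illustration of the mapping described above, the following sketch
copies stored fields onto same-named fields of a target type using
Reflection.  Everything here is an assumption made for the example: a Map
stands in for the fields of a PdxInstance, and PdxProjectionSketch,
PersonView and project are hypothetical names.  Stored fields with no
matching property are skipped, and a property with no stored field simply
keeps its default, which is only one of the options mentioned above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the fields of a PdxInstance.
final class PdxProjectionSketch {

  // Example "projection" type: it declares only some of the stored fields,
  // plus one field ("nickname") that the stored data does not contain.
  static final class PersonView {
    String name;
    int age;
    String nickname = "n/a"; // default kept when no stored field matches
  }

  // Copies each stored field onto a same-named, writable field of the target.
  static <T> T project(Map<String, Object> storedFields, Class<T> type) {
    try {
      Constructor<T> constructor = type.getDeclaredConstructor();
      constructor.setAccessible(true);
      T target = constructor.newInstance();
      for (Field field : type.getDeclaredFields()) {
        int modifiers = field.getModifiers();
        if (Modifier.isStatic(modifiers) || Modifier.isFinal(modifiers)) {
          continue; // not a writable instance property
        }
        if (!storedFields.containsKey(field.getName())) {
          continue; // property not stored: keep the constructor's default
        }
        field.setAccessible(true);
        field.set(target, storedFields.get(field.getName()));
      }
      return target;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot project onto " + type.getName(), e);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("name", "Ada");
    stored.put("age", 36);
    stored.put("email", "ada@example.com"); // not part of PersonView: ignored

    PersonView view = project(stored, PersonView.class);
    System.out.println(view.name + " " + view.age + " " + view.nickname); // prints Ada 36 n/a
  }
}
```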

Maybe, just maybe, a user has the ability to provide their own Converter,
with its own custom behavior...

interface Converter<T> {

  T convert(PdxInstance pdxInstance);

}

class JavaTypeConverter implements Converter<JavaType> {

  public JavaType convert(PdxInstance pdxInstance) { ... }

}

Then...

Converter<JavaType> javaTypeConverter = new JavaTypeConverter();
...
JavaType object = pdxInstance.getObject(javaTypeConverter);
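
For what it's worth, here is a self-contained sketch of how such a Converter
might behave.  It does not use Geode at all: FieldSource is a hypothetical
stand-in for the single field-lookup method the example needs, and Customer,
CUSTOMER_CONVERTER and the static getObject helper are made-up names for
illustration.  The "custom behavior" shown is supplying a default when a
field is not stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Converter idea, independent of Geode.
final class ConverterSketch {

  // Stand-in for the one field-lookup method this sketch needs.
  interface FieldSource {
    Object getField(String fieldName);
  }

  // Same shape as the Converter above, written against the stand-in.
  @FunctionalInterface
  interface Converter<T> {
    T convert(FieldSource source);
  }

  static final class Customer {
    final String name;
    final int loyaltyPoints;

    Customer(String name, int loyaltyPoints) {
      this.name = name;
      this.loyaltyPoints = loyaltyPoints;
    }
  }

  // Custom behavior: a missing "loyaltyPoints" field defaults to 0.
  static final Converter<Customer> CUSTOMER_CONVERTER = source -> {
    Object points = source.getField("loyaltyPoints");
    int loyaltyPoints = (points == null) ? 0 : ((Number) points).intValue();
    return new Customer((String) source.getField("name"), loyaltyPoints);
  };

  // Plays the role of the proposed getObject(converter) call.
  static <T> T getObject(FieldSource source, Converter<T> converter) {
    return converter.convert(source);
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new HashMap<>();
    stored.put("name", "Grace");

    Customer customer = getObject(stored::get, CUSTOMER_CONVERTER);
    System.out.println(customer.name + " " + customer.loyaltyPoints); // prints Grace 0
  }
}
```

Because the interface has a single method, a lambda is enough for simple
cases; a named class is only needed when the conversion carries state.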


*One final thought...*

Ultimately, I'd like to see Apache Geode introduce a common
framework/interface for serialization, so that different serialization
strategies, or "providers", could be introduced and used by our users based
on their preferences and/or application's needs.

Keep in mind, the user's data might not just live in Apache Geode, which is
particularly true in an increasingly Microservices world.  Other
technologies (e.g. Messaging Buses/Queues) are not going to know PDX.  PDX
would be the default, enabled serialization strategy/provider for Apache
Geode, provided by Apache Geode OOTB.  This may be one reason to still
support Java Serialization, given it is a common serialization format across
disparate JVM-based technologies, but Apache Geode should never store Java
Serialized bytes, only PDX.
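
To sketch what such a provider framework could look like (purely as a
thought experiment, nothing like this exists in Geode as far as this message
is concerned), consider the following.  SerializationProvider,
ProviderRegistry and both provider classes are hypothetical names.  The
Java Serialization provider uses only the JDK; the trivial UTF-8 text
provider merely stands in for the default provider, which would be PDX in
the design described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable serialization "provider" framework.
final class SerializationProviderSketch {

  interface SerializationProvider {
    String name();
    byte[] serialize(Object value);
    Object deserialize(byte[] bytes);
  }

  // Provider backed by standard Java Serialization.
  static final class JavaSerializationProvider implements SerializationProvider {
    public String name() { return "java"; }

    public byte[] serialize(Object value) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
        out.writeObject(value);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return buffer.toByteArray();
    }

    public Object deserialize(byte[] bytes) {
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
        return in.readObject();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      } catch (ClassNotFoundException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  // Toy provider standing in for the default (PDX in the design above).
  static final class Utf8TextProvider implements SerializationProvider {
    public String name() { return "utf8-text"; }
    public byte[] serialize(Object value) {
      return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] bytes) {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  // Looks providers up by name and falls back to the default provider.
  static final class ProviderRegistry {
    private final Map<String, SerializationProvider> providers = new HashMap<>();
    private final SerializationProvider defaultProvider;

    ProviderRegistry(SerializationProvider defaultProvider) {
      this.defaultProvider = defaultProvider;
      register(defaultProvider);
    }

    void register(SerializationProvider provider) {
      providers.put(provider.name(), provider);
    }

    SerializationProvider lookup(String name) {
      return providers.getOrDefault(name, defaultProvider);
    }
  }

  public static void main(String[] args) {
    ProviderRegistry registry = new ProviderRegistry(new Utf8TextProvider());
    registry.register(new JavaSerializationProvider());

    SerializationProvider java = registry.lookup("java");
    System.out.println(java.deserialize(java.serialize("hello"))); // prints hello
    System.out.println(registry.lookup("unknown").name());         // prints utf8-text
  }
}
```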



Anyway, if you are still with me (sorry about the length, just dumping all
my thoughts from the past few years), take all this with a grain of salt
(and maybe a slice of lemon, ;-).  I was just thinking out loud and long
term, as both (previously) an engineer on Apache Geode and a user.

Food for thought.

Regards,
John



On Thu, Jan 25, 2018 at 9:55 AM, Anilkumar Gingade <agingade@pivotal.io>
wrote:

> Internally, there is an option to override the read-serialized flag (to
> true); the query engine and other components use this to keep the data in
> serialized form and work with PdxInstance...
>
> public static void setPdxReadSerialized(Cache cache, boolean
> readSerialized);
>
> We had discussed making this a public API, so that any thread that can
> work on PdxInstance can take advantage of it...
>
> -Anil.
>
>
> On Thu, Jan 25, 2018 at 9:42 AM, Jacob Barrett <jbarrett@pivotal.io>
> wrote:
>
> > Bruce, the flag only applies to values serialized with PDX;
> > DataSerializable objects are not affected by this property.
> >
> > I think there is some real value here as a stop gap until we have a
> > better solution in Geode 2 where the user can have a per request context
> > that specifies what return type they would like. Consider the user that
> > has an existing application that uses domain objects but wants to deploy
> > another application that doesn't to the same Geode cluster. The only
> > option is to either have all PDX deserialize to domain objects or all
> > returned as PdxInstance. One of the two applications will not work
> > without modification. Changing the behavior described by Addison splits
> > the difference. If the application is, like it is by default, configured
> > to deserialize PDX to the domain object but the domain object is not
> > deployed it will now give back the PDX instance rather than failing.
> >
> > An explicit use case is a user that has both a Java and .NET application.
> > The .NET application does not have any Java domain objects to deploy to
> > the server but does want to query or run server side functions. The Java
> > application has deployed the domain objects to the server and distributed
> > functions are written expecting those domain objects on the server. The
> > user would have to create Java domain objects for the .NET application or
> > modify their Java application to expect PdxInstance.
> >
> >
> > -Jake
> >
> >
> > On Thu, Jan 25, 2018 at 7:38 AM Bruce Schuchardt <bschuchardt@apache.org>
> > wrote:
> >
> > > +1
> > >
> > > I've found the current read-serialized property to be pretty useless.
> > >
> > > Having said that, what if the value isn't actually in serialized form
> > > in the local cache?  Is Geode supposed to serialize it & return it?
> > > What if it isn't PDX-serialized?  Do we return a byte array?
> > >
> > >
> > > On 1/24/18 12:21 PM, Dan Smith wrote:
> > > > Is this really just a workaround for the fact that the
> > > > read-serialized flag applies to the whole cache? I can see that if
> > > > you have a mix of regions with and without domain classes on the
> > > > server you might want this feature. Can you provide some more
> > > > background on your use case?
> > > >
> > > > IMO we should get rid of read-serialized in favor of APIs that let
> > > > the user decide whether they get a domain class or a PdxInstance.
> > > >
> > > > -Dan
> > > >
> > > > On Wed, Jan 24, 2018 at 9:58 AM, Galen O'Sullivan
> > > > <gosullivan@pivotal.io> wrote:
> > > >
> > > >> Hi Addison,
> > > >>
> > > >> What kind of setup do you have that is causing you to have PDX
> > > >> serialized objects that cannot be deserialized? Do you have classes
> > > >> that are present on some servers but not others?
> > > >>
> > > >> This change would break the contract of region.get(), which is that
> > > >> it returns a value of a type that can be put into the region.
> > > >>
> > > >> Returning something that isn't what the user is expecting to be in
> > > >> the region would require users to check for PdxInstances every time
> > > >> they get a value -- even though the type annotations say that you
> > > >> can't get a PdxInstance back (except for Region<Object,Object>).
> > > >>
> > > >> I think it would be better to have a second API that allows users
> > > >> to get and put PdxInstance objects in regions. That way, if they
> > > >> want to handle the exceptional case where they have a serialized
> > > >> object that can't be deserialized, they can catch the ClassNotFound
> > > >> exception and get the underlying PdxInstance.
> > > >>
> > > >> I do think that the possibility of a ClassNotFoundException should
> > > >> be documented in the Region API.
> > > >>
> > > >> Cheers,
> > > >> Galen
> > > >>
> > > >> On Tue, Jan 23, 2018 at 2:56 PM, Addison Huddy <ahuddy@pivotal.io>
> > > >> wrote:
> > > >>
> > > >>> Hi Geode Devs,
> > > >>>
> > > >>> I'm proposing the following change to how we handle deserialization
> > > >>> when Domain Objects can't be found and pdx-serialize=false.
> > > >>>
> > > >>> https://issues.apache.org/jira/browse/GEODE-4367
> > > >>>
> > > >>> Looking forward to the discussion.
> > > >>>
> > > >>> \ah
> > > >>>
> > >
> > >
> >
>



-- 
-John
john.blum10101 (skype)
