geode-dev mailing list archives

From Anilkumar Gingade <aging...@pivotal.io>
Subject Re: Proposal: GEODE-4367 - Return PDXInstance when Domain Object can't be found
Date Thu, 25 Jan 2018 19:59:18 GMT
+1 storing data in binary form (long time ask).




On Thu, Jan 25, 2018 at 11:07 AM, John Blum <jblum@pivotal.io> wrote:

> I have always thought/wondered, why not just store the data in serialized
> form always.  There are several reasons to do so...
>
> 1. Whenever data is transferred between client & server, between peers,
> over the WAN, overflowed to disk or persisted to disk, it must be
> serialized.
> 2. Naturally it follows that if the data is always stored in serialized
> form, it cuts down on de/serialization overhead.
> 3. Additionally, it eliminates, or at least reduces, the flags and other
> configuration settings for serialization, making Geode simpler to
> understand and simpler to use.
> 4. When using PDX, Apache Geode is immediately interoperable between
> multiple language clients, primarily Java and .NET/C++, but even other
> language clients, e.g. JavaScript, Ruby, etc, where JSON is serialized to
> PDX.
> 5. PDX is queryable without deserialization. This is HUGE and maybe the
> most important reason!
>
>
>
> The last 2 points suggest that the default serialization format should be
> PDX, and truthfully, I am not really opposed to that.  Although, there are
> some problems with this.
>
> A. PDX does not handle cyclic dependencies unlike Java Serialization.
> However, Java Serialization has massive overhead and is not interoperable
> with native language and other language clients (e.g. JavaScript).
>
> B. PDX does not handle Deltas unlike DataSerialization.  However, even when
> using Deltas with DataSerialization, you must deserialize the data to apply
> the delta.  Quite frankly and ironically, PDX seems better suited to
> handling Deltas than DataSerialization, and without deserializing.
>
> So, I would double down on PDX and forget DataSerialization and Java
> Serialization.  And by "forget", I mean that Apache Geode never "stores"
> DataSerialized or Java Serialized bytes; only PDX!
>
> Therefore, solve the cyclic dependency problem and introduce proper Delta
> handling without deserialization.  Then, optimize it!  Make PDX the best
> serialization option for Java, and specifically for Apache Geode.  With 1
> serialization format to worry about there is less to maintain, less data to
> convert if the user needs to switch.  Flexibility is not always a good
> thing.  It is easier to build up than to build down if you know what I
> mean.
>
> I have made PDX a first class citizen in *Spring Data for Apache Geode*, in
> multiple key functional areas of the framework (e.g. Repositories) and dead
> simple to use/enable (i.e. @EnablePdx).
>
>
>
> Regarding .NET/C++.  Truthfully, I don't really buy the argument that
> .NET/C++ users shouldn't have to write Java types.  If the data is always
> kept serialized, then technically they shouldn't have to, but they are
> already writing their Functions in Java.  Besides, it is not like every
> type needs a Java type, only types that need to be deserialized, if at all.
>
> If the application consists of both Java and .NET/C++ clients, and the Java
> devs want to work with high-level Java types, then they don't really have a
> choice.  However, we can keep the de/serialization overhead at the point of
> access (e.g. in the Function, executed on a particular node, at the time of
> access), to a minimum.
>
> A simple API like...
>
> JavaType object = pdxInstance.getObject(Class<?> type);
>
> ... would do the trick.
>
> The type argument does not need to be the original type that the PDX type
> meta-data was created from, either.  It could be a "projection".  The only
> concern Apache Geode has is mapping PDX fields to an instance of
> "JavaType", where PDX fields are mapped to writable "JavaType" properties
> (perhaps using Reflection here).
>
> If the JavaType does not contain a property matching a PDX field, no big
> deal.  This is the basis for our versioned type handling anyhow
> (adding/removing a field/property).  However, the inverse is a bit more of
> an interesting problem: the JavaType has a field/property that is not
> currently stored in PDX.  Perhaps throw an error, or provide a default
> value, or whatever.  That could be configurable.
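
A minimal sketch of the projection idea described above, assuming a reflection-based mapper. Every name in it (`FieldSource`, `MissingFieldPolicy`, `Projector`, `CustomerSummary`) is hypothetical and stands in for whatever Geode might expose; none of it is existing Geode API. It shows the two cases from the message: a stored field with no matching target field is ignored, and a target field with no stored value is handled by a configurable policy.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

// Hypothetical read-only view of a serialized value (a stand-in, not Geode's PdxInstance).
interface FieldSource {
    boolean hasField(String name);
    Object getField(String name);

    // Convenience adapter so the sketch can run against a plain map.
    static FieldSource of(Map<String, Object> fields) {
        return new FieldSource() {
            @Override public boolean hasField(String name) { return fields.containsKey(name); }
            @Override public Object getField(String name) { return fields.get(name); }
        };
    }
}

// Configurable reaction when the target type declares a field the stored data lacks.
enum MissingFieldPolicy { KEEP_DEFAULT, FAIL }

final class Projector {
    private Projector() {}

    // Builds an instance of 'type' and fills each instance field it declares from the source, by name.
    // Stored fields without a matching declared field are never read, so they are ignored.
    // Inherited fields are not visited in this sketch.
    static <T> T project(FieldSource source, Class<T> type, MissingFieldPolicy policy) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                if (!source.hasField(field.getName())) {
                    if (policy == MissingFieldPolicy.FAIL) {
                        throw new IllegalStateException("No stored field named " + field.getName());
                    }
                    continue; // KEEP_DEFAULT: leave whatever the no-arg constructor set
                }
                field.setAccessible(true);
                field.set(target, source.getField(field.getName()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot project onto " + type.getName(), e);
        }
    }
}

// Example projection target: declares fewer fields than the stored value may carry.
final class CustomerSummary {
    String name;
    int age;
}
```

With a stored value holding `name`, `age` and `ssn`, `Projector.project(source, CustomerSummary.class, MissingFieldPolicy.KEEP_DEFAULT)` would fill `name` and `age` and never touch `ssn`. Mapping fields directly rather than setter methods is a simplification chosen for brevity.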
>
> Maybe, just maybe, a user has the ability to provide their own Converter,
> with its own custom behavior...
>
> interface Converter<T> {
>
>   T convert(PdxInstance pdxInstance);
>
> }
>
> class JavaTypeConverter implements Converter<JavaType> {
>
>   public JavaType convert(PdxInstance pdxInstance) { ... }
>
> }
>
> Then...
>
> Converter<JavaType> javaTypeConverter = new JavaTypeConverter();
> ...
> JavaType object = pdxInstance.getObject(javaTypeConverter);
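
A compilable sketch of the Converter idea above, under stated assumptions: `SerializedValue` is a hypothetical stand-in for a serialized value, and the `getObject(Converter)` overload is the proposal from this message rather than a method known to exist on Geode's `PdxInstance`. `Customer` and `CustomerConverter` are illustrative names only.

```java
// Hypothetical stand-in for a serialized value; only field lookup is modelled.
interface SerializedValue {
    Object getField(String name);

    // The proposed overload: the caller supplies the mapping to a Java type.
    default <T> T getObject(Converter<T> converter) {
        return converter.convert(this);
    }
}

// User-supplied conversion strategy, as outlined in the message.
@FunctionalInterface
interface Converter<T> {
    T convert(SerializedValue value);
}

// Illustrative domain type.
final class Customer {
    final String name;
    final int age;

    Customer(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// A converter with its own custom behavior: a missing age becomes 0 instead of an error.
final class CustomerConverter implements Converter<Customer> {
    @Override
    public Customer convert(SerializedValue value) {
        Object age = value.getField("age");
        int safeAge = (age == null) ? 0 : ((Number) age).intValue();
        return new Customer((String) value.getField("name"), safeAge);
    }
}
```

Because `Converter` has a single method, a caller could also pass a lambda, for example `value.getObject(v -> (String) v.getField("name"))`, when only one field is needed.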
>
>
> *One final thought...*
>
> Ultimately, I'd like to see Apache Geode introduce a common
> framework/interface for serialization, so that different serialization
> strategies, or "providers", could be introduced and used by our users based
> on their preferences and/or application's needs.
>
> Keep in mind, the user's data might not just live in Apache Geode, which is
> particularly true in an increasingly Microservices world.  Other
> technologies (e.g. Messaging Buses/Queues) are not going to know PDX.  PDX
> would be the default, enabled serialization strategy/provider for Apache
> Geode, provided by Apache Geode OOTB. This may be one reason to still support
> Java Serialization, given it is a universal serialization format between
> disparate technologies, but Apache Geode should never store Java Serialized
> bytes, only PDX.
>
>
>
> Anyway, if you are still with me (sorry about length, just dumping all my
> thoughts over the past few years) take all this with a grain of salt (and
> maybe a slice of lemon ;-). I was just thinking out loud and long term, as
> both (previously) an engineer on Apache Geode as well as a user.
>
> Food for thought.
>
> Regards,
> John
>
>
>
> On Thu, Jan 25, 2018 at 9:55 AM, Anilkumar Gingade <agingade@pivotal.io>
> wrote:
>
> > Internally, there is an option to override the read-serialized flag (to
> > true); the query engine and other components use this to keep the data in
> > serialized form and work with PdxInstance...
> >
> > public static void setPdxReadSerialized(Cache cache, boolean readSerialized);
> >
> > We had discussed making this a public API... so that any thread that can
> > work on PdxInstance can take advantage of it...
> >
> > -Anil.
> >
> >
> > On Thu, Jan 25, 2018 at 9:42 AM, Jacob Barrett <jbarrett@pivotal.io>
> > wrote:
> >
> > > Bruce, the flag only applies to values serialized with PDX;
> > > DataSerializable objects are not affected by this property.
> > >
> > > I think there is some real value here as a stopgap until we have a better
> > > solution in Geode 2, where the user can have a per-request context that
> > > specifies what return type they would like. Consider the user that has an
> > > existing application that uses domain objects but wants to deploy another
> > > application that doesn't to the same Geode cluster. The only option is to
> > > either have all PDX deserialize to domain objects or all returned as
> > > PdxInstance. One of the two applications will not work without
> > > modification. Changing the behavior as Addison describes splits the
> > > difference. If the application is, as it is by default, configured to
> > > deserialize PDX to the domain object but the domain object is not deployed,
> > > it will now give back the PDX instance rather than failing.
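
The fallback described above can be sketched as follows. This is a toy model under stated assumptions, not Geode's implementation: `FallbackReader` and `StoredValue` are hypothetical names, and the deserializer is passed in as a plain function so the sketch stays self-contained.

```java
import java.util.Map;
import java.util.function.Function;

final class FallbackReader {
    private FallbackReader() {}

    // Hypothetical stand-in for a stored PDX value: the class it was written from, plus its fields.
    static final class StoredValue {
        final String className;
        final Map<String, Object> fields;

        StoredValue(String className, Map<String, Object> fields) {
            this.className = className;
            this.fields = fields;
        }
    }

    // Returns the domain object when the class resolves through 'loader';
    // otherwise returns the stored value itself instead of failing.
    static Object read(StoredValue stored, ClassLoader loader,
                       Function<StoredValue, Object> deserializer) {
        try {
            // Resolve without initializing; only presence of the class matters here.
            Class.forName(stored.className, false, loader);
        } catch (ClassNotFoundException e) {
            return stored; // proposed behavior: hand back the serialized form
        }
        return deserializer.apply(stored);
    }
}
```

The return type is `Object`, which is exactly the cost raised elsewhere in this thread: callers would need an `instanceof` check on every read to tell a domain object from the serialized form.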
> > >
> > > An explicit use case is a user that has both a Java and a .NET application.
> > > The .NET application does not have any Java domain objects to deploy to the
> > > server but does want to query or run server-side functions. The Java
> > > application has deployed the domain objects to the server, and distributed
> > > functions are written expecting those domain objects on the server. The
> > > user would have to create Java domain objects for the .NET application or
> > > modify their Java application to expect PdxInstance.
> > >
> > >
> > > -Jake
> > >
> > >
> > > On Thu, Jan 25, 2018 at 7:38 AM Bruce Schuchardt <bschuchardt@apache.org>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > I've found the current read-serialized property to be pretty useless.
> > > >
> > > > Having said that, what if the value isn't actually in serialized form in
> > > > the local cache?  Is Geode supposed to serialize it & return it?  What
> > > > if it isn't PDX-serialized?  Do we return a byte array?
> > > >
> > > >
> > > > On 1/24/18 12:21 PM, Dan Smith wrote:
> > > > > Is this really just a workaround for the fact that the read-serialized
> > > > > flag applies to the whole cache? I can see that if you have a mix of
> > > > > regions with and without domain classes on the server you might want this
> > > > > feature. Can you provide some more background on your use case?
> > > > >
> > > > > IMO we should get rid of read-serialized in favor of APIs that let the
> > > > > user decide whether they get a domain class or a PdxInstance.
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Wed, Jan 24, 2018 at 9:58 AM, Galen O'Sullivan <gosullivan@pivotal.io>
> > > > > wrote:
> > > > >
> > > > >> Hi Addison,
> > > > >>
> > > > > >> What kind of setup do you have that is causing you to have PDX
> > > > > >> serialized objects that cannot be deserialized? Do you have classes
> > > > > >> that are present on some servers but not others?
> > > > > >>
> > > > > >> This change would break the contract of region.get(), which is that it
> > > > > >> returns a value of a type that can be put into the region.
> > > > > >>
> > > > > >> Returning something that isn't what the user is expecting to be in the
> > > > > >> region would require users to check for PdxInstances every time they
> > > > > >> get a value -- even though the type annotations say that you can't get
> > > > > >> a PdxInstance back (except for Region<Object,Object>).
> > > > > >>
> > > > > >> I think it would be better to have a second API that allows users to
> > > > > >> get and put PdxInstance objects in regions. That way, if they want to
> > > > > >> handle the exceptional case where they have a serialized object that
> > > > > >> can't be deserialized, they can catch the ClassNotFoundException and
> > > > > >> get the underlying PdxInstance.
> > > > > >>
> > > > > >> I do think that the possibility of a ClassNotFoundException should be
> > > > > >> documented in the Region API.
> > > > >>
> > > > >> Cheers,
> > > > >> Galen
> > > > >>
> > > > > >> On Tue, Jan 23, 2018 at 2:56 PM, Addison Huddy <ahuddy@pivotal.io>
> > > > > >> wrote:
> > > > >>
> > > > >>> Hi Geode Devs,
> > > > >>>
> > > > > >>> I'm proposing the following change to how we handle deserialization
> > > > > >>> when Domain Objects can't be found and pdx-serialize=false.
> > > > >>>
> > > > >>> https://issues.apache.org/jira/browse/GEODE-4367
> > > > >>>
> > > > >>> Looking forward to the discussion.
> > > > >>>
> > > > >>> \ah
> > > > >>>
> > > >
> > > >
> > >
> >
>
>
>
> --
> -John
> john.blum10101 (skype)
>
