ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: Default hash code generation strategy for new binary objects
Date Thu, 29 Sep 2016 15:39:19 GMT
Folks, let me point out a few obvious (or not) things

A set of fields participating in hashCode and equals is impossible to
change without cluster restart. Imagine a new client adding a field F to
key structure (A, B) so that new key is (A, B, F). In this case key (1, 2,
0) will be treated as a different key w/respect to (1, 2, 3), while the new
client expects them to be the same. I think the set of fields defining the
key _must_ be used in both hash code and equals calculation and cannot
change over time. Having said that, we can use binary array comparison as
object equality test. Presence of new 'garbage' fields in key looks useless
to me.

So, we should support:
1) All the things Dmitriy pointed out, in other words - multi-field key
with affinity key fields
2) Ability to instantiate the key using a text-only environment
3) Ability to define the structure of a key using XML or DDL

I think all of these can be done via combination of HashCodeResolver
interface + default resolver which can read type configuration.

2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:

> Guys,
>
> We need to look at 3 cases:
>
> a) key is just one field
> b) key is multiple fields
> c) key is one or multiple fields, with possibility of an alternate affinity
> key
>
> For (a) and (b), whenever a type is defined in XML, and further in DML, a
> user will specify which fields are part of the key. In that case, we should
> just grab those fields and calculate the hashcode automatically.
>
> For (c), a user should specify in XML, and further in DML, which fields
> should be used for affinity, in addition to the key fields. In that case,
> we again should grab those fields and calculate the hashcodes for the
> primary key and the affinity key.
>
> I really am not sure if there are other ways of doing it. Am I missing
> something?
>
> D.
>
>
> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dmagda@gridgain.com> wrote:
>
> > Let me show the picture I have in my mind:
> >
> > Primary key is a must for all INSERT and MERGE operations. If it’s not
> set
> > then an INSERT/MERGE fails.
> > If a primary key is a boxed/unboxed primitive (int, Integer, String,
> Date,
> > etc.) then the key value is used for hashCode calculation. At the same
> time
> > the key will be an affinity key.
> > If a primary key is a custom object then it’s value can be passed as a
> > param directly from Java, .Net, C++ in a way like “INSERT (_key, field1,
> > field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode
> > directly on the key's value. In addition, we will be able to get an
> > affinity key since we know key’s type class descriptor
> > (BinaryClassDescriptor).
> > If a primary key is still a custom key and we want to insert its value
> > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s value
> > as is. Here we’re trying to apply a workaround by listing key's fields in
> > INSERT/MERGE and the task is to properly re-construct the key on our side
> > using only specific fields.
> >
> > Is my understanding correct for all the bullets above?
> >
> > If so then, yes, I would agree that we need to list these fields in a
> > configuration and the default hash code resolver will use them as well.
> > Moreover, we have to pin point an affinity field. So, the question is
> what
> > the configuration we should use.
> >
> > Community, any other thoughts/ideas?
> >
> > —
> > Denis
> >
> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >
> > > Also MERGE.
> > >
> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dmagda@gridgain.com>:
> > >> You need a hash code only for INSERT operation, right?
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>
> > >>> But what if the user works from some kind of console and just types
> > >>> the queries as text in full and does not bind params via JDBC or
> > >>> something alike? What if there's no binary object? I don't see why
we
> > >>> should keep the user from usual cache gets in this case. I really
> like
> > >>> the idea of supplying the values of distinct fields, thus freeing the
> > >>> user of the need to mess with objects and builders, AND then just
> > >>> calculating hash code as suggested before - say, via explicitly
> > >>> listing participating fields in XML or by marking them with transient
> > >>> keyword or some annotation.
> > >>> Actually, I believe that's the only case when we need to generate any
> > >>> hash codes - when the class is present, we can just get hash code
> from
> > >>> its implementation of its method. When there's no class, we generate.
> > >>> And all that is solely for SQL. For the rest - just throw an
> exception
> > >>> when there's no hash code manually set for binary object. I don't see
> > >>> why we should try to generate anything when the user already is using
> > >>> Ignite in full, not just via limited interface of SQL.
> > >>>
> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dmagda@gridgain.com>:
> > >>>> Hmm, this is a good question.
> > >>>>
> > >>>> If a user doesn’t provide a _key when an INSERT is executed for
me
> it
> > means that he is not going to use the key later for cache.get/put,
> DELETE,
> > UPDATE and other possible operation simply because he doesn’t know how to
> > reconstruct the key back in his code. If he wants to use the primary key
> in
> > the rest of operations then he must provide it at INSERT time.
> > >>>>
> > >>>> Do we need this key only for a case when an object is being inserted
> > into a cache? If it’s so I would auto-generate a key using ‘long’ as a
> key
> > type. I do remember that we provided the auto-generation for Spark module
> > in a some way that may be useful here.
> > >>>>
> > >>>> —
> > >>>> Denis
> > >>>>
> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>>>
> > >>>>> Denis,
> > >>>>>
> > >>>>> That's not what I was asking about.
> > >>>>> Currently DML implementation allows for dymanic instantiation
of
> > keys,
> > >>>>> in other words, user does not have to provide value for
> object-typed
> > >>>>> _key column - instead, he may supply just field values based
on
> which
> > >>>>> _key will be dynamically instantiated/binary built. And that's
the
> > >>>>> whole point of this discussion as I see it: what to do when
we've
> > >>>>> binary built classless key that we build ourselves inside SQL
> engine
> > >>>>> and don't know how to compute hash code for it?
> > >>>>>
> > >>>>> - Alex
> > >>>>>
> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dmagda@gridgain.com>:
> > >>>>>> Alexander,
> > >>>>>>
> > >>>>>> As I guess if we have a key without a class then it will
be
> > constructed using a BinaryBuilder instance and it’s user responsibility
> to
> > set the has code at the end with BinaryBuilder.hasCode method. Sure, all
> > this cases must be well-documented in both Java Doc API and Apache Ignite
> > documentation.
> > >>>>>>
> > >>>>>> —
> > >>>>>> Denis
> > >>>>>>
> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Dmitry, Denis,
> > >>>>>>>
> > >>>>>>> OK, but I think it's necessary to address also the
cases when
> > there's
> > >>>>>>> no actual class for the key, and its fields are simply
declared
> in
> > >>>>>>> XML. In this case, there are no fields to be marked
transient.
> > What do
> > >>>>>>> we do then? List transient fields in XML separately?
> > >>>>>>>
> > >>>>>>> - Alex
> > >>>>>>>
> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >:
> > >>>>>>>> Agree with Denis.
> > >>>>>>>>
> > >>>>>>>> - by default, all non-transient key fields should
participate in
> > the
> > >>>>>>>> hashcode generation
> > >>>>>>>> - when working on DDL, then the primary key fields
should
> > participate in
> > >>>>>>>> the hashcode
> > >>>>>>>> - we should add a resolver to override the default
behavior
> > (please
> > >>>>>>>> propose the interface in Jira)
> > >>>>>>>> - we should print out a warning, only once per
type, the the
> > hashcode
> > >>>>>>>> has been automatically generated based on which
fields and which
> > formula
> > >>>>>>>>
> > >>>>>>>> D.
> > >>>>>>>>
> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <
> dmagda@gridgain.com>
> > wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Alexander,
> > >>>>>>>>>
> > >>>>>>>>> Vladimir’s proposal sounds reasonable to
me. However we must
> > keep in mind
> > >>>>>>>>> one important thing. Binary objects were designed
to address
> the
> > following
> > >>>>>>>>> disadvantages a regular serializer, like optimized
marshaller,
> > has:
> > >>>>>>>>> necessity to deserialize an object on a server
side every time
> > it’s needed.
> > >>>>>>>>> necessity to hold an object in both serialized
and deserialized
> > forms on
> > >>>>>>>>> the server node.
> > >>>>>>>>> necessity to restart the whole cluster each
time an object
> > version is
> > >>>>>>>>> changed (new field is added or an old one is
removed).
> > >>>>>>>>> If it will be needed to perform step 3 for
a default
> > implementation of the
> > >>>>>>>>> binary resolver just because the resolver has
to consider new
> > fields or
> > >>>>>>>>> ignore old ones then such an implementation
sucks. Overall, the
> > default
> > >>>>>>>>> implementation should use the reflection coming
over all the
> > fields a key
> > >>>>>>>>> has ignoring the ones that are marked with
“transient” keyword.
> > If a user
> > >>>>>>>>> wants to control the default resolver's logic
then he can label
> > all the
> > >>>>>>>>> fields that mustn’t be of a final has code
value with
> > “transient”. This has
> > >>>>>>>>> to be well-documented for sure.
> > >>>>>>>>>
> > >>>>>>>>> Makes sense?
> > >>>>>>>>>
> > >>>>>>>>> —
> > >>>>>>>>> Denis
> > >>>>>>>>>
> > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander
Paschenko <
> > >>>>>>>>> alexander.a.paschenko@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hello Igniters,
> > >>>>>>>>>>
> > >>>>>>>>>> As DML support is near, it's critical that
we agree on how we
> > generate
> > >>>>>>>>>> hash codes for new keys in presence of
binary marshaller.
> > Actually,
> > >>>>>>>>>> this discussion isn't new - please see
its beginning here:
> > >>>>>>>>>>
> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-
> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > >>>>>>>>>>
> > >>>>>>>>>> Still, I'm creating this new thread to
make getting to the
> final
> > >>>>>>>>>> solution as simple and fast as possible.
> > >>>>>>>>>>
> > >>>>>>>>>> I remind everyone that the approach that
has got the least
> > critics was
> > >>>>>>>>>> the one proposed by Vladimir Ozerov:
> > >>>>>>>>>>
> > >>>>>>>>>> <quote>
> > >>>>>>>>>> I think we can do the following:
> > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
> > >>>>>>>>>> 2) If object without a hash code is put
to cache, throw an
> > exception.
> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface.
> > >>>>>>>>>> 4) Add default implementation which will
auto-generate hash
> > code. *Print
> > >>>>>>>>> a
> > >>>>>>>>>> warning when auto-generation occurs*, so
that user is aware
> > that he is
> > >>>>>>>>>> likely to have problems with normal GETs/PUTs.
> > >>>>>>>>>> 5) Add another implementation which will
use encoded string to
> > calculate
> > >>>>>>>>> a
> > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a}
* 31 +
> > {b}")*.
> > >>>>>>>>>> Originally proposed by Yakov some time
ago.
> > >>>>>>>>>> </quote>
> > >>>>>>>>>>
> > >>>>>>>>>> After that, Sergi suggested that instead
of a "formula" we
> keep
> > just a
> > >>>>>>>>>> list of the "fields" that participate in
hash code evaluation,
> > and
> > >>>>>>>>>> with that list, we simply calculate hash
code just like IDE
> > does -
> > >>>>>>>>>> with all its bit shifts and additions.
> > >>>>>>>>>>
> > >>>>>>>>>> I'm planning on settling down with this
combined Vlad-Sergi
> > approach.
> > >>>>>>>>>> Any objections?
> > >>>>>>>>>>
> > >>>>>>>>>> And an extra question I have: Vlad, you
suggest that we both
> > throw an
> > >>>>>>>>>> exception on cache code absence and that
we might generate it
> > as the
> > >>>>>>>>>> last resort. Do I understand you correctly
that you suggest
> > generating
> > >>>>>>>>>> random code only in context of SQL, but
throw exception for
> keys
> > >>>>>>>>>> without codes on ordinary put?
> > >>>>>>>>>>
> > >>>>>>>>>> And yes, built-in hash codes for JDK types
are supported as
> > well as
> > >>>>>>>>>> items 1-2 from Vlad's list (there's already
fixed issue of
> > IGNITE-3633
> > >>>>>>>>>> for the flag and its presence check).
> > >>>>>>>>>>
> > >>>>>>>>>> - Alex
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message