ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Default hash code generation strategy for new binary objects
Date Thu, 29 Sep 2016 16:43:58 GMT
On Thu, Sep 29, 2016 at 9:19 AM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> I want to step back a little bit. Why do we need to choose fields for
> hashCode at all? If all fields participate in equals, all of them should
> participate in hashCode as well. We already serialize a user key in order
> to get a value from cache, so we can use a hashCode based on binary object
> representation.
>

I actually tend to agree with you. If one of the fields in the key is the
affinity field, we could use in in the hashcode as well, then.

If anyone wants to override it, we should provide a hashcode resolver.


>
> 2016-09-29 19:13 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>
> > Alexander,
> >
> > How do you plan to annotate fields that participate in the hashcode
> > calculation? Can you add all the changes you plan to make to the
> > configuration in the ticket and post the link here?
> >
> > Also, we must make sure that hashcode fields do not change. I believe you
> > should have validation in the code on startup of a system or of a cache,
> > and throw an exception if it fails.
> >
> > D.
> >
> > On Thu, Sep 29, 2016 at 9:08 AM, Alexander Paschenko <
> > alexander.a.paschenko@gmail.com> wrote:
> >
> > > Thanks everyone!
> > >
> > > Denis, yes, that's what I meant, now we're on the same page :)
> > > However, I'm worried about the same things Alexey is, that is, I'm not
> > > sure how we can handle presence of key fields that don't participate
> > > in 'equals' evaluation. Hence I'm all up for keeping mechanism of
> > > comparison for equality as it is now. And simply enhance the
> > > configuration as follows:
> > >
> > > 1. Introduce new BinaryObjectHashCodeResolver interface with single
> > > method having following signature:
> > >
> > > int computeHashCode(BinaryObject binObj)
> > >
> > > Cache configuration will then get a new corresponding param - probably
> > > a class/class name to instantiate resolver from.
> > >
> > > 2. Introduce default resolver by enhancing SQL engine configuration (=
> > > QueryEntity) - namely, add to it the list of hash code fields and
> > > optional affinity key field.
> > >
> > >
> > >
> > > 2016-09-29 18:39 GMT+03:00 Alexey Goncharuk <
> alexey.goncharuk@gmail.com
> > >:
> > > > Folks, let me point out a few obvious (or not) things
> > > >
> > > > A set of fields participating in hashCode and equals is impossible to
> > > > change without cluster restart. Imagine a new client adding a field F
> > to
> > > > key structure (A, B) so that new key is (A, B, F). In this case key
> (1,
> > > 2,
> > > > 0) will be treated as a different key w/respect to (1, 2, 3), while
> the
> > > new
> > > > client expects them to be the same. I think the set of fields
> defining
> > > the
> > > > key _must_ be used in both hash code and equals calculation and
> cannot
> > > > change over time. Having said that, we can use binary array
> comparison
> > as
> > > > object equality test. Presence of new 'garbage' fields in key looks
> > > useless
> > > > to me.
> > > >
> > > > So, we should support:
> > > > 1) All the things Dmitriy pointed out, in other words - multi-field
> key
> > > > with affinity key fields
> > > > 2) Ability to instantiate the key using a text-only environment
> > > > 3) Ability to define the structure of a key using XML or DDL
> > > >
> > > > I think all of these can be done via combination of HashCodeResolver
> > > > interface + default resolver which can read type configuration.
> > > >
> > > > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > >
> > > >> Guys,
> > > >>
> > > >> We need to look at 3 cases:
> > > >>
> > > >> a) key is just one field
> > > >> b) key is multiple fields
> > > >> c) key is one or multiple fields, with possibility of an alternate
> > > affinity
> > > >> key
> > > >>
> > > >> For (a) and (b), whenever a type is defined in XML, and further in
> > DML,
> > > a
> > > >> user will specify which fields are part of the key. In that case,
we
> > > should
> > > >> just grab those fields and calculate the hashcode automatically.
> > > >>
> > > >> For (c), a user should specify in XML, and further in DML, which
> > fields
> > > >> should be used for affinity, in addition to the key fields. In that
> > > case,
> > > >> we again should grab those fields and calculate the hashcodes for
> the
> > > >> primary key and the affinity key.
> > > >>
> > > >> I really am not sure if there are other ways of doing it. Am I
> missing
> > > >> something?
> > > >>
> > > >> D.
> > > >>
> > > >>
> > > >> On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dmagda@gridgain.com>
> > > wrote:
> > > >>
> > > >> > Let me show the picture I have in my mind:
> > > >> >
> > > >> > Primary key is a must for all INSERT and MERGE operations. If
it’s
> > not
> > > >> set
> > > >> > then an INSERT/MERGE fails.
> > > >> > If a primary key is a boxed/unboxed primitive (int, Integer,
> String,
> > > >> Date,
> > > >> > etc.) then the key value is used for hashCode calculation. At
the
> > same
> > > >> time
> > > >> > the key will be an affinity key.
> > > >> > If a primary key is a custom object then it’s value can be
passed
> > as a
> > > >> > param directly from Java, .Net, C++ in a way like “INSERT (_key,
> > > field1,
> > > >> > field2) VALUES (?, val1, val2)”. In this scenario we will call
> > > hashCode
> > > >> > directly on the key's value. In addition, we will be able to
get
> an
> > > >> > affinity key since we know key’s type class descriptor
> > > >> > (BinaryClassDescriptor).
> > > >> > If a primary key is still a custom key and we want to insert
its
> > value
> > > >> > from an SQL console, PHP, Tableu, etc. then we can’t pass the
> key’s
> > > value
> > > >> > as is. Here we’re trying to apply a workaround by listing key's
> > > fields in
> > > >> > INSERT/MERGE and the task is to properly re-construct the key
on
> our
> > > side
> > > >> > using only specific fields.
> > > >> >
> > > >> > Is my understanding correct for all the bullets above?
> > > >> >
> > > >> > If so then, yes, I would agree that we need to list these fields
> in
> > a
> > > >> > configuration and the default hash code resolver will use them
as
> > > well.
> > > >> > Moreover, we have to pin point an affinity field. So, the question
> > is
> > > >> what
> > > >> > the configuration we should use.
> > > >> >
> > > >> > Community, any other thoughts/ideas?
> > > >> >
> > > >> > —
> > > >> > Denis
> > > >> >
> > > >> > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >
> > > >> > > Also MERGE.
> > > >> > >
> > > >> > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dmagda@gridgain.com>:
> > > >> > >> You need a hash code only for INSERT operation, right?
> > > >> > >>
> > > >> > >> —
> > > >> > >> Denis
> > > >> > >>
> > > >> > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko
<
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>
> > > >> > >>> But what if the user works from some kind of console
and just
> > > types
> > > >> > >>> the queries as text in full and does not bind params
via JDBC
> or
> > > >> > >>> something alike? What if there's no binary object?
I don't see
> > > why we
> > > >> > >>> should keep the user from usual cache gets in this
case. I
> > really
> > > >> like
> > > >> > >>> the idea of supplying the values of distinct fields,
thus
> > freeing
> > > the
> > > >> > >>> user of the need to mess with objects and builders,
AND then
> > just
> > > >> > >>> calculating hash code as suggested before - say,
via
> explicitly
> > > >> > >>> listing participating fields in XML or by marking
them with
> > > transient
> > > >> > >>> keyword or some annotation.
> > > >> > >>> Actually, I believe that's the only case when we
need to
> > generate
> > > any
> > > >> > >>> hash codes - when the class is present, we can just
get hash
> > code
> > > >> from
> > > >> > >>> its implementation of its method. When there's no
class, we
> > > generate.
> > > >> > >>> And all that is solely for SQL. For the rest - just
throw an
> > > >> exception
> > > >> > >>> when there's no hash code manually set for binary
object. I
> > don't
> > > see
> > > >> > >>> why we should try to generate anything when the
user already
> is
> > > using
> > > >> > >>> Ignite in full, not just via limited interface of
SQL.
> > > >> > >>>
> > > >> > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dmagda@gridgain.com>:
> > > >> > >>>> Hmm, this is a good question.
> > > >> > >>>>
> > > >> > >>>> If a user doesn’t provide a _key when an INSERT
is executed
> for
> > > me
> > > >> it
> > > >> > means that he is not going to use the key later for cache.get/put,
> > > >> DELETE,
> > > >> > UPDATE and other possible operation simply because he doesn’t
know
> > > how to
> > > >> > reconstruct the key back in his code. If he wants to use the
> primary
> > > key
> > > >> in
> > > >> > the rest of operations then he must provide it at INSERT time.
> > > >> > >>>>
> > > >> > >>>> Do we need this key only for a case when an
object is being
> > > inserted
> > > >> > into a cache? If it’s so I would auto-generate a key using
‘long’
> > as a
> > > >> key
> > > >> > type. I do remember that we provided the auto-generation for
Spark
> > > module
> > > >> > in a some way that may be useful here.
> > > >> > >>>>
> > > >> > >>>> —
> > > >> > >>>> Denis
> > > >> > >>>>
> > > >> > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko
<
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>>>
> > > >> > >>>>> Denis,
> > > >> > >>>>>
> > > >> > >>>>> That's not what I was asking about.
> > > >> > >>>>> Currently DML implementation allows for
dymanic
> instantiation
> > of
> > > >> > keys,
> > > >> > >>>>> in other words, user does not have to provide
value for
> > > >> object-typed
> > > >> > >>>>> _key column - instead, he may supply just
field values based
> > on
> > > >> which
> > > >> > >>>>> _key will be dynamically instantiated/binary
built. And
> that's
> > > the
> > > >> > >>>>> whole point of this discussion as I see
it: what to do when
> > > we've
> > > >> > >>>>> binary built classless key that we build
ourselves inside
> SQL
> > > >> engine
> > > >> > >>>>> and don't know how to compute hash code
for it?
> > > >> > >>>>>
> > > >> > >>>>> - Alex
> > > >> > >>>>>
> > > >> > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dmagda@gridgain.com
> >:
> > > >> > >>>>>> Alexander,
> > > >> > >>>>>>
> > > >> > >>>>>> As I guess if we have a key without
a class then it will be
> > > >> > constructed using a BinaryBuilder instance and it’s user
> > > responsibility
> > > >> to
> > > >> > set the has code at the end with BinaryBuilder.hasCode method.
> Sure,
> > > all
> > > >> > this cases must be well-documented in both Java Doc API and Apache
> > > Ignite
> > > >> > documentation.
> > > >> > >>>>>>
> > > >> > >>>>>> —
> > > >> > >>>>>> Denis
> > > >> > >>>>>>
> > > >> > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander
Paschenko <
> > > >> > alexander.a.paschenko@gmail.com> wrote:
> > > >> > >>>>>>>
> > > >> > >>>>>>> Dmitry, Denis,
> > > >> > >>>>>>>
> > > >> > >>>>>>> OK, but I think it's necessary to
address also the cases
> > when
> > > >> > there's
> > > >> > >>>>>>> no actual class for the key, and
its fields are simply
> > > declared
> > > >> in
> > > >> > >>>>>>> XML. In this case, there are no
fields to be marked
> > transient.
> > > >> > What do
> > > >> > >>>>>>> we do then? List transient fields
in XML separately?
> > > >> > >>>>>>>
> > > >> > >>>>>>> - Alex
> > > >> > >>>>>>>
> > > >> > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy
Setrakyan <
> > > >> dsetrakyan@apache.org
> > > >> > >:
> > > >> > >>>>>>>> Agree with Denis.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> - by default, all non-transient
key fields should
> > > participate in
> > > >> > the
> > > >> > >>>>>>>> hashcode generation
> > > >> > >>>>>>>> - when working on DDL, then
the primary key fields should
> > > >> > participate in
> > > >> > >>>>>>>> the hashcode
> > > >> > >>>>>>>> - we should add a resolver to
override the default
> behavior
> > > >> > (please
> > > >> > >>>>>>>> propose the interface in Jira)
> > > >> > >>>>>>>> - we should print out a warning,
only once per type, the
> > the
> > > >> > hashcode
> > > >> > >>>>>>>> has been automatically generated
based on which fields
> and
> > > which
> > > >> > formula
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> D.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> On Tue, Sep 27, 2016 at 5:42
PM, Denis Magda <
> > > >> dmagda@gridgain.com>
> > > >> > wrote:
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>> Hi Alexander,
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> Vladimir’s proposal sounds
reasonable to me. However we
> > must
> > > >> > keep in mind
> > > >> > >>>>>>>>> one important thing. Binary
objects were designed to
> > address
> > > >> the
> > > >> > following
> > > >> > >>>>>>>>> disadvantages a regular
serializer, like optimized
> > > marshaller,
> > > >> > has:
> > > >> > >>>>>>>>> necessity to deserialize
an object on a server side
> every
> > > time
> > > >> > it’s needed.
> > > >> > >>>>>>>>> necessity to hold an object
in both serialized and
> > > deserialized
> > > >> > forms on
> > > >> > >>>>>>>>> the server node.
> > > >> > >>>>>>>>> necessity to restart the
whole cluster each time an
> object
> > > >> > version is
> > > >> > >>>>>>>>> changed (new field is added
or an old one is removed).
> > > >> > >>>>>>>>> If it will be needed to
perform step 3 for a default
> > > >> > implementation of the
> > > >> > >>>>>>>>> binary resolver just because
the resolver has to
> consider
> > > new
> > > >> > fields or
> > > >> > >>>>>>>>> ignore old ones then such
an implementation sucks.
> > Overall,
> > > the
> > > >> > default
> > > >> > >>>>>>>>> implementation should use
the reflection coming over all
> > the
> > > >> > fields a key
> > > >> > >>>>>>>>> has ignoring the ones that
are marked with “transient”
> > > keyword.
> > > >> > If a user
> > > >> > >>>>>>>>> wants to control the default
resolver's logic then he
> can
> > > label
> > > >> > all the
> > > >> > >>>>>>>>> fields that mustn’t be
of a final has code value with
> > > >> > “transient”. This has
> > > >> > >>>>>>>>> to be well-documented for
sure.
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> Makes sense?
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> —
> > > >> > >>>>>>>>> Denis
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>> On Sep 26, 2016, at
12:40 PM, Alexander Paschenko <
> > > >> > >>>>>>>>> alexander.a.paschenko@gmail.com>
wrote:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> Hello Igniters,
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> As DML support is near,
it's critical that we agree on
> > how
> > > we
> > > >> > generate
> > > >> > >>>>>>>>>> hash codes for new keys
in presence of binary
> marshaller.
> > > >> > Actually,
> > > >> > >>>>>>>>>> this discussion isn't
new - please see its beginning
> > here:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.
> > > >> > >>>>>>>>> com/All-BinaryObjects-created-
> > > by-BinaryObjectBuilder-stored-
> > > >> > >>>>>>>>> at-the-same-partition-by-default-td8042.html
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> Still, I'm creating
this new thread to make getting to
> > the
> > > >> final
> > > >> > >>>>>>>>>> solution as simple and
fast as possible.
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> I remind everyone that
the approach that has got the
> > least
> > > >> > critics was
> > > >> > >>>>>>>>>> the one proposed by
Vladimir Ozerov:
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> <quote>
> > > >> > >>>>>>>>>> I think we can do the
following:
> > > >> > >>>>>>>>>> 1) Add "has hash code"
flag as Denis suggested.
> > > >> > >>>>>>>>>> 2) If object without
a hash code is put to cache, throw
> > an
> > > >> > exception.
> > > >> > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver
*interface.
> > > >> > >>>>>>>>>> 4) Add default implementation
which will auto-generate
> > hash
> > > >> > code. *Print
> > > >> > >>>>>>>>> a
> > > >> > >>>>>>>>>> warning when auto-generation
occurs*, so that user is
> > aware
> > > >> > that he is
> > > >> > >>>>>>>>>> likely to have problems
with normal GETs/PUTs.
> > > >> > >>>>>>>>>> 5) Add another implementation
which will use encoded
> > > string to
> > > >> > calculate
> > > >> > >>>>>>>>> a
> > > >> > >>>>>>>>>> hash code. E.g. *new
BinaryEqualsHashCodeResolver("{a}
> *
> > > 31 +
> > > >> > {b}")*.
> > > >> > >>>>>>>>>> Originally proposed
by Yakov some time ago.
> > > >> > >>>>>>>>>> </quote>
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> After that, Sergi suggested
that instead of a "formula"
> > we
> > > >> keep
> > > >> > just a
> > > >> > >>>>>>>>>> list of the "fields"
that participate in hash code
> > > evaluation,
> > > >> > and
> > > >> > >>>>>>>>>> with that list, we simply
calculate hash code just like
> > IDE
> > > >> > does -
> > > >> > >>>>>>>>>> with all its bit shifts
and additions.
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> I'm planning on settling
down with this combined
> > Vlad-Sergi
> > > >> > approach.
> > > >> > >>>>>>>>>> Any objections?
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> And an extra question
I have: Vlad, you suggest that we
> > > both
> > > >> > throw an
> > > >> > >>>>>>>>>> exception on cache code
absence and that we might
> > generate
> > > it
> > > >> > as the
> > > >> > >>>>>>>>>> last resort. Do I understand
you correctly that you
> > suggest
> > > >> > generating
> > > >> > >>>>>>>>>> random code only in
context of SQL, but throw exception
> > for
> > > >> keys
> > > >> > >>>>>>>>>> without codes on ordinary
put?
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> And yes, built-in hash
codes for JDK types are
> supported
> > as
> > > >> > well as
> > > >> > >>>>>>>>>> items 1-2 from Vlad's
list (there's already fixed issue
> > of
> > > >> > IGNITE-3633
> > > >> > >>>>>>>>>> for the flag and its
presence check).
> > > >> > >>>>>>>>>>
> > > >> > >>>>>>>>>> - Alex
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>
> > > >> > >>>>>>
> > > >> > >>>>
> > > >> > >>
> > > >> >
> > > >> >
> > > >>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message