ignite-issues mailing list archives

From "Alexander Paschenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-4011) Automatically compute hash codes for newly built binary objects
Date Thu, 06 Oct 2016 20:28:21 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553094#comment-15553094 ]

Alexander Paschenko commented on IGNITE-4011:

All right, the first version of the patch for this issue is being tested on TC (TeamCity), so
it's time to describe the design that has ultimately been implemented and show some configuration
examples.

h2. Preface

Here are the main ideas:

- Keep the design as simple and clean as possible.
- Make all configuration changes optional. The only users who will need to change anything
are those who wish to use the new DML features in binary mode, and only for keys without classes.
Those who don't care about DML or don't use binary keys will have nothing to worry about.
- Support setups where no additional coding is needed on the user's side.

Of course, anyone who wants to use classless binary keys outside of the DML context will benefit
from this change as well.

h2. API changes

The only existing configuration/public API class that has changed is {{CacheKeyConfiguration}}.
It has three new fields:

/** Key hashing mode. */
private BinaryKeyHashingMode binHashingMode;

/** Fields to build binary objects' hash code upon. */
private List<String> binHashCodeFields;

/** Class name for hash code resolver to automatically compute hash codes for newly built
binary objects. */
private String binHashCodeRslvrClsName;

h2. Hashing mode

The latter two parameters are only meaningful for certain values of the first one, so let's
review it first. A new enum, {{BinaryKeyHashingMode}}, has been introduced to control hashing
of classless binary keys. It's declared as follows (I left the javadocs intact so that the
possible options are clear):

/**
 * Mode of generating hash codes for keys created with {@link BinaryObjectBuilder}.
 */
public enum BinaryKeyHashingMode {
    /**
     * Default (also legacy pre-1.8) mode. Use this mode if you use no SQL DML commands -
     * in other words, if you put data to cache NOT via SQL.
     * Choosing this mode has the same effect as omitting the mode setting from the key
     * configuration altogether.
     */
    DEFAULT,

    /**
     * Generate hash code based upon serialized representation of binary object fields - namely,
     * byte array constructed by {@link BinaryObjectBuilder}. Use this mode if you are NOT planning
     * to retrieve data from cache via ordinary cache methods like {@link IgniteCache#get(Object)},
     * {@link IgniteCache#getAll(Set)}, etc., or if you have no key classes on either client
     * or server - it's a convenient way to manipulate and retrieve binary data in cache only
     * via full-scale SQL features, with no configuration overhead beyond choosing this mode.
     */
    BYTES_HASH,

    /**
     * Generate hash code based on the list of fields declared in {@link BinaryObjectBuilder}
     * (not in {@link BinaryObject}, as hash code has to be computed <b>before</b>
     * {@link BinaryObject} is fully built) - this mode requires that you set
     * {@link CacheKeyConfiguration#binHashCodeFields} for it to work.
     */
    FIELDS_HASH,

    /**
     * Generate hash code arbitrarily based on {@link BinaryObjectBuilder} using the specified class
     * implementing {@link BinaryObjectHashCodeResolver} - this mode requires that you set
     * {@link CacheKeyConfiguration#binHashCodeRslvrClsName} for it to work.
     */
    CUSTOM
}
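The javadocs above imply a simple rule: {{FIELDS_HASH}} needs a field list, {{CUSTOM}} needs a resolver class name, and the other two modes need nothing else. Here's a minimal sketch of that rule as a config check. {{KeyHashingMode}} and {{KeyHashingSettings}} are hypothetical stand-ins (not Ignite classes), and {{DEFAULT}} is an assumed name for the legacy mode.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical mirror of BinaryKeyHashingMode; DEFAULT is an assumed name for the legacy mode.
enum KeyHashingMode { DEFAULT, BYTES_HASH, FIELDS_HASH, CUSTOM }

// Hypothetical holder for the three new CacheKeyConfiguration settings (not Ignite's class).
final class KeyHashingSettings {
    final KeyHashingMode mode;
    final List<String> hashCodeFields;
    final String resolverClassName;

    KeyHashingSettings(KeyHashingMode mode, List<String> hashCodeFields, String resolverClassName) {
        // An omitted mode behaves exactly like the legacy/default mode.
        this.mode = mode == null ? KeyHashingMode.DEFAULT : mode;
        this.hashCodeFields = hashCodeFields == null ? Collections.<String>emptyList() : hashCodeFields;
        this.resolverClassName = resolverClassName;
    }

    /** Throws if the setting required by the chosen mode is missing. */
    void validate() {
        switch (mode) {
            case FIELDS_HASH:
                if (hashCodeFields.isEmpty())
                    throw new IllegalStateException("FIELDS_HASH requires binHashCodeFields");
                break;
            case CUSTOM:
                if (resolverClassName == null || resolverClassName.isEmpty())
                    throw new IllegalStateException("CUSTOM requires binHashCodeRslvrClsName");
                break;
            default:
                // DEFAULT and BYTES_HASH need no extra settings.
                break;
        }
    }

    public static void main(String[] args) {
        new KeyHashingSettings(KeyHashingMode.FIELDS_HASH, Arrays.asList("id"), null).validate();
        try {
            new KeyHashingSettings(KeyHashingMode.CUSTOM, null, null).validate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // CUSTOM requires binHashCodeRslvrClsName
        }
    }
}
```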

h2. Hashing modes explained

So, as discussed on the dev list, there are four options:
- don't change any behavior
- hash byte array of fields set in builder
- hash particular subset of fields in builder
- provide custom logic to hash field values in builder in arbitrary way

The dev list also suggested introducing the {{BinaryObjectHashCodeResolver}} interface.

However, to keep this interface simple to understand and implement, it is used only by the
last two options (fields subset hashing and custom hashing), while byte array hashing works
without it, since the serialized byte array is not part of the binary builder.

Let's focus on the latter two. Correct hashing is of little use without a correct implementation
of {{equals}}: even if hash codes are computed consistently, different keys may still share a
hash code, so we need a mechanism for comparing objects for equality, otherwise we won't be able
to retrieve from the cache what we've put there.
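The point about {{equals}} can be seen with a plain {{java.util.HashMap}}, which is used here only as an analogy for cache key lookup (it's not how Ignite stores entries). Both hypothetical key classes below hash on the same field, but only the one with a matching {{equals}} can be found again with a freshly created key.

```java
import java.util.HashMap;
import java.util.Map;

// hashCode is field-based, but equals is inherited from Object (reference comparison).
final class HashOnlyKey {
    final int id;
    HashOnlyKey(int id) { this.id = id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
}

// hashCode and equals are both based on the same field.
final class ConsistentKey {
    final int id;
    ConsistentKey(int id) { this.id = id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
    @Override public boolean equals(Object o) {
        return o instanceof ConsistentKey && ((ConsistentKey) o).id == id;
    }
}

final class HashEqualsDemo {
    static String lookupHashOnly() {
        Map<HashOnlyKey, String> map = new HashMap<>();
        map.put(new HashOnlyKey(1), "value");
        // Same hash bucket, but a different instance: reference equality fails, so the lookup misses.
        return map.get(new HashOnlyKey(1));
    }

    static String lookupConsistent() {
        Map<ConsistentKey, String> map = new HashMap<>();
        map.put(new ConsistentKey(1), "value");
        return map.get(new ConsistentKey(1));
    }

    public static void main(String[] args) {
        System.out.println(lookupHashOnly());   // null
        System.out.println(lookupConsistent()); // value
    }
}
```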

The current implementation of {{equals}} in {{BinaryObjectExImpl}} compares the contents of the
objects' byte arrays. Therefore, this behavior stays unchanged for {{BYTES_HASH}} mode: if the
byte arrays of two objects are equal, then the portions that correspond to fields are equal as well.
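Here's a minimal sketch of byte-array identity as used by {{BYTES_HASH}}: hash and equality both come from the serialized bytes. {{BytesKey}} is a hypothetical stand-in, not {{BinaryObjectExImpl}}. It assumes {{Arrays.hashCode}} as the hash function (the comment doesn't say which function the patch uses), and the "id=1;name=x" strings are a toy encoding, not Ignite's binary format. The last line shows why {{get}} with a hand-built key is fragile: the same logical fields in a different byte layout make a different key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical key whose identity is its serialized bytes (toy encoding, not Ignite's format).
final class BytesKey {
    private final byte[] bytes;

    BytesKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the key can't change after it's hashed
    }

    @Override public int hashCode() {
        return Arrays.hashCode(bytes); // hash over the whole serialized representation
    }

    @Override public boolean equals(Object o) {
        // Equal byte arrays imply equal field portions, so array equality is sufficient.
        return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
    }

    static BytesKey of(String encoded) {
        return new BytesKey(encoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        BytesKey a = of("id=1;name=x");
        BytesKey b = of("id=1;name=x");
        BytesKey c = of("name=x;id=1");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(c)); // false: same fields, different bytes
    }
}
```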

As mentioned above, {{FIELDS_HASH}} and {{CUSTOM}} modes use {{BinaryObjectHashCodeResolver}}
for both hashing and equality comparison.

h2. Resolver interface and implementation

This interface looks as follows:

package org.apache.ignite.binary;

import org.apache.ignite.internal.binary.BinaryObjectExImpl;

/**
 * Method to compute hash codes for new binary objects.
 */
public interface BinaryObjectHashCodeResolver {
    /**
     * @param builder Binary object builder.
     * @return Hash code value.
     */
    public int hash(BinaryObjectBuilder builder);

    /**
     * Compare binary objects for equality consistently with how hash code is computed.
     * @param o1 First object.
     * @param o2 Second object.
     * @return {@code true} if the objects are equal, {@code false} otherwise.
     */
    public boolean equals(BinaryObjectExImpl o1, BinaryObjectExImpl o2);
}

For {{FIELDS_HASH}}, the configuration takes a list of fields as a parameter of {{CacheKeyConfiguration}}
({{binHashCodeFields}}), and the hash code resolver is built based on those fields. Therefore,
this mode requires no additional coding.

For {{CUSTOM}}, the user must implement {{BinaryObjectHashCodeResolver}} and specify the
implementation's class name as a parameter of {{CacheKeyConfiguration}} ({{binHashCodeRslvrClsName}}).
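Here's a minimal sketch of what a resolver built from a configured field list could look like. To stay self-contained it doesn't use Ignite types. {{FieldValuesHashResolver}} is a hypothetical stand-in for {{BinaryObjectHashCodeResolver}}, and a {{Map<String, Object>}} plays the role of both the builder (for hashing) and the built object (for equality). The 31-based combination is an assumption; the patch's actual formula isn't described here. Fields that aren't listed are ignored by both methods, which keeps {{hash}} and {{equals}} consistent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for BinaryObjectHashCodeResolver over plain field maps.
interface FieldValuesHashResolver {
    int hash(Map<String, Object> fields);
    boolean equals(Map<String, Object> o1, Map<String, Object> o2);
}

// Sketch of a FIELDS_HASH-style resolver built from a configured list of field names.
final class ListedFieldsResolver implements FieldValuesHashResolver {
    private final List<String> fieldNames;

    ListedFieldsResolver(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override public int hash(Map<String, Object> fields) {
        int h = 1;
        for (String name : fieldNames)
            h = 31 * h + Objects.hashCode(fields.get(name)); // assumed combination, null-safe
        return h;
    }

    @Override public boolean equals(Map<String, Object> o1, Map<String, Object> o2) {
        for (String name : fieldNames) {
            if (!Objects.equals(o1.get(name), o2.get(name)))
                return false;
        }
        return true; // unlisted fields are ignored, same as in hash()
    }

    public static void main(String[] args) {
        ListedFieldsResolver r = new ListedFieldsResolver(Arrays.asList("id", "region"));
        Map<String, Object> a = new HashMap<>();
        a.put("id", 7);
        a.put("region", "EU");
        a.put("note", "first");
        Map<String, Object> b = new HashMap<>(a);
        b.put("note", "second"); // differs only in a field that isn't listed
        System.out.println(r.equals(a, b) && r.hash(a) == r.hash(b)); // true
    }
}
```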

h2. Per mode configuration examples

h3. {{BYTES_HASH}}

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties... -->

    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="bytes_hashed_type" />

                <property name="affKeyFieldName" value="someAffField" />

                <property name="binHashingMode" value="BYTES_HASH" />
            </bean>
        </list>
    </property>
</bean>

No coding, no other settings: just set the mode, and you can do all your MERGEs and INSERTs.
However, {{get}} calls will be tricky, since you'll have to create your keys with the binary
builder. This minimalistic configuration suits setups where the user wishes to interact with
some portion of the cache data solely via SQL.

h3. {{FIELDS_HASH}}

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties... -->

    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="fields_hashed_type" />

                <property name="affKeyFieldName" value="someAffField" />

                <property name="binHashingMode" value="FIELDS_HASH" />

                <property name="binHashCodeFields">
                    <list>
                        <!-- names of the key fields to hash, one <value> element per field -->
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>
Aside from setting the mode, you have to list the fields to hash. This suits setups where client
nodes have key classes and data nodes don't, while data gets into the cache via SQL INSERT/MERGE.

h3. {{CUSTOM}}

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties... -->

    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="CustomHashedBinaryType" />

                <property name="affKeyFieldName" value="someAffField" />

                <property name="binHashingMode" value="CUSTOM" />

                <property name="binHashCodeRslvrClsName" value="com.company.ignite.binary.SomeCustomHasher" />
            </bean>
        </list>
    </property>
</bean>

Aside from setting the mode, you have to implement {{BinaryObjectHashCodeResolver}} in the specified
class. This suits setups where client nodes have key classes and data nodes don't, while data
gets into the cache via SQL INSERT/MERGE.
h2. Existing key classes with {{FIELDS_HASH}} and {{CUSTOM}} hashing modes

There is an important aspect of binary object handling: what if we wish to perform a {{get}}
on a cache that contains a key
- for which the class *is* present on client node
- and the class *is not* present on data nodes
- and key was put to cache not by calling {{put}} but by SQL INSERT or MERGE?

What then? In this case, the user's class already implements {{hashCode}} and {{equals}}, but
the data nodes don't have the class, and {{get}} calls still obviously have to work. So the logic
of {{BinaryObjectHashCodeResolver}} must match the logic declared in the key's class (which
the data nodes don't have).

When the {{hashCode}}/{{equals}} logic is trivial (e.g. generated by an IDE), field-based
hashing and equality comparison are sufficient, so {{FIELDS_HASH}} works. The only thing to
maintain is consistency between the field list in the key class code (which data nodes don't
have) *AND* the config files on data nodes.
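To illustrate "consistency of field lists", here's a sketch comparing a client-side key class whose {{hashCode}} is {{Objects.hash(orderId, region)}} (one common IDE-generated form) with a field-list hash computed from binary field values on a data node. {{OrderKey}}, {{FieldsHashCheck}} and the field names are hypothetical, and the sketch assumes {{FIELDS_HASH}} combines values the way {{Objects.hash}} does, which the comment doesn't state. Under that assumption the field order matters: the same fields listed in a different order yield a different hash. Other IDE templates (e.g. ones that don't start from 1) would need a matching formula.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Client-side key class; data nodes don't have it.
final class OrderKey {
    final long orderId;
    final String region;

    OrderKey(long orderId, String region) { this.orderId = orderId; this.region = region; }

    @Override public int hashCode() { return Objects.hash(orderId, region); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey k = (OrderKey) o;
        return orderId == k.orderId && Objects.equals(region, k.region);
    }
}

final class FieldsHashCheck {
    // Same combination as Objects.hash / Arrays.hashCode(Object[]), over the listed fields in order.
    static int hashFields(Map<String, Object> fields, String... names) {
        int h = 1;
        for (String n : names)
            h = 31 * h + Objects.hashCode(fields.get(n));
        return h;
    }

    public static void main(String[] args) {
        Map<String, Object> binaryKey = new HashMap<>();
        binaryKey.put("orderId", 42L); // boxed to Long, same hash as in Objects.hash
        binaryKey.put("region", "EU");
        int classHash = new OrderKey(42L, "EU").hashCode();
        System.out.println(classHash == hashFields(binaryKey, "orderId", "region")); // true
        System.out.println(classHash == hashFields(binaryKey, "region", "orderId")); // false: order matters
    }
}
```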

When the {{hashCode}}/{{equals}} logic is not trivial, the user will have to implement a custom
{{BinaryObjectHashCodeResolver}} that mimics the key class's hashing/comparison logic.
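Here's a sketch of the non-trivial case. The hypothetical key class {{SkuKey}} treats its {{code}} case-insensitively, so plain field hashing would disagree with it. A custom resolver has to repeat the same normalization without referencing the class, since data nodes don't have it. As before, {{SkuKeyResolver}} works on a {{Map<String, Object>}} instead of the real {{BinaryObjectBuilder}}/{{BinaryObjectExImpl}} types, so this is an illustration of the idea, not an implementation of the actual interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Client-side key with non-trivial identity: codes compare case-insensitively.
final class SkuKey {
    final String code;

    SkuKey(String code) { this.code = code; }

    private static String norm(String s) { return s.toLowerCase(Locale.ROOT); }

    @Override public int hashCode() { return norm(code).hashCode(); }

    @Override public boolean equals(Object o) {
        return o instanceof SkuKey && norm(code).equals(norm(((SkuKey) o).code));
    }
}

// Hypothetical CUSTOM-style resolver for SkuKey's binary form; must not depend on SkuKey itself.
final class SkuKeyResolver {
    private static String norm(Map<String, Object> fields) {
        return ((String) fields.get("code")).toLowerCase(Locale.ROOT); // mirrors SkuKey's normalization
    }

    int hash(Map<String, Object> fields) {
        return norm(fields).hashCode(); // mirrors SkuKey.hashCode()
    }

    boolean equals(Map<String, Object> o1, Map<String, Object> o2) {
        return norm(o1).equals(norm(o2)); // mirrors SkuKey.equals()
    }

    public static void main(String[] args) {
        Map<String, Object> upper = new HashMap<>();
        upper.put("code", "ABC");
        Map<String, Object> lower = new HashMap<>();
        lower.put("code", "abc");
        SkuKeyResolver r = new SkuKeyResolver();
        System.out.println(r.equals(upper, lower)); // true
        System.out.println(r.hash(upper) == new SkuKey("abc").hashCode()); // true
        System.out.println("ABC".hashCode() == "abc".hashCode()); // false: plain field hashing would disagree
    }
}
```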

Rationale behind this design is as follows:
- If the user does not care about automatic key hashing (i.e. does not use DML features), then
he or she is probably happy and does not want to configure or, God forbid, code anything.
Everything that works now has to keep working without new code or configuration.
- If the user wishes to hash classless binary keys automatically (from SQL INSERT/MERGE) *AND*
have key classes on client nodes (i.e. perform {{get}} with a key serialized by, say, {{IgniteBinary.toBinary(Object)}}
and *NOT* constructed with a binary builder), he or she will have to keep hashing consistent
between client and server nodes. However, forcing the user to change the code of existing
classes does not seem right, so the only burden is re-configuring data nodes (and, optionally,
writing a custom resolver if the original class is hashed/compared in some unusual way).

h2. Any ways to avoid having to do anything at all?
Sure thing.
- Don't use DML.
- Don't use binary keys without classes. (*Everything written above affects only cases with
non-trivial classless keys.*)

> Automatically compute hash codes for newly built binary objects
> ---------------------------------------------------------------
>                 Key: IGNITE-4011
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4011
>             Project: Ignite
>          Issue Type: Task
>          Components: binary, cache
>            Reporter: Alexander Paschenko
>            Assignee: Alexander Paschenko
>             Fix For: 1.8
> For binary keys built automatically inside SQL engine during INSERT or MERGE, we need
> to compute hash codes automatically because in this case the user does not interact with any
> builders and can't set hash code explicitly.
