ignite-dev mailing list archives

From Andrey Kuznetsov <stku...@gmail.com>
Subject Re: IGNITE-5655: Mixing binary string encodings in Ignite cluster
Date Wed, 06 Sep 2017 13:27:45 GMT
As for option #1, it's not so bad. Currently we've implemented a global-level
encoding switch, and this looks similar to a DBMS: if the server works with a
certain encoding, then all clients should be configured to use the same
encoding for correct string processing.
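
For illustration, here is roughly what such a switch could look like on the
node side. This is only a sketch: BinaryConfiguration is a real class, but the
setEncoding property is hypothetical and not part of any released API.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class GlobalEncodingSketch {
    public static void main(String[] args) {
        BinaryConfiguration binCfg = new BinaryConfiguration();

        // Hypothetical global switch: every server node, client node,
        // driver and thin client would have to agree on this value for
        // strings to compare equal across the cluster.
        binCfg.setEncoding("Cp1251"); // NOT a real Ignite API, illustration only

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setBinaryConfiguration(binCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // All string fields written by BinaryMarshaller on this node
            // would now be encoded with Cp1251 instead of UTF-8.
        }
    }
}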

Option #2 provokes a number of questions.

What are the performance implications of such hidden binary re-encoding?

Who will check for possible data loss on transparent re-encoding (when an
object travels between caches/fields with distinct encodings)?
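
To make the data-loss concern concrete, here is a minimal, self-contained
snippet (plain JDK, no Ignite) that simulates moving a string value from a
UTF-8 cache to a Cp1251 cache; the charset names mirror Vladimir's example
below.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodingLossDemo {
    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("Cp1251");

        // Cyrillic fits into Cp1251, CJK does not.
        String original = "Привет, 世界";

        // The UTF-8 round trip is lossless...
        String decoded = new String(
            original.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        // ...but String.getBytes(Charset) silently replaces unmappable
        // characters with the charset's replacement byte ('?' for Cp1251),
        // so the Cp1251 round trip is not.
        String reencoded = new String(decoded.getBytes(cp1251), cp1251);

        System.out.println(reencoded);                  // Привет, ??
        System.out.println(original.equals(reencoded)); // false
    }
}

Unless some component validates the conversion, this corruption is silent.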

How should we handle nested binary objects? On the one hand, they should be
re-encoded in the way described by Vladimir. On the other hand, a BinaryObject
is an independent entity that can be serialized/deserialized freely, moved
between various data structures, etc. It will be frustrating for the user to
find its binary state changed after storing it in the grid, with possible data
corruption.
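
To illustrate that frustration: assuming per-cache encodings existed (they
don't in the current iteration), the sketch below shows how
BinaryObject.equals(), which compares the underlying serialized bytes, would
break after a transparent re-encode. The cache name and its Cp1251
configuration are assumptions for the sake of the example.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

public class NestedReencodingSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            BinaryObject person = ignite.binary().builder("Person")
                .setField("name", "Пример") // a Cyrillic string field
                .build();

            // Hypothetical: a cache whose configuration declares Cp1251
            // string encoding; the name and behavior are illustrative only.
            IgniteCache<Integer, BinaryObject> cache =
                ignite.<Integer, BinaryObject>getOrCreateCache("cp1251Cache")
                    .withKeepBinary();

            cache.put(1, person);
            BinaryObject readBack = cache.get(1);

            // Transparently re-encoding the nested string on put() would make
            // the stored copy byte-wise unequal to the copy the user holds.
            System.out.println(person.equals(readBack)); // false after re-encoding
        }
    }
}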


As far as I can see, we are trying to couple orthogonal APIs:
BinaryMarshaller, IgniteCache, and SQL. BinaryMarshaller is
Java-datatype-driven: it creates a 1-to-1 mapping between Java types and
their binary representations, and now we are trying to map two binary types
(STRING and ENCODED_STRING) onto the single String class. IgniteCache is a
much more flexible API than SQL, but it lacks the encoded-string datatype that
exists in the SQL dialects of some RDBMSs: `varchar(n) character set some_charset`.
It's not a popular idea, but many problems could be solved by adding such a
type. Those IgniteCache API users who don't need it won't use it, but it
could become a bridge between the SQL and BinaryMarshaller encoded-string types.
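
As a rough sketch of what I mean, the hypothetical datatype could be a thin
wrapper that carries its charset together with the value and maps 1-to-1 onto
the ENCODED_STRING binary type, while plain java.lang.String keeps mapping
onto STRING. Nothing below exists in Ignite today.

// Hypothetical value type bridging SQL's "varchar(n) character set X"
// columns and BinaryMarshaller's ENCODED_STRING; illustration only.
public final class EncodedString {
    private final String value;
    private final String charsetName; // e.g. "Cp1251", as in SQL's "character set"

    public EncodedString(String value, String charsetName) {
        this.value = value;
        this.charsetName = charsetName;
    }

    public String value()       { return value; }
    public String charsetName() { return charsetName; }
}

SQL would then read and write such columns through IgniteCache without any
hidden re-encoding, because the encoding is part of the value itself.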

2017-09-06 10:32 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:

> What we tried to achieve is that several encodings could co-exist in a
> single cluster or even a single cache. This would be great from a UX
> perspective. However, from what Andrey wrote, I understand that this would
> be pretty hard to achieve, as we rely heavily on identical binary
> representations of the objects being compared. So while this could work
> for SQL with some adjustments, we would have severe problems with
> BinaryObject.equals().
>
> Let's think about how we can resolve this. I see two options:
> 1) Allow only a single encoding in the whole cluster. Easy to implement, but
> very bad from a usability perspective. This would especially affect clients:
> client nodes and, what is worse, drivers and thin clients! They all would
> have to bother about which encoding to use. But maybe we can share this
> information during the handshake (as every client has a handshake).
>
> 2) Add a custom encoding flag/ID to the object header if a non-standard
> encoding appears somewhere inside the object (even in nested objects). This
> way, we will be able to re-create the object when the expected and actual
> encodings don't match. For example, consider two caches/tables with
> different encodings (not implemented in the current iteration, but we may
> decide to implement per-cache encodings in the future, as any RDBMS
> supports this). Then I decide to move object A from cache 1 with UTF-8
> encoding to cache 2 with Cp1251 encoding. In this case we will detect the
> encoding mismatch through the object header (or footer) and re-build the
> object transparently for the user.
>
> The second option is preferable to me as a long-term solution, but it would
> require more effort.
>
> Thoughts?
>
--
Best regards,
  Andrey Kuznetsov.
