ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Custom string encoding
Date Sun, 02 Jul 2017 06:53:01 GMT
Valya,

Personally, I vote against this feature. BinaryConfiguration has proven to
be inconvenient: it has to be configured before node start, it cannot be
changed at runtime, and it requires classes on the server. Moreover, if you
decide to change the encoding at some point, it would be impossible.

I think we should add this feature at the API level instead. If a string is
written in a non-UTF-8 encoding, we will write it in a different format:
[encoding_code][string]

BinaryWriter.writeString(String fieldName, String val);
BinaryWriter.writeString(String fieldName, String val, *String encoding*);

BinaryReader.readString(String fieldName);
BinaryReader.readString(String fieldName, *String encoding*);

BinaryObjectBuilder.writeString(String fieldName, String val, *String encoding*);

class MyClass {
    *@BinaryString(encoding = "Cp1251")*
    private String myCyrillicString;
}
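
To make the [encoding_code][string] layout concrete, here is a minimal
sketch in plain Java. It is not Ignite code: the one-byte codes, the 4-byte
length prefix, and the class and method names are all assumptions made for
illustration, and it does not model the type flags of the real binary format.
For simplicity it writes the code even for UTF-8, whereas the idea above is to
use the new layout only for non-UTF-8 strings. It also assumes the JVM ships
the windows-1251 charset (the canonical name for Cp1251).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedStringSketch {
    // Hypothetical one-byte encoding codes; not taken from Ignite.
    public static final byte CODE_UTF8 = 0;
    public static final byte CODE_CP1251 = 1;

    // Maps a hypothetical encoding code to a JDK charset.
    static Charset charsetFor(byte code) {
        switch (code) {
            case CODE_UTF8:
                return StandardCharsets.UTF_8;
            case CODE_CP1251:
                return Charset.forName("windows-1251");
            default:
                throw new IllegalArgumentException("Unknown encoding code: " + code);
        }
    }

    // Writes [encoding_code][length][string bytes]; the length prefix is assumed.
    public static byte[] write(String val, byte code) {
        byte[] body = val.getBytes(charsetFor(code));
        return ByteBuffer.allocate(1 + 4 + body.length)
            .put(code)
            .putInt(body.length)
            .put(body)
            .array();
    }

    // Reads the code first, then decodes the payload with the matching charset.
    public static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte code = buf.get();
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return new String(body, charsetFor(code));
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written with escapes to avoid source-encoding issues.
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println("Cp1251 record size: " + write(s, CODE_CP1251).length);
        System.out.println("UTF-8 record size:  " + write(s, CODE_UTF8).length);
        System.out.println("Round trip ok: " + read(write(s, CODE_CP1251)).equals(s));
    }
}
```

Because the code travels with each value, a reader needs no node-level
configuration to decode it, which is the point of moving the choice from
BinaryConfiguration to the API.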

Vladimir.

On Sat, Jul 1, 2017 at 7:26 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
wrote:

> On Sat, Jul 1, 2017 at 2:24 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> wrote:
>
> > In SQL indexes we may store partial strings and assume them to be in
> > UTF-8; I don't think this can be abstracted away. But maybe this is
> > not a big deal if we still use UTF-8 in indexes.
> >
>
> Sergi, why does it matter if it is UTF8 or custom encoding? Why can't we
> use our own compact encoding in indexes?
>
>
> >
> > 2017-07-01 10:13 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> >
> > > Val, do you know how we compare strings in SQL queries? Will we be
> > > able to use this encoder?
> > >
> > > Additionally, I think that the encoder is a bit too abstract. Why not
> > > go even further and allow users to create their own ASCII table for
> > > encoding?
> > >
> > > D.
> > >
> > > On Fri, Jun 30, 2017 at 6:49 PM, Valentin Kulichenko <
> > > valentin.kulichenko@gmail.com> wrote:
> > >
> > > > Andrey,
> > > >
> > > > Can you elaborate more on this? What is your concern?
> > > >
> > > > -Val
> > > >
> > > > On Fri, Jun 30, 2017 at 6:17 PM Andrey Mashenkov <
> > > > andrey.mashenkov@gmail.com>
> > > > wrote:
> > > >
> > > > > Val,
> > > > >
> > > > > > It looks like this makes sense.
> > > > > >
> > > > > > This will not affect the FullText index, as Lucene has its own
> > > > > > format for storing data.
> > > > > >
> > > > > > But would it be compatible with H2 indexing? I doubt it.
> > > > >
> > > > > > On July 1, 2017 at 2:27, "Valentin Kulichenko" <
> > > > > > valentin.kulichenko@gmail.com> wrote:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > > Currently binary marshaller always encodes strings in UTF-8.
> > > > > > > However, sometimes it can be useful to customize this. For
> > > > > > > example, if data contains a lot of Cyrillic, Chinese or other
> > > > > > > symbols, but not so many Latin symbols, memory is used very
> > > > > > > inefficiently. In this case it would be great to encode most
> > > > > > > frequently used symbols in one byte instead of two or three.
> > > > > > >
> > > > > > > I propose to introduce BinaryStringEncoder interface that will
> > > > > > > convert strings to byte arrays and back, and make it pluggable
> > > > > > > via BinaryConfiguration. This will allow users to plug in any
> > > > > > > encoding algorithms based on their requirements.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > > >
> > > > > > -Val
> > > > > >
> > > > >
> > > >
> > >
> >
>
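
For comparison, the pluggable encoder from the original proposal at the
bottom of the thread might look roughly like the sketch below. The thread
only says the interface converts strings to byte arrays and back, so the
method names encode and decode, the forCharset helper, and the enclosing
class are assumptions, not an existing Ignite API. The main method also
illustrates the memory argument: a single-byte charset such as windows-1251
(Cp1251) stores each Cyrillic letter in one byte, while UTF-8 needs two.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringEncoderSketch {
    /** Hypothetical shape of the proposed interface; method names are assumed. */
    public interface BinaryStringEncoder {
        byte[] encode(String str);

        String decode(byte[] bytes);
    }

    /** Builds an encoder backed by a JDK charset (illustrative helper). */
    public static BinaryStringEncoder forCharset(final Charset cs) {
        return new BinaryStringEncoder() {
            @Override public byte[] encode(String str) {
                return str.getBytes(cs);
            }

            @Override public String decode(byte[] bytes) {
                return new String(bytes, cs);
            }
        };
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written with escapes to avoid source-encoding issues.
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";

        BinaryStringEncoder utf8 = forCharset(StandardCharsets.UTF_8);
        BinaryStringEncoder cp1251 = forCharset(Charset.forName("windows-1251"));

        System.out.println("UTF-8 bytes:  " + utf8.encode(s).length);
        System.out.println("Cp1251 bytes: " + cp1251.encode(s).length);
        System.out.println("Round trip ok: " + cp1251.decode(cp1251.encode(s)).equals(s));
    }
}
```

With a single node-wide encoder like this, stored bytes carry no record of
which charset produced them, which is why changing the encoding later is a
problem.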
