ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergi Vladykin <sergi.vlady...@gmail.com>
Subject Re: Custom string encoding
Date Sat, 01 Jul 2017 09:24:19 GMT
In SQL indexes we may store partial strings and assume them to be in UTF-8,
I don't think this can be abstracted away. But may be this is not a big
deal if in indexes we still will use UTF-8.

Sergi

2017-07-01 10:13 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:

> Val, do you know how we compare strings in SQL queries? Will we be able to
> use this encoder?
>
> Additionally, I think that the encoder is a bit too abstract. Why not go
> even further and allow users create their own ASCII table for encoding?
>
> D.
>
> On Fri, Jun 30, 2017 at 6:49 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > Andrey,
> >
> > Can you elaborate more on this? What is your concern?
> >
> > -Val
> >
> > On Fri, Jun 30, 2017 at 6:17 PM Andrey Mashenkov <
> > andrey.mashenkov@gmail.com>
> > wrote:
> >
> > > Val,
> > >
> > > Looks like make sense.
> > >
> > > This will not affect FullText index, as Lucene has own format for
> storing
> > > data.
> > >
> > > But.. would it be compatible with H2 indexing ? I doubt.
> > >
> > > 1 июля 2017 г. 2:27 пользователь "Valentin Kulichenko" <
> > > valentin.kulichenko@gmail.com> написал:
> > >
> > > > Folks,
> > > >
> > > > Currently binary marshaller always encodes strings in UTF-8. However,
> > > > sometimes it can be useful to customize this. For example, if data
> > > contains
> > > > a lot of Cyrillic, Chinese or other symbols, but not so many Latin
> > > symbols,
> > > > memory is used very inefficiently. In this case it would be great to
> > > encode
> > > > most frequently used symbols in one byte instead of two or three.
> > > >
> > > > I propose to introduce BinaryStringEncoder interface that will
> convert
> > > > strings to byte arrays and back, and make it pluggable via
> > > > BinaryConfiguration. This will allow users to plug in any encoding
> > > > algorithms based on their requirements.
> > > >
> > > > Thoughts?
> > > >
> > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > >
> > > > -Val
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message