ignite-dev mailing list archives

From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)
Date Thu, 27 Jul 2017 14:35:58 GMT
Pavel, what would be the size overhead? Are we adding 1 byte for every
field just for this? If you would like to have this info in the binary
object directly, can we in this case have some bitmap of field-to-encoding?
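
To make the overhead question concrete, here is a rough sketch comparing one
encoding byte per string field against a per-object bitmap. It is not the
actual BinaryMarshaller layout; the function names and the bit order are
assumptions made only for illustration.

```python
# Hypothetical sketch: NOT the real Ignite binary format.

def per_field_overhead(num_string_fields: int) -> int:
    """Extra bytes if every string field carries a 1-byte encoding tag."""
    return num_string_fields


def bitmap_overhead(num_fields: int, bits_per_field: int = 1) -> int:
    """Extra bytes for one bitmap per object, rounded up to whole bytes."""
    return (num_fields * bits_per_field + 7) // 8


def build_bitmap(flags):
    """Pack booleans (True = field uses a non-default encoding), LSB first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def is_non_default(bitmap: bytes, field_index: int) -> bool:
    """Check whether the given field is marked as non-default in the bitmap."""
    return bool(bitmap[field_index // 8] & (1 << (field_index % 8)))
```

With one bit per field the bitmap can only say "default" or "not default";
telling several encodings apart needs more bits per field, which narrows the
saving over a plain 1-byte tag.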

D.

On Thu, Jul 27, 2017 at 9:22 AM, Pavel Tupitsyn <ptupitsyn@apache.org>
wrote:

> I'm not sure I understand how this "per field" configuration is supposed to
> be implemented.
> * Marshaller is not tied to a cache. It serializes all kinds of things,
> like compute job parameters and results.
> * Raw mode does not involve field names.
>
> Also it seems like a complicated and expensive solution - looking up string
> format somewhere in the metadata will be slow.
>
> The "encoded string" data type suggestion from Vladimir looks better to me
> from a performance and implementation standpoint.
>
> Thanks,
> Pavel
>
>
>
> On Thu, Jul 27, 2017 at 5:10 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> wrote:
>
> > On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego <isapego@apache.org> wrote:
> >
> > > Just a note from the platforms guy:
> > >
> > > > A solution with table-level configuration is going to be significantly
> > > > harder to implement for platforms and ODBC than a field-level one.
> > >
> >
> > Igor, it seems like you are advocating the per-cell configuration, not
> > per-field one. The per-field configuration can be defined at the
> > table/cache level.
> >
> > I see your point about C++ and .NET integrations, however. Can't we
> > provide this info at node-join time or table-creation time? This way all
> > nodes will receive it and you will be able to grab it on different
> > platforms.
> >
> >
> > >
> > > > Also, what about binary objects that are not stored in a cache
> > > > but are still marshalled?
> > >
> >
> > I think the default system encoding should be used here. If we don't have
> > configuration for default encoding, we should add it.
> >
> >
> > >
> > >
> > > Best Regards,
> > > Igor
> > >
> > > > On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > wrote:
> > > >
> > > > > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > wrote:
> > > >
> > > > >
> > > > > > > Encoding must be set on a per-field basis. This will give us the
> > > > > > > most flexible solution at the cost of 1-byte overhead.
> > > > > >
> > > > > > > Vova, I agree that the encoding should be set on a per-field basis,
> > > > > > > but at the table level, not at a cell level.
> > > > >
> > > > > Dmitriy, Vladimir,
> > > > > Let's use both approaches :-)
> > > > > > We can add a parameter to CacheConfiguration.
> > > > > > If the parameter specifies cache-level encoding, then the marshaller
> > > > > > will use the encoding set for the cache;
> > > > > > otherwise the marshaller will use per-field encoding.
> > > > > > Of course, only if it doesn't complicate the solution.
> > > > >
> > > > >
> > > > > I think that it will complicate the solution and will complicate the
> > > > > marshalling protocol. The advantage of specifying the encoding at
> > > > > table/cache level is that we don't need to add extra encoding bytes
> > > > > to the marshalling protocol.
> > > > >
> > > > > I think Vova was suggesting encoding at the cell level, not at the
> > > > > field level, which seems to be redundant to me.
> > > >
> > > > Vova, do you agree?
> > > >
> > >
> >
>
