flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: In AbstractRocksDBState, why write a byte 42 between key and namespace?
Date Wed, 20 Jul 2016 15:03:31 GMT
No, there is no issue for now. It's just not theoretically 100% safe but
the way we use it for now is not problematic.

On Wed, 20 Jul 2016 at 16:07 Maximilian Michels <mxm@apache.org> wrote:

> Is there a JIRA issue for this?
>
> On Mon, Jul 18, 2016 at 12:15 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
> > Ah I see, Stephan and I had a quick chat and it's for cases where there
> > are 42s around the edges of the key/namespace.
> >
> > On Mon, 18 Jul 2016 at 11:51 Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> >> In which cases is it not solved? Because then we should make sure to
> >> solve it.
> >>
> >> On Mon, 18 Jul 2016 at 10:33 Stephan Ewen <sewen@apache.org> wrote:
> >>
> >>> Got it. But the ambiguity is not really solved by that, just lessened.
> >>>
> >>> On Sun, Jul 17, 2016 at 2:10 PM, Aljoscha Krettek <aljoscha@apache.org>
> >>> wrote:
> >>>
> >>> > @Stephan It's not about the serializers not being able to read the
> >>> > key. The key/namespace are never read again. It's just about the
> >>> > serialized form possibly being ambiguous since we don't control the
> >>> > TypeSerializers and there might be wonky var-length encoding schemes
> >>> > and what not.
> >>> >
> >>> > On Fri, 15 Jul 2016 at 19:20 Timothy Farkas
> >>> > <timothytiborfarkas@gmail.com> wrote:
> >>> >
> >>> > > I've faced a similar issue when serializing data to a key-value
> >>> > > store. Not sure how helpful it is for this case, but two possible
> >>> > > solutions I've used for persisting keys and values under different
> >>> > > namespaces to the same key-value store are:
> >>> > >
> >>> > > - Have all namespaces be the same number of bytes and prefix each
> >>> > >   key with its namespace.
> >>> > > - Include the number of bytes in the namespace and in the key. So
> >>> > >   the bytes would look like this:
> >>> > >
> >>> > > [namespace num bytes] [namespace] [key num bytes] [key]
> >>> > >
> >>> > > Thanks,
> >>> > > Tim
> >>> > >
> >>> > > On Fri, Jul 15, 2016 at 9:45 AM, Stephan Ewen <sewen@apache.org>
> >>> > > wrote:
> >>> > >
> >>> > > > Every serializer should know how many bytes to consume. The key
> >>> > > > serializer should not need to look for 42 to know where to terminate.
> >>> > > >
> >>> > > > Otherwise this would be a problem case:
> >>> > > > key[42, 42] - 42 - namespace [42, 42, 42]
> >>> > > > key[42, 42, 42] - 42 - namespace [42, 42]
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > On Fri, Jul 15, 2016 at 5:38 PM, Aljoscha Krettek
> >>> > > > <aljoscha@apache.org> wrote:
> >>> > > >
> >>> > > > > I left that in on purpose to protect against cases where the
> >>> > > > > combination of key and namespace can be ambiguous. For example,
> >>> > > > > these two combinations of key and namespace have the same written
> >>> > > > > representation:
> >>> > > > > key [0 1 2] namespace [3 4 5] (values in brackets are byte arrays)
> >>> > > > > key [0 1] namespace [2 3 4 5]
> >>> > > > >
> >>> > > > > Having the "magic number" in there protects against such cases.
> >>> > > > >
> >>> > > > > On Fri, 15 Jul 2016 at 16:31 Stephan Ewen <sewen@apache.org>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> My assumption is that this was a sanity check that actually just
> >>> > > > >> stuck in the code.
> >>> > > > >>
> >>> > > > >> It can probably be removed.
> >>> > > > >>
> >>> > > > >> PS: Moving this to the dev@flink.apache.org list...
> >>> > > > >>
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> On Fri, Jul 15, 2016 at 11:05 AM, 刘彪 <mmyy1110@gmail.com>
> >>> > > > >> wrote:
> >>> > > > >>
> >>> > > > >> > In AbstractRocksDBState.writeKeyAndNamespace():
> >>> > > > >> >
> >>> > > > >> > protected void writeKeyAndNamespace(DataOutputView out) throws IOException {
> >>> > > > >> >     backend.keySerializer().serialize(backend.currentKey(), out);
> >>> > > > >> >     out.writeByte(42);
> >>> > > > >> >     namespaceSerializer.serialize(currentNamespace, out);
> >>> > > > >> > }
> >>> > > > >> >
> >>> > > > >> > Why write a byte 42 between key and namespace? The keySerializer
> >>> > > > >> > and namespaceSerializer know their lengths. It seems we don't
> >>> > > > >> > need this byte.
> >>> > > > >> >
> >>> > > > >> > Could anybody tell me what it is for? Is there any situation in
> >>> > > > >> > which we must have this separator?
> >>> > > > >> >
> >>> > > > >>
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>
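
The three layouts discussed in this thread can be compared with a small
stand-alone sketch. It is not Flink code: the class KeyNamespaceEncoding and
its methods are hypothetical names made up for illustration, and raw byte
arrays stand in for whatever a TypeSerializer would write. It reproduces the
two collisions quoted above (plain concatenation, and a 42 separator with 42s
at the edges) and shows that the length-prefixed layout suggested by Tim keeps
those same inputs distinct, assuming a fixed 4-byte length field.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration class, not part of Flink.
public class KeyNamespaceEncoding {

    // Layout 1: key bytes followed directly by namespace bytes.
    public static byte[] concat(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(key.length + namespace.length)
                .put(key)
                .put(namespace)
                .array();
    }

    // Layout 2: key, then the single byte 42, then namespace
    // (the shape of writeKeyAndNamespace quoted in the thread).
    public static byte[] withSeparator(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(key.length + 1 + namespace.length)
                .put(key)
                .put((byte) 42)
                .put(namespace)
                .array();
    }

    // Layout 3: [namespace num bytes] [namespace] [key num bytes] [key],
    // with each length written as a 4-byte int (an assumption of this sketch).
    public static byte[] lengthPrefixed(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(8 + key.length + namespace.length)
                .putInt(namespace.length)
                .put(namespace)
                .putInt(key.length)
                .put(key)
                .array();
    }

    public static void main(String[] args) {
        // Aljoscha's example: plain concatenation collides.
        boolean concatCollides = Arrays.equals(
                concat(new byte[] {0, 1, 2}, new byte[] {3, 4, 5}),
                concat(new byte[] {0, 1}, new byte[] {2, 3, 4, 5}));

        // Stephan's example: the separator still collides when the
        // key and namespace themselves consist of 42s.
        byte[] two = {42, 42};
        byte[] three = {42, 42, 42};
        boolean separatorCollides = Arrays.equals(
                withSeparator(two, three),
                withSeparator(three, two));

        // The same inputs stay distinct once lengths are written explicitly.
        boolean prefixedCollides = Arrays.equals(
                lengthPrefixed(two, three),
                lengthPrefixed(three, two));

        System.out.println("concat collides:    " + concatCollides);    // true
        System.out.println("separator collides: " + separatorCollides); // true
        System.out.println("prefixed collides:  " + prefixedCollides);  // false
    }
}
```

The separator narrows the set of colliding inputs but does not eliminate it,
which matches the "lessened, not solved" remark above. Explicit lengths cost a
few extra bytes per entry in exchange for an unambiguous layout.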
