flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: In AbstractRocksDBState, why write a byte 42 between key and namespace?
Date Sun, 17 Jul 2016 12:10:26 GMT
@Stephan It's not about the serializers not being able to read the key. The
key/namespace are never read again. It's just about the serialized form
possibly being ambiguous since we don't control the TypeSerializers and
there might be wonky var-length encoding schemes and whatnot.
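
A minimal sketch of the ambiguity discussed in this thread, assuming the key and namespace serializers emit raw bytes with no length information. The class and method names (CompositeKeyDemo, plain, withSeparator) are hypothetical and are not part of Flink. It also reproduces the counter-example from the thread, where a separator alone does not help if the value 42 can occur in the data itself.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical demo class, not Flink code.
class CompositeKeyDemo {

    /** Writes key bytes followed directly by namespace bytes. */
    static byte[] plain(byte[] key, byte[] namespace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(key, 0, key.length);
        out.write(namespace, 0, namespace.length);
        return out.toByteArray();
    }

    /** Writes key bytes, a single separator byte 42, then namespace bytes. */
    static byte[] withSeparator(byte[] key, byte[] namespace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(key, 0, key.length);
        out.write(42);
        out.write(namespace, 0, namespace.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] keyA = {0, 1, 2}, nsA = {3, 4, 5};
        byte[] keyB = {0, 1},    nsB = {2, 3, 4, 5};

        // Without a separator both pairs serialize to [0 1 2 3 4 5].
        System.out.println(Arrays.equals(plain(keyA, nsA), plain(keyB, nsB)));   // true

        // With the separator the two pairs become distinguishable.
        System.out.println(
            Arrays.equals(withSeparator(keyA, nsA), withSeparator(keyB, nsB)));  // false

        // If 42 can appear in the data, the separator alone is not enough:
        // both pairs below serialize to six bytes of 42.
        byte[] keyC = {42, 42},     nsC = {42, 42, 42};
        byte[] keyD = {42, 42, 42}, nsD = {42, 42};
        System.out.println(
            Arrays.equals(withSeparator(keyC, nsC), withSeparator(keyD, nsD)));  // true
    }
}
```

The separator therefore reduces accidental collisions for typical serializers but is not a general guarantee for arbitrary byte content.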

On Fri, 15 Jul 2016 at 19:20 Timothy Farkas <timothytiborfarkas@gmail.com>
wrote:

> I've faced a similar issue when serializing data to a key-value store. Not
> sure how helpful it is for this case, but two possible solutions I've used
> for persisting keys and values under different namespaces to the same
> key-value store are:
>
> - Have all namespaces be the same number of bytes and prefix each key with
> its namespace.
> - Include the number of bytes in the namespace and key. So the bytes would
> look like this:
>
> [namespace num bytes] [namespace] [key num bytes] [key]
>
> Thanks,
> Tim
>
> On Fri, Jul 15, 2016 at 9:45 AM, Stephan Ewen <sewen@apache.org> wrote:
>
> > Every serializer should know how many bytes to consume. The key serializer
> > should not need to look for 42 to know where to terminate.
> >
> > Otherwise this would be a problem case:
> > key[42, 42] - 42 - namespace [42, 42, 42]
> > key[42, 42, 42] - 42 - namespace [42, 42]
> >
> >
> >
> > On Fri, Jul 15, 2016 at 5:38 PM, Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> > > I left that in on purpose to protect against cases where the combination
> > > of key and namespace can be ambiguous. For example, these two combinations
> > > of key and namespace have the same written representation:
> > > key [0 1 2] namespace [3 4 5] (values in brackets are byte arrays)
> > > key [0 1] namespace [2 3 4 5]
> > >
> > > Having the "magic number" in there protects against such cases.
> > >
> > > On Fri, 15 Jul 2016 at 16:31 Stephan Ewen <sewen@apache.org> wrote:
> > >
> > >> My assumption is that this was a sanity check that actually just stuck in
> > >> the code.
> > >>
> > >> It can probably be removed.
> > >>
> > >> PS: Moving this to the dev@flink.apache.org list...
> > >>
> > >>
> > >>
> > >> On Fri, Jul 15, 2016 at 11:05 AM, 刘彪 <mmyy1110@gmail.com> wrote:
> > >>
> > >> > In AbstractRocksDBState.writeKeyAndNamespace():
> > >> >
> > >> > protected void writeKeyAndNamespace(DataOutputView out) throws IOException {
> > >> >     backend.keySerializer().serialize(backend.currentKey(), out);
> > >> >     out.writeByte(42);
> > >> >     namespaceSerializer.serialize(currentNamespace, out);
> > >> > }
> > >> >
> > >> > Why write a byte 42 between key and namespace? The keySerializer and
> > >> > namespaceSerializer know their lengths. It seems we don't need this byte.
> > >> >
> > >> > Could anybody tell me what it is for? Is there any situation in which we
> > >> > must have this separator?
> > >> >
> > >>
> > >
> >
>
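
The length-prefixed layout suggested earlier in the thread, with each part preceded by its byte count, can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the 4-byte big-endian length written by DataOutputStream.writeInt is one possible choice among several, and none of this reflects what Flink actually writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class, not Flink code.
class LengthPrefixedKeyDemo {

    /** Encodes as: [namespace length][namespace bytes][key length][key bytes]. */
    static byte[] encode(byte[] namespace, byte[] key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(namespace.length); // 4-byte length prefix (assumed width)
            out.write(namespace);
            out.writeInt(key.length);
            out.write(key);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // The pair that collides with a bare 42 separator stays distinct here,
        // because the length prefixes differ.
        byte[] a = encode(new byte[] {42, 42, 42}, new byte[] {42, 42});
        byte[] b = encode(new byte[] {42, 42}, new byte[] {42, 42, 42});
        System.out.println(Arrays.equals(a, b)); // false
        System.out.println(a.length);            // 13 = 4 + 3 + 4 + 2
    }
}
```

The trade-off in this sketch is size: two fixed-width prefixes add 8 bytes per entry, compared with 1 byte for the separator.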
