cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8959) More efficient frozen UDT and tuple serialization format
Date Thu, 12 Mar 2015 08:13:38 GMT


Sylvain Lebresne commented on CASSANDRA-8959:

For the record, this should also be extended to collections.

I'll note that there are two subparts to this: the internal encoding, and the one we send to
clients. It's technically possible to not have the same encoding for both and translate when
receiving/sending to clients, but what is inefficient internally is also inefficient on the
native protocol, so I'd suggest we switch to the same more efficient encoding for both (but
for existing versions of the native protocol, this does mean we'll have to translate to the
old format, which is ok).

> More efficient frozen UDT and tuple serialization format
> --------------------------------------------------------
>                 Key: CASSANDRA-8959
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>              Labels: performance
>             Fix For: 3.1
> The current serialization format for UDTs has a fixed overhead of 4 bytes per defined
> field (encoding the size of the field).
> It is inefficient for sparse UDTs - ones with many defined fields, but few of them present.
> We could keep a bitset to indicate the missing fields, if any.
> It's sub-optimal for encoding UDTs with all the values present as well. We could use
> varint encoding for the field sizes of blob/text fields and encode 'fixed' sized types directly,
> without the 4-bytes size prologue.
> That or something more brilliant. Any improvement right now is lhf.
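
To make the proposal in the description concrete, here is a minimal sketch in Python. It is an illustration only, not Cassandra's code and not a format anyone has agreed on. The legacy encoder follows the 4-byte-size-per-field layout described above (with -1 marking a null field); the "sparse" layout, the bit order of the bitset, the LEB128-style varint, and every function name are assumptions made up for this example.

```python
import struct
from typing import List, Optional, Tuple


def encode_legacy(values: List[Optional[bytes]]) -> bytes:
    """Current-style layout: a 4-byte big-endian size per field, -1 for null."""
    out = bytearray()
    for v in values:
        if v is None:
            out += struct.pack(">i", -1)
        else:
            out += struct.pack(">i", len(v)) + v
    return bytes(out)


def write_uvarint(n: int, out: bytearray) -> None:
    """Append n as an unsigned LEB128 varint (assumed varint flavour)."""
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_sparse(values: List[Optional[bytes]],
                  fixed_sizes: List[Optional[int]]) -> bytes:
    """Hypothetical layout: a bitset of missing fields, then present fields only.

    fixed_sizes[i] is the byte width of a fixed-size type, or None for
    variable-size types (blob/text), which get a varint size prefix.
    """
    missing = bytearray((len(values) + 7) // 8)
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            missing[i // 8] |= 1 << (i % 8)  # bit set means "field is missing"
            continue
        if fixed_sizes[i] is None:
            write_uvarint(len(v), body)
        elif len(v) != fixed_sizes[i]:
            raise ValueError(f"field {i}: expected {fixed_sizes[i]} bytes")
        body += v
    return bytes(missing + body)


def decode_sparse(buf: bytes,
                  fixed_sizes: List[Optional[int]]) -> List[Optional[bytes]]:
    """Inverse of encode_sparse; the field count comes from the type definition."""
    n = len(fixed_sizes)
    pos = (n + 7) // 8  # skip the bitset
    values: List[Optional[bytes]] = []
    for i in range(n):
        if buf[i // 8] & (1 << (i % 8)):
            values.append(None)
            continue
        size = fixed_sizes[i]
        if size is None:
            size, pos = read_uvarint(buf, pos)
        values.append(buf[pos:pos + size])
        pos += size
    return values
```

Under these assumptions, a UDT with ten defined fields of which only a 4-byte int and the text "hello" are present takes 49 bytes in the legacy layout (40 bytes of size prefixes plus 9 bytes of data) and 12 bytes in the sketch layout (2-byte bitset, 4-byte int, 1-byte varint, 5 bytes of text). Translating for existing native protocol versions, as suggested in the comment, would then amount to something like encode_legacy(decode_sparse(buf, fixed_sizes)).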

This message was sent by Atlassian JIRA
