cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-5428) CQL3 don't validate that collections haven't more than 64K elements
Date Mon, 02 Dec 2013 12:11:35 GMT


Sylvain Lebresne updated CASSANDRA-5428:

    Attachment: 5428.txt

The first version of the patch was incorrect; attaching a corrected version.

> CQL3 don't validate that collections haven't more than 64K elements
> -------------------------------------------------------------------
>                 Key: CASSANDRA-5428
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.2.13
>         Attachments: 5428.txt
> This is somewhat similar to CASSANDRA-5355 but with a twist. When we serialize collections,
not only is the size of each element limited to 64K, but so is the number of elements,
because it is also an unsigned short.
> Now the same argument as in CASSANDRA-5355, that collections are "places to denormalize
small amounts of data", holds here too. So the fact that collections are limited to 64K elements
is something I could live with. However, we don't validate that no more than 64K elements
are inserted. And in fact, we can't validate it if the elements are added one by one.
> So in practice, you can insert more than 64K elements, but if you try to read the collection
back, you will only get some subset of it. And the number of elements returned will
correspond to the last 2 bytes of the real size (so a collection of 65537 elements will be
returned as 1 element). Imo, that's more problematic.
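
To make the wrap-around concrete, here is a minimal sketch of what happens when an element count is squeezed into a 2-byte unsigned field. The class and method names (`ShortCountDemo`, `writeCount`, `readCount`) are hypothetical and this is not Cassandra's actual serializer; it only assumes the count is written as a big-endian unsigned short, as described above.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Cassandra's real serialization code.
final class ShortCountDemo {
    // Writes the element count into a 2-byte field. The narrowing cast keeps
    // only the low 16 bits, so any count above 65535 silently wraps.
    static ByteBuffer writeCount(int elementCount) {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.putShort((short) elementCount);
        buf.flip();
        return buf;
    }

    // Reads the 2-byte field back as an unsigned value in [0, 65535].
    static int readCount(ByteBuffer buf) {
        return buf.getShort(buf.position()) & 0xFFFF;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 65535, 65537, 70000}) {
            System.out.println(n + " elements written, count read back: "
                    + readCount(writeCount(n)));
        }
    }
}
```

Under these assumptions, counts up to 65535 round-trip unchanged, while 65537 reads back as 1 and 70000 as 4464, which is why a reader sees only a subset of the collection.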
> So since unfortunately we can't validate this at insertion, I suggest that as a first
step we:
> # document that limitation (in typically)
> # when we read a collection that has > 64K elements, we detect it and when serializing
that for the client, we:
> ** return as much as we can, i.e. the first 64K elements
> ** log a warning that something is wrong
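
The read-side mitigation proposed in the list above could look roughly like the following sketch. It is not the attached patch: `CollectionCapDemo`, `capForClient`, and `MAX_ELEMENTS` are hypothetical names, the collection is modeled as a plain `List`, and `java.util.logging` stands in for whatever logging the server actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of "return what fits and log a warning"; not the real patch.
final class CollectionCapDemo {
    // Largest element count that fits in an unsigned short.
    static final int MAX_ELEMENTS = 65535;

    private static final Logger logger =
            Logger.getLogger(CollectionCapDemo.class.getName());

    // Returns the elements unchanged when they fit in the 2-byte count;
    // otherwise logs a warning and returns only the first MAX_ELEMENTS.
    static <T> List<T> capForClient(List<T> elements) {
        if (elements.size() <= MAX_ELEMENTS) {
            return elements;
        }
        logger.warning("Collection has " + elements.size()
                + " elements; only the first " + MAX_ELEMENTS
                + " will be returned to the client");
        return new ArrayList<>(elements.subList(0, MAX_ELEMENTS));
    }
}
```

The point of the design is that the client receives a well-defined prefix of the collection rather than a count that has silently wrapped, and the operator gets a log entry saying that data was dropped from the response.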
> In the longer term, for 2.0, maybe we should just change the serialization format and
use an int for the collection size; using an unsigned short was probably misguided. Of course
that changes said serialization format, so we would have to bump the native protocol version for
that (and thus can't do that in 1.2).
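
As a rough illustration of why the longer-term change needs a protocol version bump, the sketch below writes the count as a 4-byte int and then reads it the way a reader expecting the old 2-byte field would. All names (`IntCountDemo`, `writeCount`, `readCount`, `readCountLegacy`) are hypothetical, and big-endian byte order is assumed; this is not the actual native protocol code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a 4-byte count and of why old readers break.
final class IntCountDemo {
    // Writes the element count as a 4-byte big-endian int.
    static ByteBuffer writeCount(int elementCount) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(elementCount);
        buf.flip();
        return buf;
    }

    // A reader that knows about the new format gets the full count back.
    static int readCount(ByteBuffer buf) {
        return buf.getInt(buf.position());
    }

    // A reader still expecting a 2-byte unsigned count consumes only the
    // first two bytes of the int, so it sees the wrong count and would
    // then misparse everything that follows.
    static int readCountLegacy(ByteBuffer buf) {
        return buf.getShort(buf.position()) & 0xFFFF;
    }
}
```

With these assumptions, a count of 70000 round-trips through `readCount`, while `readCountLegacy` sees 1, so old and new peers cannot share the format without negotiating a version.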

This message was sent by Atlassian JIRA
