cassandra-commits mailing list archives

From Ondřej Černoš (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6428) Inconsistency in CQL native protocol
Date Mon, 02 Dec 2013 12:33:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836468#comment-13836468 ]

Ondřej Černoš commented on CASSANDRA-6428:
------------------------------------------

Hi [~slebresne], thanks for the quick response. I am a colleague of the reporter, so let me
continue with this ticket. I see this is rather a duplicate of CASSANDRA-5428.

What I don't quite get is the suggestion to use clustering columns instead. If I understand
the implementation correctly, collections are stored using the same mechanism as clustering
columns, right? Our "big" (read: hundreds of thousands of items at most) collections are sets
(we use maps too, but those are small, tens of records) in a structure like CREATE TABLE test
(id text PRIMARY KEY, val1 text, val2 int, val3 timestamp, ..., valN text, some_set set<text>)
where N < 20. We could store the set in another table with an (id, set_member) primary key
(and lose the ability to query the whole row at once), or we could make all the attributes
of the table part of the primary key (which seems kind of super-weird). We don't care much
about being unable to do range queries on the set; we usually need all the values, and
deleting from the set is quite a rare operation (I know about the problem of tombstones
eroding performance, and we need to test that - but the alternative implementation with
composite keys suffers from the same problem). So what should we do? And what is the
rationale for "don't use collections as wide rows" if the underlying implementation of
clustering primary keys really is the same? Please don't take this as a request for unpaid
consulting, just as feedback from a confused CQL3 user. Maybe all that is really needed is
documentation and a set of best-practices documents?
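
For concreteness, the two layouts we are weighing would look roughly like this (a sketch
only - the test_set table and its column names are purely illustrative):

{noformat}
-- current layout: the whole set lives in one row
CREATE TABLE test (
    id text PRIMARY KEY,
    val1 text,
    val2 int,
    val3 timestamp,
    -- ..., valN text,
    some_set set<text>
);

-- clustering-column alternative: one row per set member,
-- clustered under the partition key
CREATE TABLE test_set (
    id text,
    set_member text,
    PRIMARY KEY (id, set_member)
);
{noformat}

With the second layout, SELECT * FROM test_set WHERE id = ? returns all members of the set,
but fetching the row's scalar columns together with the set in a single query is no longer
possible.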

> Inconsistency in CQL native protocol
> ------------------------------------
>
>                 Key: CASSANDRA-6428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6428
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jan Chochol
>
> We are trying to use Cassandra CQL3 collections (sets and maps) for denormalizing data.
> The problem appears when the size of these collections grows above a certain limit. We
> found that the current limit is 64k - 1 (65535) items in a collection.
> We found an inconsistency in the CQL binary protocol (in all currently available versions).

> In the protocol (for a set) there are these fields:
> {noformat}
> [value size: int] [items count: short] [items] ...
> {noformat}
> An example from our case (a collection with 65536 elements):
> {noformat}
> 00 21 ff ee 00 00 00 20 30 30 30 30 35 63 38 69 65 33 67 37 73 61 ...
> {noformat}
> So the decoded {{value size}} is 0x0021ffee = 2228206 bytes and the decoded {{items count}}
> is 0.
> This is wrong - you cannot have a collection with 0 items that occupies over 2 MB.
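> Decoding the example against the field layout makes the wrap-around visible (the item
> breakdown assumes each element is itself prefixed with a short length, which is what the
> rest of the dump suggests):
> {noformat}
> 00 21 ff ee  value size  = 0x0021ffee = 2228206 bytes
> 00 00        items count = 65536 mod 2^16 = 0
> 00 20        first item length = 0x0020 = 32 bytes
> 30 30 30 30 35 63 38 69 65 33 67 37 73 61 ...  first item data ("00005c8ie3g7sa...")
> {noformat}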
> I understand that an unsigned short cannot hold more than 65535, but I do not understand
> why the protocol has such a limitation when all the data is currently sent anyway.
> In this case we have several possibilities:
> * ignore the {{items count}} field and read all the bytes specified in {{value size}}
> ** the problem is that we cannot be sure this behaviour will be kept in future versions
> of Cassandra, as it is quite strange
> * refactor our code to use only small collections (this seems quite odd, as Cassandra
> has no problems with wide rows)
> * do not use collections and fall back to plain wide rows
> * wait for a change in the protocol that removes this unnecessary limitation



--
This message was sent by Atlassian JIRA
(v6.1#6144)
