Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 72BAF10EFA for ; Tue, 4 Jun 2013 18:12:54 +0000 (UTC) Received: (qmail 60314 invoked by uid 500); 4 Jun 2013 18:12:52 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 60295 invoked by uid 500); 4 Jun 2013 18:12:51 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 60287 invoked by uid 99); 4 Jun 2013 18:12:51 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Jun 2013 18:12:51 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of mightye@gmail.com designates 209.85.212.178 as permitted sender) Received: from [209.85.212.178] (HELO mail-wi0-f178.google.com) (209.85.212.178) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Jun 2013 18:12:44 +0000 Received: by mail-wi0-f178.google.com with SMTP id hj6so496386wib.11 for ; Tue, 04 Jun 2013 11:12:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=DFQJFrFIDKUpPtaOgRx66g1WOw29NFMtx8iWFEVcY7I=; b=KZazcHw/7fDhXAwR/gTvL12UACML31ra6ckmlzRT3D8Xeq50fgKoh9h05jG7Z8Ue1A Xjqa6lRcyWOgRfM1Kigz3vzsS0Oi2GxW91YBMGT1FjOXqbIir7sxpG3KOisYFU0JAcml RkxEN4l5imnglnX9d0cdrNuZP7Ac7U79oEnUHhr/TcD2FdtJs2jDcbrhJLc/CPy9i9gT WIxzDwAmrdtAswaw5bOT3jDnEgz2ewZ3LwpFyW28v2a8/O9ScOO8nvpS7Zwd8mSJ35NS Vfp1hPtB6WuhBAMYskIW8KB/0XszXMXqABvnBd/8E90w+5GyyUOT8IZZX7gZC6wqbARG BV9A== X-Received: by 10.194.123.9 with SMTP id lw9mr25139369wjb.24.1370369543866; Tue, 04 Jun 2013 11:12:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.171.131 with HTTP; Tue, 4 Jun 2013 11:12:03 -0700 (PDT) In-Reply-To: References: From: Eric Stevens Date: Tue, 4 Jun 2013 14:12:03 -0400 Message-ID: Subject: Re: CQL 3 returning duplicate keys To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=089e01228514df901904de58058f X-Virus-Checked: Checked by ClamAV on apache.org --089e01228514df901904de58058f Content-Type: text/plain; charset=UTF-8 If this is a standard column family, not a CQL3 table, then using CQL3 will not give you the results you expect. >From cassandra-cli, let's set up some test data: [default@unknown] create keyspace test; [default@unknown] use test; [default@test] create column family test; [default@test] set test['a1']['c1'] = 'a1c1'; [default@test] set test['a1']['c2'] = 'a1c2'; [default@test] set test['a2']['c1'] = 'a2c1'; [default@test] set test['a2']['c2'] = 'a2c2'; Two rows with two columns each, right? Not as far as CQL3 is concerned: cqlsh> use test; cqlsh:test> select * from test; key | column1 | value -----+---------+-------- a2 | 0xc1 | 0xa2c1 a2 | 0xc2 | 0xa2c2 a1 | 0xc1 | 0xa1c1 a1 | 0xc2 | 0xa1c2 Basically for CQL3, without the additional metadata and enforcement that is established by having created the column family as a CQL3 table, CQL will treat each key/column pair as a separate row for CQL purposes. This is most likely at least in part due to the fact that CQL3 tables *cannot have arbitrary columns *like standard column families can. It wouldn't know what columns are available for display. This also exposes some of the underlying structure behind CQL3 tables. CQL 3 is not reverse compatible with CQL 2 for most things. If you cannot migrate your data to a CQL3 table. The equivalent structure in CQL3 tables cqlsh:test> create table test3 (key text PRIMARY KEY, c1 text, c2 text); cqlsh:test> INSERT INTO test3(key, c1, c2) VALUES ('a1', 'a1c1', 'a1c2'); cqlsh:test> INSERT INTO test3(key, c1, c2) VALUES ('a2', 'a2c1', 'a2c2'); cqlsh:test> select * from test3; key | c1 | c2 -----+------+------ a2 | a2c1 | a2c2 a1 | a1c1 | a1c2 This comes with many important restrictions, one of which as mentioned is that you cannot have arbitrary columns in a CQL3 table, just like you cannot in a traditional relational database. Likewise you cannot use traditional approaches to populating data into a CQL3 table: [default@test] get test3['a1']; test3 not found in current keyspace. [default@test] set test3['a3']['c1'] = 'a3c1'; test3 not found in current keyspace. [default@test] describe test3; WARNING: CQL3 tables are intentionally omitted from 'describe' output. On Tue, Jun 4, 2013 at 12:56 PM, ekaqu something wrote: > I run a 1.1 cluster and currently testing out a 1.2 cluster. I have > noticed that with 1.2 it switched to CQL3 which is acting differently than > I would expect. When I do "select key from \"cf\";" I get many many > duplicate keys. When I did the same with CQL 2 I only get the keys > defined. This seems to also be the case for count(*), in cql2 it would > return the number of keys i have, in 3 it returns way more than i really > have. > > $ cqlsh `hostname` < use keyspace; > select count(*) from "cf"; > EOF > > > count > ------- > 10000 > > Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more > results. > > $ cqlsh `hostname` -3 < use keyspace; > select count(*) from "cf"; > EOF > > > count > ------- > 10000 > > Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more > results. > > > $ cqlsh `hostname` -2 < use keyspace; > select count(*) from cf; > EOF > > > count > ------- > 1934 > > 1934 rows have really been inserted. Is there something up with cql3 or is > there something else going on? > > Thanks for your time reading this email. > --089e01228514df901904de58058f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
If this is a standard column family, not a CQL3 table, the= n using CQL3 will not give you the results you expect.

From cassandra-cli, let's set up some test data:

[default@unknown] create keyspace test;
[default@unknown] use test;
[default@test] create column fa= mily test;
[default@test] set test['a1']['c1'= ;] =3D 'a1c1';
[default@test] set test['a1']['c2'] = =3D 'a1c2';
[default@test] set test['a2= 9;]['c1'] =3D 'a2c1';
[default@test] s= et test['a2']['c2'] =3D 'a2c2';

Two rows with two columns each, right? =C2= =A0Not as far as CQL3 is concerned:

cqlsh> use test;
cqlsh:test> select * from test;

=C2=A0key | column1 | = value
-----+--------= -+--------
=C2=A0 a2= | =C2=A0 =C2=A00xc1 | 0xa2c1
=C2=A0 a2 | =C2=A0 =C2=A00xc2 | = 0xa2c2
=C2=A0 a1 | = =C2=A0 =C2=A00xc1 | 0xa1c1
=C2=A0 a1 | =C2=A0 =C2=A00xc2 | 0xa1c2

Basically for CQL3, without the additional metada= ta and enforcement that is established by having created the column family = as a CQL3 table, CQL will treat each key/column pair as a separate row for = CQL purposes. =C2=A0This is most likely at least in part due to the fact th= at CQL3 tables cannot have arbitrary columns like standard column fa= milies can. =C2=A0It wouldn't know what columns are available for displ= ay. =C2=A0This also exposes some of the underlying structure behind CQL3 ta= bles.

CQL 3 is not reverse compatible with CQL 2 = for most things. =C2=A0If you cannot migrate your data to a CQL3 table.

The equivalent structure in CQL3 tables= =C2=A0

cqlsh:test> create table test3 (ke= y text PRIMARY KEY, c1 text, c2 text);
cqlsh:test> IN= SERT INTO test3(key, c1, c2) VALUES ('a1', 'a1c1', 'a1c= 2');
cqlsh:test> INSERT INTO test3(key, c1, c2) VALUES ('a2', &#= 39;a2c1', 'a2c2');
cqlsh:test> select *= from test3;

=C2=A0key | c1 =C2=A0 | c2
-----+------+------
=
=C2=A0 a2 | a2c1 | a2c2
=C2=A0 a1 | a1c1 | a1c2

This comes with many important restrictions, one = of which as mentioned is that you cannot have arbitrary columns in a CQL3 t= able, just like you cannot in a traditional relational database. =C2=A0Like= wise you cannot use traditional approaches to populating data into a CQL3 t= able:

[default@test] get test3['a1']= ;
test3 not found in current keyspace.
[default@te= st] set test3['a3']['c1'] =3D 'a3c1';
test3 not found in current keyspace.
[default@test] describe test3;
WARNING: CQL3 tabl= es are intentionally omitted from 'describe' output.




On Tue, Jun 4, 2013 at 12:56 PM, ekaqu somet= hing <ekaqu1028@gmail.com> wrote:
I run a 1.1 cluster and currently testing out a 1.2 cluste= r. =C2=A0I have noticed that with 1.2 it switched to CQL3 which is acting d= ifferently than I would expect. =C2=A0When I do "select key from \&quo= t;cf\";" I get many many duplicate keys. =C2=A0When I did the sam= e with CQL 2 I only get the keys defined. =C2=A0This seems to also be the c= ase for count(*), in cql2 it would return the number of keys i have, in 3 i= t returns way more than i really have.

$ cqlsh `hostname` <<EOF
use keyspace;<= /div>
select count(*) from "cf";
EOF

=

=C2=A0count
-------
=C2=A0100= 00

Default LIMIT of 10000 was used. Specify your own LIMIT= clause to get more results.

$ cqlsh `hostnam= e` -3 <<EOF
use keyspace;
select count(*) from &q= uot;cf";
EOF


=C2=A0count
----= ---
=C2=A010000

Default LIMIT of 10000 w= as used. Specify your own LIMIT clause to get more results.

<= /div>

$ cqlsh `hostname` -2 <<EOF
use ke= yspace;
select count(*) from cf;
EOF


=C2=A0count
-------
=C2=A0 1934

1934 rows have really been inserted. Is there something= up with cql3 or is there something else going on?

Thanks for your time reading this email.

--089e01228514df901904de58058f--