Date: Tue, 4 Jun 2013 20:06:18 -0700
From: ekaqu something
To: user@cassandra.apache.org
Subject: Re: CQL 3 returning duplicate keys

Thank you for your detailed response!

On Jun 4, 2013 12:00 PM, "Shahab Yunus" <shahab.yunus@gmail.com> wrote:
Thanks, Eric, for the detailed explanation, but can you point to a source or document for this restriction in CQL3 tables? Doesn't it take away the main feature of a NoSQL store? Or am I missing something obvious here?

Regards,
Shahab


On Tue, Jun 4, 2013 at 2:12 PM, Eric Stevens <mightye@gmail.com> wrote:
If this is a standard column family, not a CQL3 table, then using CQL3 will not give you the results you expect.

From cassandra-cli, let's set up some test data:

[default@unknown] create keyspace test;
[default@unknown] use test;
[default@test] create column family test;
[default@test] set test['a1']['c1'] = 'a1c1';
[default@test] set test['a1']['c2'] = 'a1c2';
[default@test] set test['a2']['c1'] = 'a2c1';
[default@test] set test['a2']['c2'] = 'a2c2';

Two rows with two columns each, right? Not as far as CQL3 is concerned:

cqlsh> use test;
cqlsh:test> select * from test;

 key | column1 | value
-----+---------+--------
  a2 |    0xc1 | 0xa2c1
  a2 |    0xc2 | 0xa2c2
  a1 |    0xc1 | 0xa1c1
  a1 |    0xc2 | 0xa1c2

Basically, without the additional metadata and enforcement that come from creating the column family as a CQL3 table, CQL3 treats each key/column pair as a separate row. This is most likely at least partly because CQL3 tables *cannot have arbitrary columns* like standard column families can, so CQL3 wouldn't know which columns are available for display. This also exposes some of the underlying structure behind CQL3 tables.
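A rough way to picture the flattening described above: a minimal Python sketch, assuming a schemaless column family can be modeled as a nested dict (row key -> column name -> value), that emits one (key, column1, value) tuple per stored cell, the same shape cqlsh printed. The function name and data model are illustrative assumptions, not Cassandra code. The sketch uses plain strings where the cqlsh output shows hex-encoded bytes, and it keeps dict order for partitions rather than reproducing Cassandra's token order.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a "dynamic" (Thrift-style) column family:
# row key -> {column name -> value}. Any row may hold any column names.
ColumnFamily = Dict[str, Dict[str, str]]


def flatten_to_cql3_rows(cf: ColumnFamily) -> List[Tuple[str, str, str]]:
    """Return one (key, column1, value) tuple per stored cell.

    Every key/column pair becomes its own result row, which is why a
    row key repeats once per column. Partitions come out in dict order
    (not token order); columns within a partition are sorted by name.
    """
    rows = []
    for key, columns in cf.items():
        for column_name in sorted(columns):
            rows.append((key, column_name, columns[column_name]))
    return rows


# The test data from the cassandra-cli session: two rows, two columns each.
test_cf: ColumnFamily = {
    "a1": {"c1": "a1c1", "c2": "a1c2"},
    "a2": {"c1": "a2c1", "c2": "a2c2"},
}

for row in flatten_to_cql3_rows(test_cf):
    print(row)
```

Two storage rows turn into four result rows, which is the "duplicate keys" effect from the original question.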

CQL3 is not backward compatible with CQL2 for most things. If you cannot migrate your data to a CQL3 table...

The equivalent structure as a CQL3 table:

cqlsh:test> create table test3 (key text PRIMARY KEY, c1 text, c2 text);
cqlsh:test> INSERT INTO test3(key, c1, c2) VALUES ('a1', 'a1c1', 'a1c2');
cqlsh:test> INSERT INTO test3(key, c1, c2) VALUES ('a2', 'a2c1', 'a2c2');
cqlsh:test> select * from test3;

 key | c1   | c2
-----+------+------
  a2 | a2c1 | a2c2
  a1 | a1c1 | a1c2

This comes with many important restrictions, one of which, as mentioned, is that you cannot have arbitrary columns in a CQL3 table, just as you cannot in a traditional relational database. Likewise, you cannot use traditional approaches (such as cassandra-cli) to populate data in a CQL3 table:

[default@test] get test3['a1'];
test3 not found in current keyspace.
[default@test] set test3['a3']['c1'] = 'a3c1';
test3 not found in current keyspace.
[default@test] describe test3;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
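The "no arbitrary columns" restriction described above can be illustrated with a toy model. This is a minimal Python sketch under that assumption; the class `StaticTable` and its methods are hypothetical and say nothing about how Cassandra actually validates writes. It only shows the contrast with the nested-dict style of cassandra-cli, where any row could gain any column name.

```python
from typing import Dict, List, Optional, Tuple


class StaticTable:
    """Toy fixed-schema table: one primary-key column plus declared columns."""

    def __init__(self, key_column: str, columns: List[str]) -> None:
        self.key_column = key_column
        self.columns = list(columns)
        # row key -> {declared column -> value or None}
        self.rows: Dict[str, Dict[str, Optional[str]]] = {}

    def insert(self, values: Dict[str, str]) -> None:
        if self.key_column not in values:
            raise ValueError(f"missing primary key column {self.key_column!r}")
        # Reject columns that were not declared in the schema, analogous to
        # a CQL3 table refusing an undefined column name.
        unknown = set(values) - {self.key_column} - set(self.columns)
        if unknown:
            raise ValueError(f"undefined column(s): {sorted(unknown)}")
        key = values[self.key_column]
        # Upsert, like CQL3 INSERT: create the row if needed, then overwrite.
        row = self.rows.setdefault(key, {c: None for c in self.columns})
        row.update({c: v for c, v in values.items() if c != self.key_column})

    def select_all(self) -> List[Tuple[Optional[str], ...]]:
        # One result row per primary key, every declared column present.
        return [(k, *(r[c] for c in self.columns)) for k, r in self.rows.items()]


test3 = StaticTable("key", ["c1", "c2"])
test3.insert({"key": "a1", "c1": "a1c1", "c2": "a1c2"})
test3.insert({"key": "a2", "c1": "a2c1", "c2": "a2c2"})
print(test3.select_all())

try:
    # Adding a new column name on the fly, as cassandra-cli allowed, fails here.
    test3.insert({"key": "a3", "c3": "a3c3"})
except ValueError as err:
    print(err)
```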




On Tue, Jun 4, 2013 at 12:56 PM, ekaqu something <ekaqu1028@gmail.com> wrote:
I run a 1.1 cluster and am currently testing out a 1.2 cluster. I have noticed that 1.2 switched to CQL3, which is acting differently than I would expect. When I run 'select key from "cf";' I get many, many duplicate keys. When I did the same with CQL2, I only got the keys that were defined. This also seems to be the case for count(*): in CQL2 it returned the number of keys I have, while in CQL3 it returns far more than I really have.
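If the keys have to be read through CQL3 before the data is migrated, one client-side workaround for the repeated keys is to deduplicate them. A minimal Python sketch, assuming the `select key` results arrive as an iterable of key strings; the function name and sample data are hypothetical. It only sees what the query returned, so a query capped by the default LIMIT can still miss keys.

```python
from typing import Iterable, List


def distinct_keys(keys: Iterable[str]) -> List[str]:
    """Return each key once, in first-seen order.

    dict preserves insertion order (Python 3.7+), so dict.fromkeys
    acts as an order-preserving set.
    """
    return list(dict.fromkeys(keys))


# Keys as they might come back from 'select key from "cf";' in CQL3:
# one entry per stored cell rather than per row (illustrative data).
raw = ["a2", "a2", "a1", "a1"]
print(distinct_keys(raw))  # ['a2', 'a1']
```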

$ cqlsh `hostname` <<EOF
use keyspace;
select count(*) from "cf";
EOF

 count
-------
 10000

Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results.

$ cqlsh `hostname` -3 <<EOF
use keyspace;
select count(*) from "cf";
EOF

 count
-------
 10000

Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results.


$ cqlsh `hostname` -2 <<EOF
use keyspace;
select count(*) from cf;
EOF

 count
-------
  1934

Only 1934 rows have actually been inserted. Is something wrong with CQL3, or is something else going on?
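The numbers in these sessions fit the flattening Eric describes: in CQL3, count(*) counts result rows (one per stored cell for this column family) and stops at the default LIMIT of 10000 that cqlsh reported, while CQL2 counted row keys. A hypothetical Python sketch of both counts; the data is made up to match the 1934 keys in the question, and the width of 10 columns per row is an assumption, since the email does not say how wide the rows are.

```python
from typing import Dict

ColumnFamily = Dict[str, Dict[str, str]]

DEFAULT_LIMIT = 10000  # the default LIMIT cqlsh reported in the sessions


def cql3_count(cf: ColumnFamily, limit: int = DEFAULT_LIMIT) -> int:
    """CQL3-style count: one result row per stored cell, capped by LIMIT."""
    cells = sum(len(columns) for columns in cf.values())
    return min(cells, limit)


def cql2_count(cf: ColumnFamily) -> int:
    """CQL2-style count: one result per row key."""
    return len(cf)


# Illustrative data: 1934 row keys with an assumed 10 columns each.
cf: ColumnFamily = {
    f"key{i}": {f"col{j}": "v" for j in range(10)} for i in range(1934)
}

print(cql3_count(cf))                # 10000 (19340 cells, capped)
print(cql3_count(cf, limit=100000))  # 19340
print(cql2_count(cf))                # 1934
```

Under this model, raising the LIMIT as the cqlsh message suggests would report the cell count rather than the number of row keys, so the two clients would still disagree.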

Thanks for taking the time to read this email.

