Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id F202F10AD6 for ; Fri, 30 Aug 2013 18:52:18 +0000 (UTC) Received: (qmail 73740 invoked by uid 500); 30 Aug 2013 18:52:15 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 73730 invoked by uid 500); 30 Aug 2013 18:52:15 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 73722 invoked by uid 99); 30 Aug 2013 18:52:15 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Aug 2013 18:52:15 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jonathan.haddad@gmail.com designates 209.85.192.171 as permitted sender) Received: from [209.85.192.171] (HELO mail-pd0-f171.google.com) (209.85.192.171) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Aug 2013 18:52:11 +0000 Received: by mail-pd0-f171.google.com with SMTP id g10so2205718pdj.2 for ; Fri, 30 Aug 2013 11:51:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:content-type:message-id:mime-version:subject:date :references:to:in-reply-to; bh=e5eketT1DywCcc2gZ5gRQMnFUu6yBeoUDUOAdPXTCK8=; b=wW6N2RbkjgKnjaTfgG+bdnTzDy+xkNddm9GJND6Exen8+nZrAnBvMq9x+W5F6HbCZ5 6U84Tslmw3GqQReoLTW9S194JMLTWV5KWiBxmq3x0ZpG8AUFtImq7zl5M8RgDsRfqfEC r0ZkqJYwAIZwtkRYXCyG3g15esVVt6M6gl/NxU9m8hzLznfUoTBs0cez1YAxZURC2xgs EklAxnzUs8eXwsLaawp54YJwyQ2bYt2hV9J3vekH7KNXs+3bI7HbS7oh0H3XP91Qu/J2 w5tY8L9fMqiQngV1R5sFJPMLwkVD5nXJI3iFqV40gZe2RzGiVucyRcw4B97WoA40OXQJ M/KA== X-Received: by 10.66.120.74 with SMTP id la10mr12579696pab.9.1377888711054; Fri, 30 Aug 2013 11:51:51 -0700 (PDT) Received: from [192.168.1.61] (rrcs-76-79-124-126.west.biz.rr.com. [76.79.124.126]) by mx.google.com with ESMTPSA id ib9sm45951232pbc.43.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 30 Aug 2013 11:51:49 -0700 (PDT) Sender: Jon Haddad From: Jon Haddad Content-Type: multipart/alternative; boundary="Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66" Message-Id: <3E59DBC9-3A77-4819-948F-91489C046CB1@jonhaddad.com> Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: CQL & Thrift Date: Fri, 30 Aug 2013 11:51:50 -0700 References: <61E7EDCF-0F8D-4C4C-8D35-DF7808B24136@jonhaddad.com> To: user@cassandra.apache.org In-Reply-To: X-Mailer: Apple Mail (2.1508) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 It sounds like you want this: create table data ( pk int, colname blob, value blob, primary key (pk, = colname)); that gives you arbitrary columns (cleverly labeled colname) in a single = row, where the value is "value".=20 If you don't want the overhead of storing "colname" in every row, try = with compact storage. Does this solve the problem, or am I missing something? On Aug 30, 2013, at 11:45 AM, Peter Lin wrote: >=20 > you could dynamically create new tables at runtime and insert rows = into the new table, but is that better than using thrift and putting it = into a regular dynamic column with the exact name type and value type? >=20 > that would mean if there's 20 dynamic columns of different types, = you'd have to execute 21 queries to rebuild the data. That's basically = the same as using EVA tables in relational databases. >=20 > Having used that approach in the past to build temporal databases, it = doesn't scale well. >=20 >=20 >=20 > On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra = wrote: > create a column family as: >=20 > create table dynamicTable(key text, nameAsDouble double, valueAsBlob = blob); >=20 > insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( = "key", double(102.211), textAsBlob('valueInBytes'). >=20 > Do you think, it will work in case column name are double? >=20 > -Vivek >=20 >=20 > On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wrote: >=20 > In the interest of education and discussion. >=20 > I didn't mean to say CQL3 doesn't support dynamic columns. The example = from the page shows default type defined in the create statement. > create column family data=20 > with key_validation_class=3DInt32Type=20 > and comparator=3DDateType=20 > and default_validation_class=3DFloatType; >=20 >=20 > If I try to insert a dynamic column that uses double for column name = and string for column value, it will throw an error. The kind of use = case I'm talking about defines a minimum number of static columns. Most = of the columns that are added at runtime are different name and value = type. This is specific to my use case. >=20 > Having said that, I believe it "would" be possible to provide that = kind of feature in CQL, but the trade off is it deviates from SQL. The = grammar would have to allow type declaration in the columns list and = functions in the values. Something like >=20 > insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values = ('abc123', "some string", double(102.211)) >=20 > doubleType(newcol1) and string(newcol2) are dynamic columns. >=20 > I know many people find thrift hard to grok and struggle with it, but = I'm a firm believer in taking time to learn. Every developer should take = time to read cassandra source code and the source code for the driver = they're using. >=20 >=20 >=20 > On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis = wrote: > = http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-row= s >=20 >=20 > On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wrote: >=20 > my bias perspective, I find the sweet spot is thrift for insert/update = and CQL for select queries. >=20 > CQL is too limiting and negates the power of storing arbitrary data = types in dynamic columns. >=20 >=20 > On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad wrote: > If you're going to work with CQL, work with CQL. If you're going to = work with Thrift, work with Thrift. Don't mix. >=20 > On Aug 30, 2013, at 10:38 AM, Vivek Mishra = wrote: >=20 >> Hi, >> If i a create a table with CQL3 as=20 >>=20 >> create table user(user_id text PRIMARY KEY, first_name text, = last_name text, emailid text); >>=20 >> and create index as: >> create index on user(first_name); >>=20 >> then inserted some data as: >> insert into user(user_id,first_name,last_name,"emailId") = values('@mevivs','vivek','mishra','vivek.mishra@impetus.co.in'); >>=20 >>=20 >> Then if update same column family using Cassandra-cli as: >>=20 >> update column family user with key_validation_class=3D'UTF8Type' and = column_metadata=3D[{column_name:last_name, validation_class:'UTF8Type', = index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', = index_type:KEYS}]; >>=20 >>=20 >> Now if i connect via cqlsh and explore user table, i can see column = first_name,last_name are not part of table structure anymore. Here is = the output: >>=20 >> CREATE TABLE user ( >> key text PRIMARY KEY >> ) WITH >> bloom_filter_fp_chance=3D0.010000 AND >> caching=3D'KEYS_ONLY' AND >> comment=3D'' AND >> dclocal_read_repair_chance=3D0.000000 AND >> gc_grace_seconds=3D864000 AND >> read_repair_chance=3D0.100000 AND >> replicate_on_write=3D'true' AND >> populate_io_cache_on_flush=3D'false' AND >> compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND >> compression=3D{'sstable_compression': 'SnappyCompressor'}; >>=20 >> cqlsh:cql3usage> select * from user; >>=20 >> user_id >> --------- >> @mevivs >>=20 >>=20 >>=20 >>=20 >>=20 >> I understand that, CQL3 and thrift interoperability is an issue. But = this looks to me a very basic scenario. >>=20 >>=20 >>=20 >> Any suggestions? Or If anybody can explain a reason behind this? >>=20 >> -Vivek >>=20 >>=20 >>=20 >>=20 >=20 >=20 >=20 >=20 >=20 > --=20 > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced >=20 >=20 >=20 --Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=iso-8859-1

If you don't want the overhead = of storing "colname" in every row, try with compact = storage.

Does this solve the problem, or am I = missing something?

On Aug 30, 2013, at 11:45 AM, = Peter Lin <woolfel@gmail.com> = wrote:


you could = dynamically create new tables at runtime and insert rows into the new = table, but is that better than using thrift and putting it into a = regular dynamic column with the exact name type and value type?

that would mean if there's 20 dynamic columns of different = types, you'd have to execute 21 queries to rebuild the data. That's = basically the same as using EVA tables in relational = databases.

Having used that approach in the past to build temporal databases, it = doesn't scale well.



On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra = <mishra.vivs@gmail.com> wrote:
create = a column family as:

create table dynamicTable(key = text, nameAsDouble double, valueAsBlob blob);

insert into = dynamicTable(key, nameAsDouble, valueAsBlob) values ( = "key", double(102.211), = textAsBlob('valueInBytes').

Do you = think, it will work in case column name are double?

-Vivek
<= /font>


On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin <woolfel@gmail.com> wrote:

In the interest = of education and discussion.

I didn't mean to say CQL3 = doesn't support dynamic columns. The example from the page shows default = type defined in the create statement.
create column family data=20
with key_validation_class=3DInt32Type=20
 and comparator=3DDateType=20
 and default_validation_class=3DFloatType;


If I = try to insert a dynamic column that uses double for column name and = string for column value, it will throw an error. The kind of use case = I'm talking about defines a minimum number of static columns. Most of = the columns that are added at runtime are different name and value type. = This is specific to my use case.

Having said that, I believe it "would" be possible to provide = that kind of feature in CQL, but the trade off is it deviates from SQL. = The grammar would have to allow type declaration in the columns list and = functions in the values. Something like

insert into mytable (KEY, doubleType(newcol1), = string(newcol2)) values ('abc123', "some string", = double(102.211))

doubleType(newcol1) and string(newcol2) = are dynamic columns.

I know many people find thrift hard to grok and struggle with it, = but I'm a firm believer in taking time to learn. Every developer should = take time to read cassandra source code and the source code for the = driver they're using.



On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis = <jbellis@gmail.com> wrote:


On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin = <woolfel@gmail.com> wrote:

my bias perspective, I find the = sweet spot is thrift for insert/update and CQL for select = queries.

CQL is too limiting and negates the power of = storing arbitrary data types in dynamic columns.


On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad <jon@jonhaddad.com> wrote:
If you're going to work with CQL, = work with CQL.  If you're going to work with Thrift, work with = Thrift.  Don't mix.

On Aug 30, 2013, at 10:38 AM, Vivek Mishra <mishra.vivs@gmail.com> = wrote:

Hi,
If i a create a table with CQL3 = as 

create table user(user_id text = PRIMARY KEY, first_name text, last_name text, emailid = text);

and create index as:
create index on = user(first_name);

then inserted some data = as:
insert into user(user_id,first_name,last_name,"emailId") = values('@mevivs','vivek','mishra','vivek.mishra@impetus.co.in');


Then if update same column = family using Cassandra-cli as:

update column = family user with key_validation_class=3D'UTF8Type' and = column_metadata=3D[{column_name:last_name, validation_class:'UTF8Type', = index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', = index_type:KEYS}];


Now if i connect via cqlsh and = explore user table, i can see column first_name,last_name are not part = of table structure anymore. Here is the output:

CREATE TABLE user (
  key text = PRIMARY KEY
) WITH
  = bloom_filter_fp_chance=3D0.010000 AND
  = caching=3D'KEYS_ONLY' AND
  comment=3D'' AND
  dclocal_read_repair_chance=3D0.000000 AND
  = gc_grace_seconds=3D864000 AND
  = read_repair_chance=3D0.100000 AND
  = replicate_on_write=3D'true' AND
  = populate_io_cache_on_flush=3D'false' AND
  compaction=3D{'class': 'SizeTieredCompactionStrategy'} = AND
  compression=3D{'sstable_compression': = 'SnappyCompressor'};

cqlsh:cql3usage> select = * from user;
=

 user_id
---------
 @mev= ivs





I understand that, CQL3 and thrift interoperability = is an issue. But this looks to me a very basic scenario.



Any suggestions? Or If = anybody can explain a reason behind = this?

-Vivek









--
Jonathan Ellis
Project Chair, Apache = Cassandra
co-founder, http://www.datastax.com
@spyced




= --Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66--