Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of jonathan.haddad@gmail.com
 designates 209.85.192.171 as permitted sender)
Sender: Jon Haddad <jonathan.haddad@gmail.com>
From: Jon Haddad <jon@jonhaddad.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66"
Message-Id: <3E59DBC9-3A77-4819-948F-91489C046CB1@jonhaddad.com>
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: CQL & Thrift
Date: Fri, 30 Aug 2013 11:51:50 -0700
References: 
 <CANJo1uA6_-SEQ1Qu-CB89aahpEYuDYe_-Q-Q3X725TY8Qsyf4w@mail.gmail.com>
 <61E7EDCF-0F8D-4C4C-8D35-DF7808B24136@jonhaddad.com>
 <CAKv2g8emby9j5rBX20ty3EawmRY2=K2mmMTyfkXPiq+whxAuOw@mail.gmail.com>
 <CALdd-zgVzvRwCBfB4EtZ-5X8LdeGZYnGXA0Zt_EsyR+A=6VThw@mail.gmail.com>
 <CAKv2g8dFv-K9EhH5v29CNM3W+c_ffEwSt7U_ACjJiyX-i79M+g@mail.gmail.com>
 <CANJo1uBQGhwsjixz2KzJyJwyJVYJ+c6EWA=9osrp1ojqPPL65w@mail.gmail.com>
 <CAKv2g8e8GKP-os4QhtQm6b4=D6r3G9MP3bfri3x07hmSRCZYSw@mail.gmail.com>
To: user@cassandra.apache.org
In-Reply-To: 
 <CAKv2g8e8GKP-os4QhtQm6b4=D6r3G9MP3bfri3x07hmSRCZYSw@mail.gmail.com>


--Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

It sounds like you want this:

create table data ( pk int, colname blob, value blob, primary key (pk, =
colname));

That gives you arbitrary columns (cleverly labeled colname) in a single =
row per pk, where each column's value is stored in "value".

If you don't want the overhead of storing the "value" column name with =
every cell, try creating the table with compact storage.
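
As a rough, untested sketch of how that table could be used (the pk 1 =
and the sample names and values are made up for illustration; =
textAsBlob and doubleAsBlob are the CQL3 blob conversion functions):

```cql
-- Same table as above, created with compact storage.
create table data (
    pk int,
    colname blob,
    value blob,
    primary key (pk, colname)
) with compact storage;

-- Column name is a text, value is a text.
insert into data (pk, colname, value)
values (1, textAsBlob('first_name'), textAsBlob('vivek'));

-- Column name is a double, value is a text.
insert into data (pk, colname, value)
values (1, doubleAsBlob(102.211), textAsBlob('some string'));

-- One query returns every dynamic column stored under pk 1.
select colname, value from data where pk in (1);
```

The catch is that the client has to know how to decode each blob (for =
example with blobAsText or blobAsDouble, or in application code), since =
the table itself no longer records the name and value types.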

Does this solve the problem, or am I missing something?

On Aug 30, 2013, at 11:45 AM, Peter Lin <woolfel@gmail.com> wrote:

>=20
> you could dynamically create new tables at runtime and insert rows =
into the new table, but is that better than using thrift and putting it =
into a regular dynamic column with the exact name type and value type?
>=20
> that would mean if there are 20 dynamic columns of different types, =
you'd have to execute 21 queries to rebuild the data. That's basically =
the same as using EAV tables in relational databases.
>=20
> Having used that approach in the past to build temporal databases, I =
can say it doesn't scale well.
>=20
>=20
>=20
> On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra <mishra.vivs@gmail.com> =
wrote:
> create a column family as:
>=20
> create table dynamicTable(key text, nameAsDouble double, valueAsBlob =
blob, primary key (key, nameAsDouble));
>=20
> insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( =
'key', 102.211, textAsBlob('valueInBytes'));
>=20
> Do you think it will work when the column names are doubles?
>=20
> -Vivek
>=20
>=20
> On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin <woolfel@gmail.com> wrote:
>=20
> In the interest of education and discussion.
>=20
> I didn't mean to say CQL3 doesn't support dynamic columns. The example =
from the page shows a default type defined in the create statement.
> create column family data=20
> with key_validation_class=3DInt32Type=20
>  and comparator=3DDateType=20
>  and default_validation_class=3DFloatType;
>=20
>=20
> If I try to insert a dynamic column that uses double for column name =
and string for column value, it will throw an error. The kind of use =
case I'm talking about defines a minimum number of static columns. Most =
of the columns that are added at runtime have different name and value =
types. This is specific to my use case.
>=20
> Having said that, I believe it "would" be possible to provide that =
kind of feature in CQL, but the trade-off is that it deviates from SQL. =
The grammar would have to allow type declarations in the column list =
and functions in the values. Something like:
>=20
> insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values =
('abc123', "some string", double(102.211))
>=20
> doubleType(newcol1) and string(newcol2) are dynamic columns.
>=20
> I know many people find thrift hard to grok and struggle with it, but =
I'm a firm believer in taking time to learn. Every developer should take =
time to read cassandra source code and the source code for the driver =
they're using.
>=20
>=20
>=20
> On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis <jbellis@gmail.com> =
wrote:
> =
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-row=
s
>=20
>=20
> On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin <woolfel@gmail.com> wrote:
>=20
> From my biased perspective, I find the sweet spot is Thrift for =
insert/update and CQL for select queries.
>=20
> CQL is too limiting and negates the power of storing arbitrary data =
types in dynamic columns.
>=20
>=20
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad <jon@jonhaddad.com> wrote:
> If you're going to work with CQL, work with CQL.  If you're going to =
work with Thrift, work with Thrift.  Don't mix.
>=20
> On Aug 30, 2013, at 10:38 AM, Vivek Mishra <mishra.vivs@gmail.com> =
wrote:
>=20
>> Hi,
>> If I create a table with CQL3 as=20
>>=20
>> create table user(user_id text PRIMARY KEY, first_name text, =
last_name text, emailid text);
>>=20
>> and create index as:
>> create index on user(first_name);
>>=20
>> then insert some data as:
>> insert into user(user_id,first_name,last_name,"emailId") =
values('@mevivs','vivek','mishra','vivek.mishra@impetus.co.in');
>>=20
>>=20
>> Then, if I update the same column family using cassandra-cli as:
>>=20
>> update column family user with key_validation_class=3D'UTF8Type' and =
column_metadata=3D[{column_name:last_name, validation_class:'UTF8Type', =
index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', =
index_type:KEYS}];
>>=20
>>=20
>> Now if I connect via cqlsh and explore the user table, I can see =
that the columns first_name and last_name are no longer part of the =
table structure. Here is the output:
>>=20
>> CREATE TABLE user (
>>   key text PRIMARY KEY
>> ) WITH
>>   bloom_filter_fp_chance=3D0.010000 AND
>>   caching=3D'KEYS_ONLY' AND
>>   comment=3D'' AND
>>   dclocal_read_repair_chance=3D0.000000 AND
>>   gc_grace_seconds=3D864000 AND
>>   read_repair_chance=3D0.100000 AND
>>   replicate_on_write=3D'true' AND
>>   populate_io_cache_on_flush=3D'false' AND
>>   compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND
>>   compression=3D{'sstable_compression': 'SnappyCompressor'};
>>=20
>> cqlsh:cql3usage> select * from user;
>>=20
>>  user_id
>> ---------
>>  @mevivs
>>=20
>>=20
>>=20
>>=20
>>=20
>> I understand that CQL3 and Thrift interoperability is an issue, but =
this looks to me like a very basic scenario.
>>=20
>>=20
>>=20
>> Any suggestions? Or can anybody explain the reason behind this?
>>=20
>> -Vivek
>>=20
>>=20
>>=20
>>=20
>=20
>=20
>=20
>=20
>=20
> --=20
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>=20
>=20
>=20


--Apple-Mail=_10589AF4-E2DC-413C-ABC8-E10E2A7ADE66--