cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
Date Mon, 22 Oct 2012 17:24:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481525#comment-13481525 ]

Jonathan Ellis commented on CASSANDRA-4815:
-------------------------------------------

bq. What does this mean? 'We can do that'. If we have an old-style schema, don't we need to be able to alter a current table?

Only if you want to add meaningful names.  That's what this next part is saying:

"If we simply use the old schema directly as-is, Cassandra will give cell names and values
autogenerated CQL3 names: column1, column2, and so forth. Here I’m accessing the data inserted
earlier from CQL2, but with cqlsh --cql3:"

{noformat}
SELECT * FROM song_tags;

id                                   | column1 | value
--------------------------------------+---------+-------
8a172618-b121-4136-bb10-f665cfc469eb |    2007 |
8a172618-b121-4136-bb10-f665cfc469eb |  covers |
a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |    1973 |
a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |   blues |
{noformat}
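The mapping behind that output can be sketched in Python. This is an illustrative model, not Cassandra code: it assumes a compact-storage column family can be treated as a map from row key to that row's cells, and `to_cql3_rows` is a hypothetical name. Each stored cell becomes one CQL3 row, with the row key as `id`, the cell name as `column1`, and the cell value as `value`.

```python
from typing import Dict, List, Tuple

# Illustrative model only: a compact-storage column family as a dict from
# row key to that row's cells (cell name -> cell value).
WideRows = Dict[str, Dict[str, str]]


def to_cql3_rows(cf: WideRows) -> List[Tuple[str, str, str]]:
    """Present each stored cell as one CQL3 row (id, column1, value).

    Cells inside a row come out in comparator order; for a UTF-8
    comparator that is the sorted order of the cell names.
    """
    rows: List[Tuple[str, str, str]] = []
    for row_key, cells in cf.items():  # real partition order depends on the partitioner
        for name in sorted(cells):     # cell names become the clustering column
            rows.append((row_key, name, cells[name]))
    return rows


song_tags: WideRows = {
    "8a172618-b121-4136-bb10-f665cfc469eb": {"covers": "", "2007": ""},
    "a3e64f8f-bd44-4f28-b8d9-6938726e34d4": {"blues": "", "1973": ""},
}

for id_, column1, value in to_cql3_rows(song_tags):
    print(f"{id_} | {column1:>7} | {value}")
```

One wide row with N cells therefore shows up as N CQL3 rows sharing the same `id`, which is why the two song ids each appear twice in the output above.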

... that said, as Sylvain points out, we do have CASSANDRA-4822 open to allow changing those
default names without dropping and recreating the table definition.

bq. Does it make sense to implement CLI-like SET and GET?

Not in CQL-the-language, and I don't think even in cqlsh-the-utility.  I understand the appeal
of the convenience, but the abstraction leakage it would introduce threatens to undo all the
work we're doing to make CQL3 something you can use on its own terms.

(As far as performance goes, prepared statements make the length of the string being parsed
initially a non-issue.)
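A toy Python model of why statement length stops mattering: the query text is parsed once at prepare time, and later executions refer to it by an identifier plus bound values. `ToyServer`, its methods, and the naive `?` counting are hypothetical simplifications, not a Cassandra driver or protocol API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Sequence

# Toy model of server-side prepared statements; all names are hypothetical.


@dataclass(frozen=True)
class ParsedStatement:
    query: str
    bind_markers: int


class ToyServer:
    def __init__(self) -> None:
        self.parse_calls = 0
        self._ids = count(1)
        self._by_id: Dict[int, ParsedStatement] = {}
        self._id_by_query: Dict[str, int] = {}

    def _parse(self, query: str) -> ParsedStatement:
        # Stand-in for the CQL parser, whose cost grows with query length.
        self.parse_calls += 1
        return ParsedStatement(query, query.count("?"))  # naive: ignores '?' in literals

    def prepare(self, query: str) -> int:
        # Parse once; preparing the same text again reuses the cached id.
        if query not in self._id_by_query:
            stmt_id = next(self._ids)
            self._by_id[stmt_id] = self._parse(query)
            self._id_by_query[query] = stmt_id
        return self._id_by_query[query]

    def execute(self, stmt_id: int, values: Sequence[object]) -> List[object]:
        # Execution carries only the id and the bound values: no re-parsing.
        stmt = self._by_id[stmt_id]
        if len(values) != stmt.bind_markers:
            raise ValueError(f"expected {stmt.bind_markers} values, got {len(values)}")
        return list(values)


server = ToyServer()
query = "SELECT * FROM song_tags WHERE id = ?"
stmt_id = server.prepare(query)
for i in range(1000):
    server.execute(stmt_id, [i])
print(server.parse_calls)  # 1
```

However long the query string is, it is parsed once; the thousand executions only pay for binding values.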
                
> Make CQL work naturally with wide rows
> --------------------------------------
>
>                 Key: CASSANDRA-4815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>
> I find that CQL3 is quite obtuse and does not provide me a language useful for accessing my data. First, let's point out how we should design Cassandra data.
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> 4) Optimize for blind writes
> So here is a schema that abides by these tried-and-tested rules that large production users are employing today.
> Say we have a table of movie objects:
> Movie
> Name
> Description
> -< tags   (string)
> -< credits composite(role string, name string )
> -1 likesToday
> -1 blacklisted
> The structure above is a movie. Notice that it holds a mix of static and dynamic columns, but the overall number of columns is not very large (even if it were larger, that would be OK as well). Notice also that this table is not just a single one-to-many relationship: it has one-to-one data and two sets of one-to-many data.
> The schema today is declared something like this:
> create column family movies
>   with comparator = UTF8Type
>   and key_validation_class = UTF8Type
>   and default_validation_class = UTF8Type
>   and column_metadata =
>   [
>     {column_name: blacklisted, validation_class: Int32Type},
>     {column_name: likesToday, validation_class: LongType},
>     {column_name: description, validation_class: UTF8Type}
>   ];
> We should be able to insert data like this:
> set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
> set movies['Cassandra Database, not looking for a seQL']['likesToday']=34;
> set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
> set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
> set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
> This is the correct way to do it: one seek finds all the information related to a movie. As long as this row does not get "large", there is no reason to optimize by breaking the data into other column families. (Notice you cannot transpose this, because movies holds two one-to-many relationships of potentially different types.)
> Let's look at the CQL3 way to do this design:
> First, contrary to the original design of Cassandra, CQL does not like wide rows. It also has no good way of dealing with dynamic columns together with static columns.
> You have two options:
> Option 1: lose all schema
> create table movies (name text, column1 text, value blob, primary key (name, column1)) with compact storage;
> This method is not so hot: we have now lost all our validators. And by the way, you have to physically shut down everything, rename files, and recreate your schema if you want to inform Cassandra that a current table should be compact. This could at the very least be just a metadata change. You also cannot add column schema.
> Option 2: Normalize (even worse)
> create table movie (name text primary key, description text, likestoday int, blacklisted int);
> create table movecredits (name text, role text, personname text, primary key (name, role));
> create table movetags (name text, tag text, primary key (name, tag));
> This is a terrible design. Of the four key characteristics of how Cassandra data should be designed, it fails three:
> It does not:
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> Why is Cassandra steering toward this course by making a language that does not understand wide rows?
> So what can be done? My suggestions: 
> Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a "virtual view" that is compact storage, with no work to migrate data and recreate schemas. Every table should have a compact view for schemaless access, or a simple query hint like /*transposed*/ should make this change.
> Metadata should be definable by regex. For example, all columns named "tag*" are of type string.
> CQL should have the column[slice_start] .. column[slice_end] operator from CQL2.
> CQL should support current users: users should not have to switch between CQL versions, and possibly Thrift, to work with wide rows. The language should work for them even if it is not expressly designed for them. Some of these features are already part of CQL2, so they should be carried over.
> Also, what must not happen is someone making a hand-waving statement like "Once we have collection types we will not need wide rows". This request is to satisfy current users of Cassandra, not future or theoretical ones. Solutions should not involve physically migrating data in any way, and they should not involve telling someone to do something they are already doing much differently. The suggestions should revolve around making the query language work well with existing data.
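The suggestions in the issue (regex-defined metadata and a CQL2-style column-range slice over one wide row) can be modeled in a few lines of Python. This is a hypothetical sketch of the proposed semantics, not an existing Cassandra or CQL feature: `METADATA`, `decoder_for`, `slice_row`, and the `tags-.*` regex form are all assumptions, and the movie row reuses the sample cells from the CLI `set` commands in the issue description.

```python
import re
from bisect import bisect_left, bisect_right
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch: neither regex metadata nor this slice helper is an
# existing Cassandra feature; all names here are illustrative.

Decoder = Callable[[str], object]

# Regex-keyed column metadata: the first pattern that fully matches a column
# name supplies its type (here, a Python callable that decodes the raw value).
METADATA: List[Tuple[str, Decoder]] = [
    (r"blacklisted", int),
    (r"likesToday", int),
    (r"description", str),
    (r"tags-.*", str),
    (r"credits-.*", str),
]

# Highest code point: prefix + MAX_CHAR sorts after every name with that
# prefix (assuming no name itself contains U+10FFFF).
MAX_CHAR = chr(0x10FFFF)


def decoder_for(name: str) -> Optional[Decoder]:
    for pattern, decoder in METADATA:
        if re.fullmatch(pattern, name):
            return decoder
    return None


def slice_row(row: Dict[str, str], start: str, end: str) -> List[Tuple[str, str]]:
    """Return the cells whose names fall in [start, end], in name order."""
    names = sorted(row)  # code-point order, matching UTF-8 byte order
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return [(name, row[name]) for name in names[lo:hi]]


# The whole movie lives in one row, so one lookup returns everything.
movie: Dict[str, str] = {
    "blacklisted": "1",
    "likesToday": "34",
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}

tags = slice_row(movie, "tags-", "tags-" + MAX_CHAR)
credits = slice_row(movie, "credits-", "credits-" + MAX_CHAR)
typed = {name: decoder_for(name)(value) for name, value in movie.items()}

print([name for name, _ in tags])
print(credits)
print(typed["likesToday"] + 1)
```

Because every cell sits under one row key, the static columns and both one-to-many groups come back from a single row read; the prefix slice is what lets a client pick out one group without a separate table.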

