cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
Date Fri, 19 Aug 2011 15:36:28 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765
] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM:
--------------------------------------------------------------------

I need to back up.  Neither transposed() nor transposed(column) gives us the power we need,
unless we layer subqueries on top.  Let me use a more complete example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns.
 Then we might want to say, "give me the most recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, cumbersome "column
limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's ok.  Or even,
as you originally suggested, just leave it out and say "you can only do full-CF scans without
a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns
representing tweets could be "tweet_id" instead of "supercolumn."  So it would be nice to
support this somehow in "full metastore Hive."

This is important because we want to generalize beyond the current "one level of nesting"
you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
                  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id,
retweet_id).  (You can think of this as simply supercolumn/subcolumn, the difference being
that composites won't deserialize the entire parent container to fetch a single subcolumn
or subcolumn slice.)  And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM retweets 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}


      was (Author: jbellis):
    I need to back up.  Neither transposed() nor transposed(column) gives us the power we
need, unless we layer subqueries on top.  Let me use a more complete example to illustrate
why.

Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns.
 Then we might want to say, "give me the most recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, cumbersome "column
limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's ok.  Or even,
as you originally suggested, just leave it out and say "you can only do full-CF scans without
a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns
representing tweets could be "tweet_id" instead of "supercolumn."  So it would be nice to
support this somehow in "full metastore Hive."

This is important because we want to generalize beyond the current "one level of nesting"
you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
                  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id,
retweet_id).  (You can think of this as simply supercolumn/subcolumn, the difference being
that composites won't deserialize the entire parent container to fetch a single subcolumn
or subcolumn slice.)  And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}

  
> CQL support for compound columns
> --------------------------------
>
>                 Key: CASSANDRA-2474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API, Core
>            Reporter: Eric Evans
>              Labels: cql
>             Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of compound column
names (the CQL syntax is colon-delimted terms), and then teaching the decoders (drivers) to
create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message