Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Mon, 4 Jun 2012 16:23:23 +0000 (UTC)
From: "Ahmet AKYOL (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <95010820.34544.1338827003141.JavaMail.jiratomcat@issues-vm>
In-Reply-To: 
 <850714914.8702.1334899056766.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Comment Edited] (CASSANDRA-4176) Support for sharding wide
 rows in CQL 3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288495#comment-13288495 ] 

Ahmet AKYOL edited comment on CASSANDRA-4176 at 6/4/12 4:22 PM:
----------------------------------------------------------------

+1 on 'separation of concerns'. It has nothing to do with composite keys.

On the other hand, as a C* user (a freeloader, not a talented committer like you guys), I do not like the sound of any user-side solution (like CQL 3) for sharding wide rows, because users will have to think about "sharding" for most of their CFs.

The problem here is row size again, as in [CASSANDRA-3929|https://issues.apache.org/jira/browse/CASSANDRA-3929], and IMHO the same solution (a compaction strategy, maybe with some extras like chaining) can be used here. I'm sure it'll make things more complicated on the Cassandra side. However, it's better for users.

In fact, the real problem is the "very very wide rows", as mentioned. Partitioning by hash (or any other automatic scheme) may cause efficiency problems for the crowd (the "not very wide rows") due to unnecessary sharding.

So please attack the "very very wide rows" problem and, if possible, find a configurable solution (like "wide row sharding size hint: 10 MB") without CQL 3.
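
To make the size-hint idea concrete, here is a minimal Python sketch of sharding a row only once it grows past a configurable threshold, so that "not very wide rows" are never split. Everything in it is hypothetical: SHARD_SIZE_HINT_BYTES, shard_index, physical_row_key, and the "key:shard" naming are illustrative choices, not Cassandra settings or APIs.

```python
# Hypothetical setting mirroring "wide row sharding size hint: 10 MB".
SHARD_SIZE_HINT_BYTES = 10 * 1024 * 1024


def shard_index(bytes_written_before: int,
                size_hint: int = SHARD_SIZE_HINT_BYTES) -> int:
    """Return the shard a new column lands in, given the bytes already
    written to the logical row before it."""
    if size_hint <= 0:
        raise ValueError("size_hint must be positive")
    if bytes_written_before < 0:
        raise ValueError("bytes_written_before must be non-negative")
    return bytes_written_before // size_hint


def physical_row_key(row_key: str, bytes_written_before: int,
                     size_hint: int = SHARD_SIZE_HINT_BYTES) -> str:
    """Rows below the hint keep their plain key; only rows that have
    outgrown it get a shard suffix."""
    shard = shard_index(bytes_written_before, size_hint)
    return row_key if shard == 0 else f"{row_key}:{shard}"
```

With these assumptions, a row holding 1 KB keeps the key "alice", while a column written after 25 MB of existing data goes to "alice:2".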

P.S.: I'll also admit that "automated sharding for time series" can be good enough for some use cases, but not all of them. So this issue still makes sense, not as "sharding very wide rows" but as "automated sharding for time series".
                
      was (Author: liqusha):
    +1 on 'separation of concerns'. It has nothing to do with composite keys.

On the other hand, as a C* user (a freeloader, not a talented committer like you guys ),  I do not like the sound of any user side (like CQL 3) solution for sharding wide rows; because, I have to think about "sharding" for most of my CFs.

The problem here is the row size again as in [CASSANDRA-3929|https://issues.apache.org/jira/browse/CASSANDRA-3929] and IMHO, the same solution (compaction strategy but maybe with some extras like chaining) can be used here. I'm sure, it'll make things more complicated on Cassandra side. However, it's better for users.

In fact, the real problem is some "very very wide rows" as mentioned. Partition by hash (or any other automatic way) may cause efficiency problems for the crowd ("not very wide rows") due to unnecessary sharding.

So, please attack the "very very wide rows" problem and if possible, find a configurable (like "wide row sharding size hint: 10 MB") solution without CQL 3.

P.S.: I'll also admit that, "automated sharding for time series" can be good enough for some use cases but not all of them. So, this issue still makes sense but not as "sharding very wide rows" but as "automated sharding for time series".
                  
> Support for sharding wide rows in CQL 3.0
> -----------------------------------------
>
>                 Key: CASSANDRA-4176
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4176
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API
>            Reporter: Nick Bailey
>             Fix For: 1.1.2
>
>
> CQL 3.0 currently has support for defining wide rows by declaring a composite primary key. For example:
> {noformat}
> CREATE TABLE timeline (
>     user_id varchar,
>     tweet_id uuid,
>     author varchar,
>     body varchar,
>     PRIMARY KEY (user_id, tweet_id)
> );
> {noformat}
> It would also be useful to manage sharding a wide row through the CQL schema. This would require being able to split up the actual row key in the schema definition. In the above example you might want to make the row key a combination of user_id and day_of_tweet, in order to shard timelines by day. This might look something like:
> {noformat}
> CREATE TABLE timeline (
>     user_id varchar,
>     day_of_tweet date,
>     tweet_id uuid,
>     author varchar,
>     body varchar,
>     PRIMARY KEY (user_id REQUIRED, day_of_tweet REQUIRED, tweet_id)
> );
> {noformat}
> That's probably a terrible attempt at how to structure that in CQL. But I think I've gotten the point across. I tagged this for CQL 3.0, but I'm honestly not sure how much work it might be. As far as I know, built-in support for composite keys is limited.
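
For illustration only, the day-based sharding described above can be sketched on the client side in Python: the row key becomes the pair (user_id, day_of_tweet), so each user's timeline is split into one row per day. The function names and the choice of UTC days are assumptions made for this sketch, not part of the proposal or of any Cassandra API.

```python
from datetime import datetime, timezone
from typing import Tuple


def day_of_tweet(tweeted_at: datetime) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) of a timezone-aware timestamp."""
    if tweeted_at.tzinfo is None:
        raise ValueError("tweeted_at must be timezone-aware")
    return tweeted_at.astimezone(timezone.utc).date().isoformat()


def timeline_row_key(user_id: str, tweeted_at: datetime) -> Tuple[str, str]:
    """Build the sharded row key: all tweets by one user on one UTC day
    share a row, and a new day starts a new row."""
    return (user_id, day_of_tweet(tweeted_at))
```

Under these assumptions, two tweets by the same user on the same UTC day map to the same row key, and a tweet on the next day maps to a different one. Reading a full timeline then means querying one row per day in the range of interest.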
