From: Carlos Carrasco
To: user@cassandra.apache.org
Date: Tue, 10 Jul 2012 16:19:03 +0200
Subject: Re: Dynamic CF

I think he means something like having a fixed set of columns in the table definition, then in the actual rows having other columns not specified in the definition, independent of the composite part of the PK. When I reviewed CQL3 for use in Gossie (https://github.com/carloscm/gossie) I realized I couldn't have this, and that it would complicate things like migrations or optional columns. For this reason I didn't use CQL3 and instead wrote a row unmapper that detects the discontinuities in the composite part and uses those as the boundaries for the individual concrete rows stored in a wide row (https://github.com/carloscm/gossie/blob/master/src/gossie/mapping.go#L339). For example:

Given a Timeline table defined as key validation UTF8Type, column name validation CompositeType(LongType, AsciiType), value validation BytesType:

Timeline: {
    user1: {
        1341933021000000: {
            Author: "Tom",
            Body: "Hey!"
        },
        1341933022000000: {
            Author: "Paul",
            Body: "Nice",
            Lat: 40.0,
            Lon: 20.0
        },
        1341933023000000: {
            Author: "Lana",
            Body: "Cool"
        }
    },
    ...
}

Both of the following structs are valid and can be unmapped from the wide row "user1":

type Tweet struct {
    UserID string `cf:"Timeline" key:"UserID" cols:"When"`
    When   int64
    Author string
    Body   string
}

type GeoTweet struct {
    UserID string `cf:"Timeline" key:"UserID" cols:"When"`
    When   int64
    Author string
    Body   string
    Lat    float32
    Lon    float32
}

Granted, I lose database-side validation of the individual column values (BytesType), but in exchange I get very flexible rows and much nicer behaviour for model changes and migrations.
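
As a rough illustration of the boundary detection described above, here is a minimal sketch. It is not the actual Gossie code: `Column`, `Row` and `splitRows` are hypothetical names, and values are kept as strings for brevity where the real column family stores raw bytes. It assumes the columns arrive in their stored order, sorted by the leading composite component.

```go
package main

import "fmt"

// Column is a hypothetical flattened view of one Cassandra column whose
// composite name is (When, Field). It is not a Gossie type.
type Column struct {
	When  int64  // leading composite component (LongType)
	Field string // trailing composite component (AsciiType)
	Value string // raw value; BytesType in the real column family
}

// Row is one concrete row recovered from the wide row.
type Row struct {
	When   int64
	Fields map[string]string
}

// splitRows walks the columns in stored order and starts a new Row each
// time the leading composite component changes (the "discontinuity").
func splitRows(cols []Column) []Row {
	var rows []Row
	for _, c := range cols {
		if len(rows) == 0 || rows[len(rows)-1].When != c.When {
			rows = append(rows, Row{When: c.When, Fields: map[string]string{}})
		}
		rows[len(rows)-1].Fields[c.Field] = c.Value
	}
	return rows
}

func main() {
	cols := []Column{
		{1341933021000000, "Author", "Tom"},
		{1341933021000000, "Body", "Hey!"},
		{1341933022000000, "Author", "Paul"},
		{1341933022000000, "Body", "Nice"},
		{1341933022000000, "Lat", "40.0"},
		{1341933022000000, "Lon", "20.0"},
		{1341933023000000, "Author", "Lana"},
		{1341933023000000, "Body", "Cool"},
	}
	for _, r := range splitRows(cols) {
		fmt.Println(r.When, len(r.Fields), r.Fields["Author"])
	}
}
```

Because each recovered row is just a set of named fields, a struct with fewer fields (Tweet) and one with more (GeoTweet) can both be filled from the same data; fields missing in a given row simply stay at their zero value.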


On 10 July 2012 14:23, Sylvain Lebresne <sylvain@datastax.com> wrote:
> On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky wrote:
> > At this point I am really confused about what direction Cassandra is
> > going. CQL 3 has the benefit of composite keys, but no dynamic columns.
> > I thought, the whole point of Cassandra was to provide dynamic tables.
>
> CQL3 absolutely provides "dynamic tables"/wide rows, the syntax is just
> different. The typical example for wide rows is a time series, for
> instance keeping all the events for a given event_kind in the same C*
> row ordered by time. You declare that in CQL3 using:
>   CREATE TABLE events (
>     event_kind text,
>     time timestamp,
>     event_name text,
>     event_details text,
>     PRIMARY KEY (event_kind, time)
>   )
>
> The important part in such a definition is that one CQL row (i.e. a given
> event_kind, time, event_name, event_details) does not map to an internal
> Cassandra row. More precisely, all events sharing the same event_kind will
> be in the same internal row. This is a wide row/dynamic table in the sense
> of Thrift.
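
To make the mapping in the quoted explanation concrete, here is a small sketch of how CQL rows of the events table could fold into internal wide rows. It is a simplified picture under my own assumptions, not Cassandra's storage code: `Event`, `Cell` and `toWideRows` are hypothetical names, the sample values are made up, and internal details beyond the composite column names are left out.

```go
package main

import "fmt"

// Event mirrors one CQL row of the events table quoted above.
type Event struct {
	Kind    string // partition key: event_kind
	Time    int64  // clustering column: time
	Name    string // event_name
	Details string // event_details
}

// Cell is a hypothetical, simplified picture of one internal column.
type Cell struct {
	Time   int64  // first component of the composite column name
	Column string // second component: the CQL column name
	Value  string
}

// toWideRows groups events by event_kind. Every CQL row contributes one
// cell per non-key column to the internal row of its event_kind.
func toWideRows(events []Event) map[string][]Cell {
	wide := make(map[string][]Cell)
	for _, e := range events {
		wide[e.Kind] = append(wide[e.Kind],
			Cell{e.Time, "event_details", e.Details},
			Cell{e.Time, "event_name", e.Name},
		)
	}
	return wide
}

func main() {
	wide := toWideRows([]Event{
		{"login", 1, "alice", "ok"},
		{"login", 2, "bob", "failed"},
		{"logout", 3, "alice", "ok"},
	})
	// Two "login" CQL rows share one internal row holding four cells.
	fmt.Println(len(wide["login"]), len(wide["logout"]))
}
```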


> > I need to have a huge table to store market quotes, and be able to query
> > it by name and timestamp (t1 <= t <= t2), therefore I wanted the
> > composite key.
> > Loading data to such table using prepared statements (CQL 3-based) was
> > very slow, because it makes a server call for each row.
>
> You should use a BATCH statement which is the equivalent to batch_mutate.
>
> --
> Sylvain



--
Carlos Carrasco
IT - Software Architect
Llull, 95-97, 2º planta, 08005 Barcelona
Skype: carlos.carrasco.groupalia
www.groupalia.com
carlos.carrasco@groupalia.com
