From: Yiming Sun
Date: Sat, 4 Feb 2012 20:54:34 -0500
Subject: Re: yet a couple more questions on composite columns
To: user@cassandra.apache.org

Interesting idea, Jim. Is there a reason you don't use
"metadata:{accountId}" instead? For performance reasons?

On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona wrote:
> I've used "special" values which still comply with the Composite
> schema for the metadata columns, e.g. a column of
> 1970-01-01:{accountId} for a metadata column where the Composite is
> DateType:UTF8Type.
>
> Jim
>
> On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun wrote:
> > Thanks Andrey and Chris. It sounds like we don't necessarily have
> > to use composite columns. From what I understand about dynamic CFs,
> > each row may have completely different data from other rows; but in
> > our case, the data in each row is similar to other rows; my concern
> > was more about the homogeneity of the data between columns.
> >
> > In our original supercolumn-based schema, one special supercolumn
> > called "metadata" contains a number of subcolumns holding metadata
> > that describes each collection (e.g. number of documents, etc.).
> > The rest of the supercolumns in the same row are all IDs of
> > documents belonging to the collection, and for each document
> > supercolumn, the subcolumns contain the document content as well as
> > metadata on the individual document (e.g. the checksum of each
> > document).
> >
> > To move away from the supercolumn schema, I could either create two
> > CFs, one to hold metadata and the other document content; or I
> > could create just one CF mixing metadata and doc content in the
> > same row, and use composite column names to identify whether a
> > particular column is metadata or a document. I am just wondering if
> > you have any input on the pros and cons of each schema.
> >
> > -- Y.
> >
> >
> > On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken
> > <chrisgerken@mindspring.com> wrote:
> >>
> >> On 4 February 2012 06:21, Yiming Sun wrote:
> >>>
> >>> I cannot have one composite column name with 3 components while
> >>> another with 4 components?
> >>
> >> Just put 4 components and leave the last one empty (if it is the
> >> same type)?!
> >>
> >>> Another question I have is how flexible composite columns
> >>> actually are. If my data model has a CF containing US zip codes
> >>> with the following composite columns:
> >>>
> >>> {OH:Spring Field} : 45503
> >>> {OH:Columbus} : 43085
> >>> {FL:Spring Field} : 32401
> >>> {FL:Key West} : 33040
> >>>
> >>> I know I can ask cassandra to "give me the zip codes of all
> >>> cities in OH". But can I ask it to "give me the zip codes of all
> >>> cities named Spring Field" using this model? Thanks.
> >>
> >> No. You have to set the first composite component first.
> >>
> >>
> >> I'd use a dynamic CF:
> >> row key = state abbreviation
> >> column name = city name
> >> column value = zip code (or a complex object, one of whose
> >> properties is zip code)
> >>
> >> You can iterate over the columns in a single row to get a state's
> >> city names and their zip codes, and you can do a get_range_slices
> >> on all keys for the columns starting and ending on the city name
> >> to find out the zip codes for cities with the given name.
> >>
> >> I think
> >>
> >> - Chris

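The ordering argument in this thread can be made concrete with a small
Python sketch. It does not talk to Cassandra at all: composite column
names are modeled as plain tuples kept in sorted order, which is only an
approximation of how a composite comparator orders columns within a row.
The names slice_by_state, find_by_city and the "acct-1" account id are
hypothetical, chosen for illustration, and the sentinel part assumes that
every real date in the row falls after 1970-01-01.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical stand-in for one wide row: composite names are tuples,
# sorted component by component (first component, then second).
columns = sorted([
    (("OH", "Spring Field"), "45503"),
    (("OH", "Columbus"), "43085"),
    (("FL", "Spring Field"), "32401"),
    (("FL", "Key West"), "33040"),
])
names = [name for name, _ in columns]


def slice_by_state(state):
    """Prefix query: columns sharing a first component are contiguous,
    so one seek plus a short forward scan is enough."""
    start = bisect_left(names, (state,))
    result = []
    for name, value in columns[start:]:
        if name[0] != state:
            break  # left the contiguous block for this state
        result.append((name[1], value))
    return result


def find_by_city(city):
    """Second-component query: matches are scattered through the sort
    order, so this sketch has to look at every column."""
    return [(name[0], value) for name, value in columns if name[1] == city]


# Sentinel idea from the thread: a (date, text) composite where the
# metadata column uses 1970-01-01 as its date component. Assuming all
# real dates are later, the metadata column sorts first in the row.
EPOCH = date(1970, 1, 1)
row = sorted([
    (date(2012, 2, 4), "acct-1"),
    (EPOCH, "acct-1"),
    (date(2011, 12, 31), "acct-1"),
])

if __name__ == "__main__":
    print(slice_by_state("OH"))
    print(find_by_city("Spring Field"))
    print(row[0])
```

The prefix lookup touches only the columns for one state, while the
city lookup walks the whole row. That difference is the reason the
state-as-row-key layout was suggested for the "all cities named Spring
Field" question.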