From: Yiming Sun
Date: Sun, 5 Feb 2012 10:05:29 -0500
Subject: Re: yet a couple more questions on composite columns
To: user@cassandra.apache.org

Thanks R.V.!! We are also dealing with many small files, so this sounds really promising.

-- Y.

On Sun, Feb 5, 2012 at 9:59 AM, R. Verlangen <robin@us2.nl> wrote:
Yiming, I am using 2 CF's. Performance wise this should not be an issue. I use it for small files data store. My 2 CF's are:

FilesMeta
FilesData


2012/2/5 Yiming Sun <yiming.sun@gmail.com>
Interesting idea, Jim. Is there a reason you don't use "metadata:{accountId}" instead? For performance reasons?


On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona <jim@anconafamily.com> wrote:
I've used "special" values which still comply with the Composite
schema for the metadata columns, e.g. a column of
1970-01-01:{accountId} for a metadata column where the Composite is
DateType:UTF8Type.

Jim
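
The sentinel idea above can be sketched in a few lines of Python. This is only an in-memory illustration, not Cassandra client code: column names are modeled as (date, text) tuples to mimic a DateType:UTF8Type composite, the row contents and identifiers such as "account-42" are made up, and it assumes all real data columns carry dates later than 1970-01-01 so the metadata columns sort to the front.

```python
from datetime import date

# Sentinel first component that marks a metadata column (assumption:
# no real data column ever uses this date).
EPOCH = date(1970, 1, 1)

# Hypothetical row: column names are (date, text) pairs, values are strings.
row = {
    (date(2012, 2, 4), "invoice-17"): "regular data",
    (EPOCH, "account-42"): "metadata for account-42",
    (date(2011, 12, 31), "invoice-09"): "regular data",
}


def split_row(row):
    """Return (metadata_columns, data_columns), each in comparator order."""
    ordered = sorted(row.items())  # sorts by date first, then by text
    metadata = [(name, value) for name, value in ordered if name[0] == EPOCH]
    data = [(name, value) for name, value in ordered if name[0] != EPOCH]
    return metadata, data


metadata, data = split_row(row)
```

Because the sentinel still matches the declared component types, the metadata columns stay valid composite names and end up in one contiguous range at the start of the row.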

On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
> Thanks Andrey and Chris. It sounds like we don't necessarily have to use
> composite columns. From what I understand about dynamic CF, each row may
> have completely different data from other rows; but in our case, the data
> in each row is similar to other rows; my concern was more about the
> homogeneity of the data between columns.
>
> In our original supercolumn-based schema, one special supercolumn is called
> "metadata" which contains a number of subcolumns to hold metadata describing
> each collection (e.g. number of documents), then the rest of the
> supercolumns in the same row are all IDs of documents belonging to the
> collection, and for each document supercolumn, the subcolumns contain the
> document content as well as metadata on the individual document (e.g. the
> checksum of each document).
>
> To move away from the supercolumn schema, I could either create two CFs, one
> to hold metadata, the other document content; or I could create just one CF
> mixing metadata and doc content in the same row, and using composite column
> names to identify if the particular column is metadata or a document. I am
> just wondering if you have any inputs on the pros and cons of each schema.
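
To make the two options concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code: the column family names, the "coll-1"/"doc-a" identifiers, the checksum strings, and the three-component (kind, id, field) naming are all hypothetical, chosen only to show how the same data would be laid out either way.

```python
# Option 1: two column families, one for collection metadata, one for documents.
meta_cf = {
    "coll-1": {"num_documents": "2"},
}
docs_cf = {
    "coll-1": {
        ("doc-a", "content"): "text of doc-a",
        ("doc-a", "checksum"): "placeholder-checksum-a",
        ("doc-b", "content"): "text of doc-b",
        ("doc-b", "checksum"): "placeholder-checksum-b",
    },
}

# Option 2: one column family; the first composite component says whether a
# column is metadata or document data. Metadata columns leave the id empty.
mixed_cf = {
    "coll-1": {
        ("meta", "", "num_documents"): "2",
        ("doc", "doc-a", "content"): "text of doc-a",
        ("doc", "doc-a", "checksum"): "placeholder-checksum-a",
        ("doc", "doc-b", "content"): "text of doc-b",
        ("doc", "doc-b", "checksum"): "placeholder-checksum-b",
    },
}


def metadata_two_cf(collection):
    """Option 1: metadata is a whole row in its own column family."""
    return dict(meta_cf[collection])


def metadata_one_cf(collection):
    """Option 2: metadata is the group of columns whose first component is 'meta'."""
    return {
        name[2]: value
        for name, value in mixed_cf[collection].items()
        if name[0] == "meta"
    }


def document_ids_one_cf(collection):
    """Option 2: document ids are the distinct second components under 'doc'."""
    return sorted({name[1] for name in mixed_cf[collection] if name[0] == "doc"})
```

In the sketch both layouts return the same metadata; the difference is whether reading it means fetching a separate row or selecting one prefix of columns within the shared row.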
>
> -- Y.
>
>
> On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken <chrisgerken@mindspring.com>
> wrote:
>>
>>
>>
>>
>> On 4 February 2012 06:21, Yiming Sun <yiming.sun@gmail.com> wrote:
>>>
>>> I cannot have one composite column name with 3 components while another
>>> with 4 components?
>>
>> Just put 4 components and leave the last one empty (if it is the same type)?!
>>
>>> Another question I have is how flexible composite columns actually are.
>>> If my data model has a CF containing US zip codes with the following
>>> composite columns:
>>>
>>> {OH:Spring Field} : 45503
>>> {OH:Columbus} : 43085
>>> {FL:Spring Field} : 32401
>>> {FL:Key West} : 33040
>>>
>>> I know I can ask cassandra to "give me the zip codes of all cities in
>>> OH". But can I ask it to "give me the zip codes of all cities named Spring
>>> Field" using this model? Thanks.
>>
>> No. You have to specify the first composite component first.
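
Why the first component has to come first can be illustrated with a small Python sketch. It is an in-memory model only, not Cassandra client code: composite names are represented as (state, city) tuples kept in sorted order, and the helper names `slice_by_state` and `scan_by_city` are made up for this example.

```python
from bisect import bisect_left

# Composite column names modeled as (state, city) tuples, sorted the way a
# composite comparator would order them: first component, then second.
columns = sorted([
    (("OH", "Spring Field"), "45503"),
    (("OH", "Columbus"), "43085"),
    (("FL", "Spring Field"), "32401"),
    (("FL", "Key West"), "33040"),
])
names = [name for name, _ in columns]


def slice_by_state(state):
    """All columns for one state form a single contiguous range."""
    start = bisect_left(names, (state,))
    # Smallest possible first component that sorts after `state`.
    end = bisect_left(names, (state + "\x00",))
    return columns[start:end]


def scan_by_city(city):
    """Matching on the second component alone needs a scan of every column."""
    return [(name, value) for name, value in columns if name[1] == city]
```

Columns sharing a state sit next to each other, so a start/end slice finds them; columns sharing only a city name are scattered across the sort order, which is why no single slice can return them.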
>>
>>
>> I'd use a dynamic CF:
>> row key = state abbreviation
>> column name = city name
>> column value = zip code (or a complex object, one of whose properties is
>> zip code)
>>
>> You can iterate over the columns in a single row to get a state's city
>> names and their zip codes, and you can do a get_range_slices on all keys for
>> the columns starting and ending on the city name to find the zip codes
>> for all cities with the given name.
>>
>> I think
>>
>> - Chris
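
The layout Chris describes can be modeled roughly as a dictionary of rows. The sketch below is plain Python, not a Cassandra client: `zip_cf`, `cities_in_state`, and `zips_for_city` are hypothetical names, and the second helper only imitates the effect of a range query over all keys with the column slice pinned to one city name.

```python
# In-memory stand-in for a dynamic column family:
# row key = state abbreviation, column name = city, column value = zip code.
zip_cf = {
    "OH": {"Columbus": "43085", "Spring Field": "45503"},
    "FL": {"Key West": "33040", "Spring Field": "32401"},
}


def cities_in_state(cf, state):
    """Read one row: every (city, zip) column, in column-name order."""
    return sorted(cf.get(state, {}).items())


def zips_for_city(cf, city):
    """Visit every row and pick out the single column named `city`, if any."""
    return {state: row[city] for state, row in sorted(cf.items()) if city in row}
```

The first query touches a single row, while the second has to visit every row key, so its cost grows with the number of states rather than with the number of matching cities.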
>
>


