MIME-Version: 1.0
In-Reply-To: 
 <CADVHTB_dBbrG=AhY_5u9dS0qadwSrqakUTbUoR9ZE4=QoQ9-Pg@mail.gmail.com>
References: 
 <CABxBLH9ZKiZShB+7U_mM+74ZWwSp84=K9Fa8KTG1krpXJ9TC2w@mail.gmail.com>
 <CAJciDs0fbK6gr5fb2o=rUkkXrx5SYAtLQk6PHDBd0Fkut5ae3w@mail.gmail.com>
 <6BC865B0-908C-4689-9D1F-82979DC55C61@mindspring.com>
 <CABxBLH_252r=Ob56Byd2A6iXaxzgYGxb+tA_b7o5U1H1XvrjVQ@mail.gmail.com>
 <CAKYY9AJKG-hP2F0cEaBnDZYw11pG-ngsA1FMhRPhgFvLbe_HFQ@mail.gmail.com>
 <CABxBLH8X0jXAJ4FF6E-S7Uy1EGedoVNs+T=o7aTxcFjNos4bgw@mail.gmail.com>
 <CADVHTB_dBbrG=AhY_5u9dS0qadwSrqakUTbUoR9ZE4=QoQ9-Pg@mail.gmail.com>
From: Yiming Sun <yiming.sun@gmail.com>
Date: Sun, 5 Feb 2012 10:05:29 -0500
Message-ID: 
 <CABxBLH-E7BL+hwnqTtrK5OReqLyOdex9bwdskwNW=E4favvipw@mail.gmail.com>
Subject: Re: yet a couple more questions on composite columns
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0016e6d6476295f16404b838e1fa

--0016e6d6476295f16404b838e1fa
Content-Type: text/plain; charset=ISO-8859-1

Thanks R.V.!! We are also dealing with many small files, so this sounds
really promising.

-- Y.

On Sun, Feb 5, 2012 at 9:59 AM, R. Verlangen <robin@us2.nl> wrote:

> Yiming, I am using 2 CFs. Performance-wise this should not be an issue. I
> use it as a data store for small files. My 2 CFs are:
>
> FilesMeta
> FilesData
>
>
> 2012/2/5 Yiming Sun <yiming.sun@gmail.com>
>
>> Interesting idea, Jim.  Is there a reason you don't use
>> "metadata:{accountId}" instead?  For performance reasons?
>>
>>
>> On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona <jim@anconafamily.com> wrote:
>>
>>> I've used "special" values which still comply with the Composite
>>> schema for the metadata columns, e.g. a column of
>>> 1970-01-01:{accountId} for a metadata column where the Composite is
>>> DateType:UTF8Type.
>>>
>>> Jim
>>>
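A minimal sketch of the sentinel idea Jim describes, assuming a composite of (date, string) that sorts component by component. It models column names as plain Python tuples rather than calling any Cassandra client, and the account id and payloads are made up for illustration.

```python
from datetime import date

# Hypothetical sentinel: assumes the epoch date never occurs in real data.
METADATA_DATE = date(1970, 1, 1)

# Column names modelled as (DateType, UTF8Type) tuples mapped to values.
# "acct-42" and the payload strings are illustrative only.
row = {
    (date(2012, 2, 4), "acct-42"): "event payload B",
    (METADATA_DATE, "acct-42"): "metadata for acct-42",
    (date(2012, 1, 15), "acct-42"): "event payload A",
}


def sorted_columns(columns):
    """Return (name, value) pairs ordered component by component."""
    return sorted(columns.items())


def split_metadata(columns):
    """Separate sentinel-dated metadata columns from ordinary ones."""
    ordered = sorted_columns(columns)
    meta = [(n, v) for n, v in ordered if n[0] == METADATA_DATE]
    data = [(n, v) for n, v in ordered if n[0] != METADATA_DATE]
    return meta, data


meta, data = split_metadata(row)
```

Because the sentinel date is earlier than any real date in this sketch, the metadata column sorts to the front of the row. A plain string such as "metadata" would not fit a first component typed as a date, which is presumably why a special date value is used instead.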
>>> On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
>>> > Thanks Andrey and Chris.  It sounds like we don't necessarily have to
>>> > use composite columns.  From what I understand about dynamic CF, each
>>> > row may have completely different data from other rows; but in our
>>> > case, the data in each row is similar to other rows; my concern was
>>> > more about the homogeneity of the data between columns.
>>> >
>>> > In our original supercolumn-based schema, one special supercolumn is
>>> > called "metadata", which contains a number of subcolumns to hold
>>> > metadata describing each collection (e.g. number of documents, etc.).
>>> > The rest of the supercolumns in the same row are all IDs of documents
>>> > belonging to the collection, and for each document supercolumn, the
>>> > subcolumns contain the document content as well as metadata on the
>>> > individual document (e.g. checksum of each document).
>>> >
>>> > To move away from the supercolumn schema, I could either create two
>>> > CFs, one to hold metadata, the other document content; or I could
>>> > create just one CF mixing metadata and doc content in the same row,
>>> > using composite column names to identify whether a particular column
>>> > is metadata or a document.  I am just wondering if you have any input
>>> > on the pros and cons of each schema.
>>> >
>>> > -- Y.
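The two layouts being weighed here can be sketched with plain Python dictionaries standing in for column families. This is only an illustration under assumptions: the row key "coll-1", the document ids, the field names, and the "meta"/"doc" tags are all hypothetical, and no Cassandra client is involved.

```python
# Layout A: two column families sharing the same row key (collection id).
collections_meta = {"coll-1": {"doc_count": "2"}}
collections_docs = {
    "coll-1": {
        # Assumed composite name: (document id, field)
        ("doc-a", "content"): "text of a",
        ("doc-a", "checksum"): "aaa",
        ("doc-b", "content"): "text of b",
        ("doc-b", "checksum"): "bbb",
    }
}

# Layout B: one column family; the first composite component tags the kind.
# Metadata names leave the last component empty so every name has 3 parts.
collections_mixed = {
    "coll-1": {
        ("meta", "doc_count", ""): "2",
        ("doc", "doc-a", "content"): "text of a",
        ("doc", "doc-a", "checksum"): "aaa",
        ("doc", "doc-b", "content"): "text of b",
        ("doc", "doc-b", "checksum"): "bbb",
    }
}


def read_metadata_a(key):
    """Layout A: metadata is a whole row in its own column family."""
    return dict(collections_meta[key])


def read_metadata_b(key):
    """Layout B: metadata is the slice of columns tagged 'meta'."""
    row = collections_mixed[key]
    return {name[1]: value for name, value in row.items() if name[0] == "meta"}


def read_document_b(key, doc_id):
    """Layout B: one document is the slice under the ('doc', doc_id) prefix."""
    row = collections_mixed[key]
    return {
        name[2]: value
        for name, value in row.items()
        if name[0] == "doc" and name[1] == doc_id
    }
```

Both layouts return the same metadata in this sketch. The difference is where it lives: layout A reads a separate row, while layout B reads a prefix of the same row that also holds the documents.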
>>> >
>>> >
>>> > On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken
>>> > <chrisgerken@mindspring.com> wrote:
>>> >>
>>> >> On 4 February 2012 06:21, Yiming Sun <yiming.sun@gmail.com> wrote:
>>> >>>
>>> >>> I cannot have one composite column name with 3 components while
>>> >>> another with 4 components?
>>> >>
>>> >>  Just put 4 components and leave the last one empty (if it is the
>>> >>  same type)?!
>>> >>
>>> >>> Another question I have is how flexible composite columns actually
>>> >>> are.  If my data model has a CF containing US zip codes with the
>>> >>> following composite columns:
>>> >>>
>>> >>> {OH:Spring Field} : 45503
>>> >>> {OH:Columbus} : 43085
>>> >>> {FL:Spring Field} : 32401
>>> >>> {FL:Key West}  : 33040
>>> >>>
>>> >>> I know I can ask Cassandra to "give me the zip codes of all cities
>>> >>> in OH".  But can I ask it to "give me the zip codes of all cities
>>> >>> named Spring Field" using this model?  Thanks.
>>> >>
>>> >> No. You have to specify the first composite component first.
>>> >>
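A rough illustration of why the first component has to be given: if composite names are kept sorted component by component, all columns sharing a first component are adjacent, but columns sharing only a second component are not. The sketch below simulates that ordering with Python tuples and `bisect`; it is not Cassandra client code, and the helper names are made up.

```python
from bisect import bisect_left

# Composite column names as (state, city) tuples, kept in sorted order to
# mimic how a composite comparator orders columns within one row.
columns = sorted([
    (("OH", "Spring Field"), "45503"),
    (("OH", "Columbus"), "43085"),
    (("FL", "Spring Field"), "32401"),
    (("FL", "Key West"), "33040"),
])
names = [name for name, _ in columns]


def slice_by_state(state):
    """Contiguous slice: every column whose first component equals state."""
    lo = bisect_left(names, (state,))
    # state + "\x00" is the smallest string that sorts after state.
    hi = bisect_left(names, (state + "\x00",))
    return columns[lo:hi]


def scan_by_city(city):
    """No contiguous range exists for the second component: full scan."""
    return [(name, zip_code) for name, zip_code in columns if name[1] == city]


def positions(city):
    """Indexes of the matching columns in sorted order, to show the gaps."""
    return [i for i, name in enumerate(names) if name[1] == city]
```

In this sketch `slice_by_state("OH")` is a single range over the sorted names, while the "Spring Field" columns sit at non-adjacent positions, so finding them means looking at every column.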
>>> >>
>>> >> I'd use a dynamic CF:
>>> >> row key = state abbreviation
>>> >> column name = city name
>>> >> column value = zip code (or a complex object, one of whose properties
>>> >> is zip code)
>>> >>
>>> >> You can iterate over the columns in a single row to get a state's city
>>> >> names and their zip codes, and you can do a get_range_slices on all
>>> >> keys for the columns starting and ending on the city name to find out
>>> >> the zip codes for cities with the given name.
>>> >>
>>> >> I think
>>> >>
>>> >> - Chris
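Chris's dynamic CF can be sketched as a dictionary of rows, each row mapping city name to zip code. The functions below only imitate the two access patterns he mentions; `zip_cf`, `zips_in_state`, and `zips_for_city` are hypothetical names, and the second function is a loose stand-in for a range query over all keys with a one-column slice, not a call to get_range_slices itself.

```python
# Rows keyed by state; each row maps column name (city) to value (zip code).
zip_cf = {
    "OH": {"Spring Field": "45503", "Columbus": "43085"},
    "FL": {"Spring Field": "32401", "Key West": "33040"},
}


def zips_in_state(cf, state):
    """Single-row read: all (city, zip) columns for one state, sorted by city."""
    return sorted(cf.get(state, {}).items())


def zips_for_city(cf, city):
    """Visit every row, taking only the column named `city` from each.

    Loosely mirrors a range query over all row keys whose column slice
    starts and ends on the same column name.
    """
    return {state: row[city] for state, row in sorted(cf.items()) if city in row}
```

Note that the city lookup still touches every row key, so its cost grows with the number of rows; with one row per state that stays small.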
>>> >
>>> >
>>>
>>
>>
>

--0016e6d6476295f16404b838e1fa--