Mailing-List: user@cassandra.apache.org
Message-ID: <1372325765.52199.YahooMailNeo@web121805.mail.ne1.yahoo.com>
In-Reply-To: <51CBB43C.7080402@aol.com>
Date: Thu, 27 Jun 2013 02:36:05 -0700 (PDT)
From: Tony Anecito
To: "arthur.zubarev@aol.com"
Cc: Robert Coli, Users-Cassandra
Subject: Re: Creating an "Index" column...

Thanks for the info. There are other reasons, and the size you mention is small compared to other data I have worked with. The speed and size of the data and the cost of the license have to be taken into consideration, which I am looking at. Dynamic columns are also of interest to me.

I am just really starting to understand it, and I agree with your comments: it just depends upon your requirements.
 
Regards,
-Tony

From: Arthur Zubarev <arthur.zubarev@aol.com>
To: Tony Anecito <adanecito@yahoo.com>
Cc: Robert Coli <rcoli@eventbrite.com>; Users-Cassandra <user@cassandra.apache.org>
Sent: Wednesday, June 26, 2013 9:40 PM
Subject: Re: Creating an "Index" column...

Appreciate your thoughts, Tony,

In our DW there are composite keys, say 500K of them per customer. To produce a report, the client program needs to page through the entire set, collecting data as it goes, probably into yet another desktop DB.

At this point the purpose of having a NoSQL store has been defeated.

On 06/26/2013 05:21 PM, Tony Anecito wrote:
Thanks Arthur.

Interesting that you think NoSQL does not fit large volumes of data; that is what it is touted to do.
I have heard PKs are needed, but I thought that is what the "key" column is for, and composite key support is there as well.

The only issue I see is all that duplicate data and the need to keep it in sync. For example, if the movie title "Superman" changed to "Superman the Man of Steel", you have to go and change all those duplicate values. An easy problem to solve, but the data modeler has to get past that. lol
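
The sync problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for denormalized column families, and the names `movies_by_id`, `ratings_by_user`, and `rename_movie` are hypothetical, not part of any Cassandra API.

```python
# Hypothetical in-memory stand-ins for two denormalized column families.
# The movie title is stored once in the source row and copied into every
# rating row so that ratings can be read without a join.
movies_by_id = {
    "m1": {"title": "Superman"},
}
ratings_by_user = {
    ("alice", "m1"): {"title": "Superman", "stars": 5},
    ("bob", "m1"): {"title": "Superman", "stars": 4},
}


def rename_movie(movie_id, new_title):
    """Rewrite the title in the source row and in every duplicated copy.

    Returns the number of duplicated copies that were rewritten.
    """
    movies_by_id[movie_id]["title"] = new_title
    updated = 0
    for (_user, rated_movie_id), row in ratings_by_user.items():
        if rated_movie_id == movie_id:
            row["title"] = new_title
            updated += 1
    return updated


copies = rename_movie("m1", "Superman the Man of Steel")
```

The cost of a rename grows with the number of copies, which is the trade-off the data modeler accepts in exchange for single-query reads.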

ACID transactions are the other issue, but I think the supplier of the info has to think about that one.

I have response times in my RDBMS of several hundred microseconds, and the really important requirement for me is to keep that the same or better.

Just some thoughts on the matter.
-Tony

From: Arthur Zubarev <Arthur.Zubarev@Aol.com>
To: Tony Anecito <adanecito@yahoo.com>; Robert Coli <rcoli@eventbrite.com>; Users-Cassandra <user@cassandra.apache.org>
Sent: Wednesday, June 26, 2013 3:08 PM
Subject: Re: Creating an "Index" column...

Tony hi,
 
Yes, in some scenarios (e.g. a DW), the absence of proper PKs or indexes (they are just too hard to envision, since you need to think of future queries first) makes getting through large volumes of data difficult, so NoSQL is IMHO hard to fit in.
 
But you have other choices:
 
1) pagination or
2) slice queries.
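
The two options above can be sketched roughly as follows. This is a plain-Python illustration, not client code: the sorted list `columns` is a hypothetical stand-in for the sorted column names of one wide row, and `slice_query` and `pages` are made-up helper names.

```python
from bisect import bisect_left, bisect_right

# Hypothetical wide row: column names kept in sorted order.
columns = sorted(f"key{n:04d}" for n in range(10))


def slice_query(start, finish):
    """Return the column names in the inclusive range [start, finish]."""
    return columns[bisect_left(columns, start):bisect_right(columns, finish)]


def pages(page_size):
    """Yield successive pages, resuming after the last name already seen."""
    last = None
    while True:
        # The first page starts at the beginning; later pages start just
        # past the last column name returned by the previous page.
        begin = 0 if last is None else bisect_right(columns, last)
        page = columns[begin:begin + page_size]
        if not page:
            return
        yield page
        last = page[-1]


middle = slice_query("key0002", "key0004")
all_pages = list(pages(4))
```

Resuming from the last name seen, rather than from a numeric offset, means each page only needs a range read that starts at a known position.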
 
Both are covered here:

http://pkghosh.wordpress.com/2012/03/04/cassandra-range-query-made-simple/
Hope that helps.
 
/Arthur
 
From: Tony Anecito
Sent: Wednesday, June 26, 2013 1:55 PM
To: Robert Coli; Users-Cassandra
Subject: Re: Creating an "Index" column...
 
Hi Robert,

Actually, that is what I did in my RDBMS data model. In Cassandra, or NoSQL in general, without joins or nested selects I have to do two queries. Also, batching is not supported on the server side, which makes the performance worse.
I just started learning Cassandra, but I am learning fast, and there are some challenges when moving to a new data model driven by these factors.
Regards,
-Tony

 
From: Robert Coli <rcoli@eventbrite.com>
To: user@cassandra.apache.org; Tony Anecito <adanecito@yahoo.com>
Sent: Wednesday, June 26, 2013 11:32 AM
Subject: Re: Creating an "Index" column...

On Wed, Jun 26, 2013 at 10:20 AM, Tony Anecito <adanecito@yahoo.com> wrote:
> Never mind, I figured it out. I found it via a search for secondary indexes.

In general, unless you actually need atomic update of the row and its
secondary index, you are probably better off creating your own pseudo
secondary index column family.
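
A minimal sketch of what such a hand-maintained index might look like, modeled with plain Python dictionaries rather than a real cluster. Everything here is an assumption made for illustration: `users` stands in for the primary column family, `users_by_city` for the index column family, and `put_user` and `find_by_city` are hypothetical helpers.

```python
from collections import defaultdict

users = {}                        # primary column family: user_id -> row
users_by_city = defaultdict(set)  # hand-built index: city -> set of user_ids


def put_user(user_id, name, city):
    """Write the row, then update the index entry that points at it."""
    old = users.get(user_id)
    if old is not None and old["city"] != city:
        # Remove the stale index entry left over from the previous value.
        users_by_city[old["city"]].discard(user_id)
    users[user_id] = {"name": name, "city": city}
    # This second write is separate from the first, so the row and its
    # index entry are not updated atomically.
    users_by_city[city].add(user_id)


def find_by_city(city):
    """Two lookups: first the index, then the rows it points at."""
    ids = sorted(users_by_city.get(city, ()))
    return [users[user_id] for user_id in ids]


put_user("u1", "Ann", "Denver")
put_user("u2", "Raj", "Denver")
put_user("u1", "Ann", "Boston")   # moving u1 must also fix the index
denver = find_by_city("Denver")
```

The read path is the "two queries" pattern mentioned elsewhere in this thread: one read against the index, then one against the primary data.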

=Rob





-- 

Regards,

Arthur

