Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
From: "Green, John M (HP Education)" <john.green@hp.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Naive question about orphan rows
Thread-Topic: Naive question about orphan rows
Thread-Index: Ac8vGpPY63rtFOW4Qmaj1QpkLHZ+GgD39gGAAAJN57A=
Date: Wed, 26 Feb 2014 15:32:32 +0000
Message-ID: 
 <9FD9C2860A087F4398EB3FB01F19AA5F6934B2BF@G4W3292.americas.hpqcorp.net>
References: 
 <9FD9C2860A087F4398EB3FB01F19AA5F6933E40F@G9W0751.americas.hpqcorp.net>
 <CAENxBwyWducR9OF+sD+GksXzZuiuuQ7acvfXP_3NTYCPCqkcBg@mail.gmail.com>
In-Reply-To: 
 <CAENxBwyWducR9OF+sD+GksXzZuiuuQ7acvfXP_3NTYCPCqkcBg@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_9FD9C2860A087F4398EB3FB01F19AA5F6934B2BFG4W3292americas_"
MIME-Version: 1.0

--_000_9FD9C2860A087F4398EB3FB01F19AA5F6934B2BFG4W3292americas_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Edward,

Thanks for your insight.

One other thought I had was to store a reference count with the "song".  Wh=
en the last "playlist" referencing the "song" is deleted the "song" will al=
so be deleted because the reference count decrements to zero.   However, th=
is would create some nastiness when it comes to reliably maintaining refere=
nce counts.   I'm not sure if it would help to split the reference count in=
to two monotonically increasing counters (number of references added, and n=
umber of references deleted).
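
To make the two-counter idea concrete, here is a rough in-memory sketch
in Python. The class and method names are made up for illustration and
nothing here is Cassandra API. It only shows the bookkeeping: each tally
only ever grows, and a "song" counts as an orphan once the two tallies
meet.

```python
from dataclasses import dataclass


@dataclass
class SongRefs:
    """Two grow-only tallies for one song (hypothetical names)."""
    added: int = 0    # references ever created
    removed: int = 0  # references ever deleted

    def add_reference(self) -> None:
        self.added += 1

    def remove_reference(self) -> None:
        self.removed += 1

    def is_orphan(self) -> bool:
        # Orphaned once every reference ever added has been removed.
        return self.added > 0 and self.added == self.removed
```

This does not remove the nastiness by itself: a lost or repeated
increment on either tally still gives a wrong answer.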

In my case, users cannot browse a repository of "songs" to build a =
playlist from scratch.  They can only import "songs" themselves or =
create references to "songs" other users have explicitly made =
available to them.  Once a "song" is not referred to by any "playlist" =
it will never be re-discovered, so it should be deleted.  This could =
be done in some sort of background data maintenance job that runs =
periodically.  Even if it is a low-priority background job, it looks =
like it will create a lot of overhead (scanning and producing counts).
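
As a rough picture of what such a job would compute, here is a small
Python sketch. It is only an assumption about the shape of the work: the
function name and the plain dict and list inputs are hypothetical
stand-ins for full scans of the two tables, not real driver calls.

```python
from typing import Dict, Iterable, Set


def find_orphans(song_ids: Iterable[str],
                 playlists: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the song ids that no playlist refers to."""
    referenced: Set[str] = set()
    # Pass 1: collect every song id mentioned by any playlist.
    for entries in playlists.values():
        referenced.update(entries)
    # Pass 2: anything stored but never mentioned is an orphan.
    return set(song_ids) - referenced
```

Both passes touch every row, which is the overhead I am worried about,
and the result is only a snapshot: a reference added after the scan
would be missed, so a real job would probably need some grace period
before it deletes anything.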

John
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Wednesday, February 26, 2014 5:56 AM
To: user@cassandra.apache.org
Subject: Re: Naive question about orphan rows

It is probably OK to have redundant songs in playlists; Cassandra is =
about denormalization.

Dealing with this issue is going to be hard, since the only way to deal =
with it would be scanning through the first CF and producing counts, =
then using that information to delete in the second table. However, =
that information can change rapidly and the counts will fall out of =
sync fast.

The only ways to handle this are:

1) never delete songs
2) store copies of songs in playlists

On Friday, February 21, 2014, Green, John M (HP Education) <john.green@hp.c=
om<mailto:john.green@hp.com>> wrote:
> I'm very much a newbie so this may be a silly question but ...
>
>
>
> I have a situation similar to the music service example (http://www.datas=
tax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html) of songs an=
d playlists.  However, in my case, the "songs" would be considered orphans =
that should be deleted when no "playlists" refer to them.  Relational datab=
ases have mechanisms to manage this relationship so that a "song" could be =
deleted as soon as the last "playlist" referencing it is deleted.    While =
I do NOT need to manage this as an atomic transaction, I'm wondering what i=
s the best way to delete orphaned rows (i.e., "songs" not referenced by any=
 "playlists")  using Cassandra.
>
>
>
> I guess an alternative approach would be to store "songs" directly in =
the "playlists", but this could lead to many redundant copies of the =
same "song", which is something I'm hoping to avoid.  In my case the =
"playlists" could have thousands of entries and the "songs" might be =
blobs of 10s of Mbytes.  Maybe I'm just having a hard time abandoning =
my relational roots?
>
>
>
> John

--
Sorry this was sent from mobile. Will do less grammar and spell check than =
usual.

--_000_9FD9C2860A087F4398EB3FB01F19AA5F6934B2BFG4W3292americas_--