incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Naive question about orphan rows
Date Wed, 26 Feb 2014 16:17:35 GMT
Right, the problem with building a list of counts in a batch is what happens
if a song is added while you are building the counts.
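
To make that race concrete, here is a minimal sketch in Python. It assumes plain in-memory dicts as hypothetical stand-ins for the playlist and song tables (the names `playlists`, `songs`, `scan_counts` and `find_orphans` are illustrative only, and none of this is Cassandra API). It shows a reference being added after the counting pass, so the delete pass acts on a stale snapshot.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two tables (not Cassandra API).
playlists = {"p1": {"songA"}, "p2": {"songB"}}
songs = {"songA": b"blob-a", "songB": b"blob-b", "songC": b"blob-c"}


def scan_counts(playlists):
    """Pass 1: count how many playlists reference each song."""
    counts = Counter()
    for refs in playlists.values():
        counts.update(refs)
    return counts


def find_orphans(songs, counts):
    """Songs with no references according to the (possibly stale) counts."""
    return {song_id for song_id in songs if counts[song_id] == 0}


# The batch job takes its snapshot and decides songC is an orphan.
counts = scan_counts(playlists)
orphans = find_orphans(songs, counts)

# Meanwhile, a concurrent writer adds a playlist that references songC.
playlists["p3"] = {"songC"}

# Pass 2: the batch job deletes based on the stale snapshot.
for song_id in orphans:
    songs.pop(song_id, None)

# The new playlist now points at a song that no longer exists.
dangling = {
    song_id
    for refs in playlists.values()
    for song_id in refs
    if song_id not in songs
}
print(dangling)
```

Any scheme that separates the counting step from the delete step has this window unless writers are blocked, or the delete re-checks references at the moment it runs.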


On Wed, Feb 26, 2014 at 10:32 AM, Green, John M (HP Education) <
john.green@hp.com> wrote:

>  Edward,
>
>
> Thanks for your insight.
>
>
>
> One other thought I had was to store a reference count with the "song".
> When the last "playlist" referencing the "song" is deleted the "song" will
> also be deleted because the reference count decrements to zero.   However,
> this would create some nastiness when it comes to reliably maintaining
> reference counts.   I'm not sure if it would help to split the reference
> count into two monotonically increasing counters (number of references
> added, and number of references deleted).
>
>
>
> In my case, users cannot browse a repository of "songs" to build a
> playlist from scratch.  They can only import "songs" themselves or create
> references to "songs" other users have explicitly made available to them.
> Once a "song" is not referred to by any "playlist" it will never be
> re-discovered so it should be deleted.   This could be done in some sort of
> background data maintenance job that runs periodically.   Even if it is a
> low-priority background job, it looks like it will create a lot of overhead
> (scanning and producing counts).
>
>
>
> John
>
> *From:* Edward Capriolo [mailto:edlinuxguru@gmail.com]
> *Sent:* Wednesday, February 26, 2014 5:56 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Naive question about orphan rows
>
>
>
> It is probably ok to have redundant songs in playlists; Cassandra is about
> denormalization.
>
> Dealing with this issue is going to be hard, since the only way to deal
> with it would be scanning through the first CF and producing counts, then
> using that information to delete in the second table. However, that
> information can change rapidly and will then fall out of sync fast.
>
> The only ways to handle this are:
>
> 1) never delete songs
> 2) store copies of songs in playlists
>
> On Friday, February 21, 2014, Green, John M (HP Education) <
> john.green@hp.com> wrote:
> > I'm very much a newbie so this may be a silly question but ...
> >
> >
> >
> > I have a situation similar to the music service example (
> http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html)
> of songs and playlists.  However, in my case, the "songs" would be
> considered orphans that should be deleted when no "playlists" refer to
> them.  Relational databases have mechanisms to manage this relationship so
> that a "song" could be deleted as soon as the last "playlist" referencing
> it is deleted.    While I do NOT need to manage this as an atomic
> transaction, I'm wondering what is the best way to delete orphaned rows
> (i.e., "songs" not referenced by any "playlists")  using Cassandra.
> >
> >
> >
> > I guess an alternative approach would be to store "songs" directly in
> the "playlists" but this could lead to many redundant copies of the same
> "song" which is something I'm hoping to avoid.  In my case the "playlists"
> could have thousands of entries and the "songs" might be blobs of 10s of
> Mbytes.    Maybe I'm just having a hard time abandoning my relational roots?
> >
> >
> >
> > John
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>
