incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Evans <eev...@rackspace.com>
Subject Re: A question of 'referential integrity'...
Date Tue, 06 Apr 2010 14:26:53 GMT
On Tue, 2010-04-06 at 12:00 +0100, Steve wrote:
> First, I apologise sending this to the 'dev' mailing list - I couldn't
> find one for Cassandra users - and also for the basic nature of my
> questions...

user@cassandra.apache.org, (follow-ups there).

> I'm trying to get my head around the possibility of using Cassandra as
> the back-end to a project... and while, in most respects, Cassandra
> looks absolutely ideal... I'm finding it difficult to ascertain an
> appropriate strategy to ensure consistency (which would come 'for free'
> with a traditional, transactional, back end.)
> 
> As a sufficient abstraction of my storage requirements, imagine two
> (application oriented) universes of SHA-512 hashes - SRC and DST (each
> will, in practice, be tagged with other application data).  I need to
> support a remote API to manage a one-many mapping from SRC to DST, and a
> consistent (efficiently addressable) one-one mapping from DST to SRC.  I
> need to envisage millions of clients and tens of billions of mappings
> with billions of updates and lookups daily...
> 
> newAssoc(s:SRC,d:DST)
> listAssoc(s:SRC) => List<d:DST>
> delAssoc(d:DST)
> lookupAssoc(d:DST) => s:SRC
> 
> If I were using BDB, I'd have two maps - the first with s:SRC as key and
> d:DST as value - the second with (d:DST,s:SRC) as key with no values....
> and update these maps in a transaction.

You could model it like this with Cassandra as well.

It sounds like the real question though is, how can you structure this
to work given Cassandra's eventual consistency and record-level
atomicity?

For example, QUORUM consistency for reads and writes are enough to
ensure that your SRC->DST mappings remain unique and consistent from the
perspective of the client. And, if you can't make your application
resilient to inconsistencies in the inverted index (detect, repair,
etc), you could always use a Zookeeper-based mutex.

If you haven't already I'd suggest reading the Amazon whitepaper on
Dynamo[1] to understand eventual consistency, and Cassandra's API[2]
docs for how to apply it here.

[1]:
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
[2]: http://wiki.apache.org/cassandra/API#ConsistencyLevel

> If I were in SQL land, I'd need a table a bit like this:
> 
> create table Assoc( src binary(64) , dst binary(64) unique, primary key
> (src,dst) )
> 
> The implementations of the API calls would be trivial insert, delete and
> select operations - and consistency between the primary key and the
> implicit (unique constraint) index would arise as a consequence of
> transactions.  I realise that, with Cassandra, I need a different
> approach - since I don't have the same notion of transactions on which
> to rely... and, in any case, given a desire for scalability, relying
> upon such fine grained transactions would definitely prove a
> bottleneck.  That said, the uniqueness of DST values is systemically
> critical - so, even while I do not anticipate duplicate hashes in
> practice, I need uniqueness to be verified - and for the second SRC
> values asking to associate with an existing DST value to fail without
> violating the one->one mapping from DST to SRC... and for this failure
> to be notified ASAP.
> 
> It strikes me that a plausible design might be one that writes a log of
> 'insert/delete' with pairs of hashes which some background process
> eventually indexes in a big batch... before clearing the transaction
> log.  If this is "The Cassandra Way" - I'm surprised not to have found
> any examples... am I looking in the wrong places for them?  Is my log of
> 'insert' and 'delete' operations something I'd implement myself using
> ad-hoc techniques, or is there explicit support for this in Cassandra? 
> Do I need to develop my own process (from scratch) to merge updates with
> on-disk data - or is there a neat way to get Cassandra to do that for me?
> 
> Another issue I'm considering is if I should map from SRC to a list of
> DST as my low-level representation with Cassandra... or should I map
> individually.  A potential problem is that one SRC value can map to
> arbitrarily many DST values.   At the level of the RPC API, I can
> address this by returning an object resembling a scrollable cursor
> instead of a static list - but, I guess, I'd need to be concerned about
> resource limitations (memory, etc.) for the on-disk representation?  I
> presume that there's a significant advantage to storing the one-to-many
> map explicitly (locality of reference, for example) - as well as
> minimising the size of the encoded data... I'm guessing that there is no
> prefix-compression for keys?  Key compression would likely lead to the
> opposite architectural decisions from a resource-use perspective... and
> would eliminate concerns about maps from single SRC values to very large
> numbers of DST values.
> 
> Any hints, tips, comments, pointers to relevant documentation etc. would
> be much appreciated...  I'm guessing many others have tackled a problem
> similar to this?
> 


-- 
Eric Evans
eevans@rackspace.com


Mime
View raw message