On 06/04/2010 18:50, Benjamin Black wrote:
> I'm finding this exchange very confusing. What exactly about
> Cassandra 'looks absolutely ideal' to you for your project? The write
> performance, the symmetric, peer to peer architecture, etc?
The short answer is that, in all ways other than depending upon
'referential integrity' between two 'maps' of hash-values, the data for
the rest of my application looks remarkably like that of large sites
that are already known to use Cassandra.
Reasons I like Cassandra for this project:
- Columnar rather than tabular data structures with an extensible
'schema' - permitting evolution of back-end data structures to
support new features without downtime.
- Decentralised architecture with fault tolerance/redundancy,
permitting high availability on shoestring-budget hardware in an easily
scalable pool - despite the need to track rapidly changing data that
precludes meaningful backup.
- Easy to establish that data will be efficiently sharded -
allowing many concurrent reads and writes, i.e. system-wide I/O
bandwidth scales for both reading and writing.
- Lightweight, free and open-source physical data model that
minimises risk of vendor lock-in or insurmountable problems with
glitches in commercial closed-source libraries.
I'm trying to establish the most effective Cassandra approach to
achieve the logical 'referential integrity' while minimising resource
(memory/disk/CPU) use, in order to minimise hardware costs for any given
deployment scale - all while retaining the above advantages.