This may be hard because the coordinator could store hinted handoff (HH) data on disk. You could turn HH off and use RF=1 to keep each piece of data on a single instance, but you would be likely to lose data if you had any problems with your instances. You would also need to tweak the memtable flushing so that it goes to disk more often than the default of ten seconds, or accept losing data. You will also have an "interesting" time scaling your cluster and would have to plan for that in your custom database.
Essentially you want to turn off all the features which make Cassandra a robust product ;-). Without knowing your requirements more precisely, I'd be inclined to recommend manually sharding on MariaDB or Postgres instances instead, or using their underlying storage engines directly (e.g. InnoDB) if you're just looking for a key-value store.
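For what it's worth, here is a minimal sketch of what I mean by manual sharding, in Python. It is only an illustration under assumptions: the names (Shard, PinnedRouter, the owner-to-node placement map, "site-1", "node-a") are made up, and the in-memory dict stands in for a real MariaDB or Postgres connection. The point is that the application, not the database, decides which single node holds each piece of data.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Stand-in for one database instance (hypothetical; a real shard
    would wrap a MariaDB or Postgres connection instead of a dict)."""
    name: str
    rows: dict = field(default_factory=dict)


class PinnedRouter:
    """Sends every read and write to the single shard its owner is pinned to."""

    def __init__(self, shards, placement):
        self._shards = {s.name: s for s in shards}
        # placement maps owner -> shard name and is chosen by the application.
        unknown = set(placement.values()) - set(self._shards)
        if unknown:
            raise ValueError(f"placement names unknown shards: {sorted(unknown)}")
        self._placement = dict(placement)

    def shard_for(self, owner):
        if owner not in self._placement:
            raise KeyError(f"no shard pinned for owner {owner!r}")
        return self._shards[self._placement[owner]]

    def put(self, owner, key, value):
        # Written to exactly one shard; nothing is copied anywhere else.
        self.shard_for(owner).rows[(owner, key)] = value

    def get(self, owner, key):
        return self.shard_for(owner).rows[(owner, key)]


shards = [Shard("node-a"), Shard("node-b")]
router = PinnedRouter(shards, {"site-1": "node-a", "site-2": "node-b"})
router.put("site-1", "vertex:42", {"label": "person"})
router.put("site-2", "vertex:7", {"label": "place"})
```

You give up automatic rebalancing with this approach, so moving an owner to another node means copying its rows and updating the placement map yourself.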
Ahoy the list. I am evaluating Cassandra in the context of using it as a storage back end for the Titan graph database.
We’ll have several nodes in the cluster. However, one of our requirements is that data has to be loaded into and stored on a specific node and only on that node. Also, it cannot be replicated around the system, at least not stored persistently on disk – we will of course make copies in memory and on the wire as we access remote nodes. These requirements are non-negotiable.
We understand that this is essentially the opposite of what Cassandra is designed for, and that we’re missing all the scalability and robustness, but is it technically possible?
First, I would need to create a custom partitioner – is there any tutorial on that? I see a few “you don’t need to” threads, but I do.
Second, how easy is it to have Cassandra not replicate data between nodes in a cluster? I’m not seeing an obvious configuration option for that, presumably because it obviates much of the point of using Cassandra, but again, we’re working within some rather unfortunate constraints.
Any hints or suggestions would be most gratefully received.