cassandra-client-dev mailing list archives

From Paul "LeoNerd" Evans
Subject Multi-node cluster-aware client connection
Date Wed, 11 Sep 2013 16:20:16 GMT
Having got the first stage of my client connector module nicely working
to a single node, I'm now looking at how to make it cluster-aware,
maintaining multiple connections for reliability and load-spreading.
What are some good strategies to take here?

My current plan is to connect to a seed node (randomly chosen from a
list?), query it for the list of peers in the cluster, then select
some number of those to be "primary" nodes and some more to be
"backup" nodes. The primary nodes will be used to spread actual query
load around; the backups sit idle, simply as a fast way to fail over
to a known-working connection if a primary goes down. By registering
an interest in topology and status change messages, the client can
keep its list of available nodes up to date.
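
To make that plan concrete, here is a minimal sketch in Python (used
purely for illustration). The NodePool class, its method names and the
default counts are all hypothetical, and the peer list is assumed to
have already been fetched from the seed node. It only shows the
bookkeeping: pick primaries and backups at random, then promote a
backup when a status change reports a primary as down.

```python
import random


class NodePool:
    """Tracks primary and backup nodes chosen from a peer list (hypothetical helper)."""

    def __init__(self, peers, n_primary=3, n_backup=2, rng=None):
        rng = rng or random.Random()
        peers = list(peers)
        # Pick a random subset so that different clients spread over the cluster.
        chosen = rng.sample(peers, min(len(peers), n_primary + n_backup))
        self.primaries = chosen[:n_primary]
        self.backups = chosen[n_primary:]

    def node_down(self, node):
        """Status-change event: drop the node, promoting a backup if a primary was lost."""
        if node in self.primaries:
            self.primaries.remove(node)
            if self.backups:
                self.primaries.append(self.backups.pop(0))
        elif node in self.backups:
            self.backups.remove(node)

    def node_up(self, node):
        """Topology or status event: remember a new or recovered node as a spare."""
        if node not in self.primaries and node not in self.backups:
            self.backups.append(node)
```

A real client would also have to open and close the connections at
these points; the sketch leaves that out.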

 1. What is a good way to handle prepared statements here? Should they
    be prepared on all nodes (or all primaries?), or just one? I could
    imagine some applications having just a handful of heavily-used
    prepared statements, which would become a hotspot on one node if
    they weren't spread around. But then what to do as new nodes become
    elected as primaries? Should the statements be prepared eagerly on
    connection? Lazily at next use?

 2. Secondly, what are suggested ways to actually spread load among the
    primaries? I could imagine a simple round-robin, or something
    fancier such as picking the node with the fewest outstanding
    requests, or the one on which we've been responsible for the least
    processing time recently, or something else... Do client libraries
    generally provide a selection of these mechanisms, or just pick one?
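
For illustration, the options in both questions might look something
like the rough Python sketch below. Every class and method name is
hypothetical, and prepare_on_node is an assumed callback standing in
for whatever actually sends a prepare request to a node. RoundRobin and
LeastOutstanding are two interchangeable selection policies;
LazyPreparer shows the "lazily at next use" option, preparing a
statement on a node only the first time a query is routed there.

```python
import itertools


class RoundRobin:
    """Cycle through the primaries in a fixed order."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self):
        return next(self._cycle)


class LeastOutstanding:
    """Pick the node with the fewest requests currently in flight."""

    def __init__(self, nodes):
        self.outstanding = {node: 0 for node in nodes}

    def pick(self):
        # Ties go to the node listed first.
        node = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[node] += 1
        return node

    def done(self, node):
        # Call when the response (or an error) for a request arrives.
        self.outstanding[node] -= 1


class LazyPreparer:
    """Prepare each statement on a node only when first needed there."""

    def __init__(self, prepare_on_node):
        self._prepare = prepare_on_node  # assumed: callable(node, cql) -> statement id
        self._ids = {}                   # (node, cql) -> statement id

    def statement_id(self, node, cql):
        key = (node, cql)
        if key not in self._ids:
            self._ids[key] = self._prepare(node, cql)
        return self._ids[key]
```

The eager alternative would be to loop over all known statements and
prepare each one when a node is promoted to primary, trading extra
work at connection time for no added latency on the first query.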

Paul "LeoNerd" Evans
ICQ# 4135350       |  Registered Linux# 179460
