I'm trying out DSE and looking for the best way to arrange the cluster. I have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6 outside the gateway that are supposed to take replicas from the other 3 and serve reads and analytics jobs.
1. Is it OK to run the 3 nodes as normal Cassandra nodes and the other 6 as analytics nodes? Can I serve both real-time reads and M/R jobs from the 6 nodes, and how will the two workloads affect each other performance-wise?
I know the system is meant to be used with analytics separated from real-time queries. I've already explored a possible 3-DC setup with Tyler in another message, and it does work, but I'm afraid it is too complex. It would also require me to send 2 replicas across the firewall, which can't handle that very well at peak times, so other applications would be affected.
2. I started the cluster in the setup described in 1 (3 normal, 6 analytics), and as soon as the analytics nodes start up they begin logging this message:
INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629) Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already tried 10 time(s).
So it seems my analytics nodes are trying to contact the normal Cassandra seed node on port 8012, which I read is a "Hadoop Job Tracker client port". This doesn't seem like normal behavior. Why is it getting confused? In the .yaml of each node I'm using endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch and listing the analytics seed node before the normal Cassandra seed node in the seeds.
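For reference, here is a rough sketch of the two settings described above. It is not a copy of my actual file: ANALYTICS_SEED_IP and CASSANDRA_SEED_IP are placeholders, and the seed_provider layout assumes the stock cassandra.yaml format of this Cassandra generation.

```yaml
# Sketch only; the addresses below are placeholders, not real values.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Analytics seed listed first, normal Cassandra seed second
          - seeds: "ANALYTICS_SEED_IP,CASSANDRA_SEED_IP"

# Snitch set identically on all 9 nodes
endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
```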