cassandra-user mailing list archives

From Jason Baker
Subject Running hadoop jobs against data in remote data center
Date Thu, 07 Jul 2011 01:29:47 GMT
I'm just setting up a Cassandra cluster for my company.  For a variety of
reasons, we have the servers that run our Hadoop jobs in our local office
and our production machines in a colocated data center.  We don't want to
run Hadoop jobs against Cassandra servers on the other side of the US from
us, not to mention that we don't want them impacting performance in
production.  What's the best way to handle this?

My first instinct is to add some servers locally to the cluster and use
NetworkTopologyStrategy.  This way, the local servers automatically get
updated with the latest changes, and we get a bit of extra redundancy for
our production machines.  Of course, the glaring weakness of this strategy
is that our stats servers aren't in a data center with any kind of
production guarantees.  The network connection is relatively slow and
unreliable, the servers may go out at any time, and I generally don't want
to tie our production performance or reliability to these servers.
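
To make the NetworkTopologyStrategy idea concrete, here is a minimal
sketch of the kind of keyspace definition it implies.  Everything named
here is an assumption for illustration only: the keyspace name "stats",
the data center names "prod" and "office", and the replica counts are
hypothetical, and the statement uses the CQL replication map syntax of
later Cassandra releases rather than anything specific to this setup.

```python
def network_topology_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to the number of replicas
    kept there.  All names passed in are the caller's own; nothing here
    talks to a cluster.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    # The replication map always starts with the strategy class.
    options = ["'class': 'NetworkTopologyStrategy'"]

    # One entry per data center: name -> replica count.
    for dc, count in replicas_per_dc.items():
        options.append("'%s': %d" % (dc, int(count)))

    return "CREATE KEYSPACE %s WITH replication = {%s};" % (
        keyspace,
        ", ".join(options),
    )


# Hypothetical layout: three replicas in production, one in the office.
statement = network_topology_keyspace_cql("stats", {"prod": 3, "office": 1})
print(statement)
```

With a layout like this, production clients that read and write at a
data-center-local consistency level such as LOCAL_QUORUM would not need
to wait on the office replicas, which is one possible way to limit the
coupling described above.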

Is this as dumb an idea as I suspect it is, or can this be made to work?

Are there any better ways to accomplish this?
