cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Running hadoop jobs against data in remote data center
Date Thu, 07 Jul 2011 02:23:54 GMT
See http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers and http://www.datastax.com/docs/0.8/brisk/about_brisk#about-the-brisk-architecture

It's possible to run multiple DCs and use the LOCAL_QUORUM consistency level in your production
data centre, so the prod code can get on with its life without worrying about the other DC.
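As a sketch, a keyspace spanning both DCs might look like this in the 0.8-era cassandra-cli (the keyspace and data centre names here are made up, and the DC names would have to match what your snitch reports):

```
create keyspace MyApp
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC_PROD : 3, DC_ANALYTICS : 1}];
```

Production clients reading and writing at LOCAL_QUORUM then only wait on replicas in DC_PROD; the hadoop jobs can read from the analytics DC without touching production latency.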

Hope that helps.


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2011, at 1:29 PM, Jason Baker <jason@apture.com> wrote:

> I'm just setting up a Cassandra cluster for my company.  For a variety of reasons, we
> have the servers that run our hadoop jobs in our local office and our production machines
> in a colocated data center.  We don't want to run hadoop jobs against cassandra servers on
> the other side of the US from us, not to mention that we don't want them impacting performance
> in production.  What's the best way to handle this?
> 
> My first instinct is to add some local servers to the cluster and use NetworkTopologyStrategy.
> This way, the servers automatically get updated with the latest changes, and we get a bit
> of extra redundancy for our production machines.  Of course, the glaring weakness of this strategy
> is that our stats servers aren't in a datacenter with any kind of production guarantees.
> The network connection is relatively slow and unreliable, the servers may go out at any time,
> and I generally don't want to tie our production performance or reliability to these servers.
> 
> Is this as dumb an idea as I suspect it is, or can this be made to work?  :-)
> 
> Are there any better ways to accomplish what I'm trying to accomplish?
