On Sun, Feb 26, 2012 at 8:24 PM, aaron morton <aaron@thelastpickle.com> wrote:
All nodes in the cluster need two-way communication. Nodes need to gossip with each other so they know which nodes are alive.

If you need to dump a lot of data, consider the Hadoop integration (http://wiki.apache.org/cassandra/HadoopSupport). It can run a bit faster than going through the Thrift API.

Thanks for the suggestion, I will look into it.

Copying sstables may be another option depending on the data size. 

The problem with this is that SSTables, from what I understand, are per CF. Since I want semi-real-time replication of only the most recently added data, this won't work, because I would be copying over all the data in the CF.



Aaron Morton
Freelance Developer

On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:

Hello everyone,

I'm battling with this constraint: I need to regularly ship timeseries data out of a Cassandra cluster that sits within an enclosed network.

I tried selecting all the data within a certain time window, writing it to a file, and then copying the file out, but this hurts I/O performance because even for a small time window (say 5 mins) I am hitting more than a million rows.
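
For reference, the export step described above can be sketched roughly as follows. This is only an illustration, not the code actually in use: the row layout (key, timestamp in ms, value), the function names, and the choice of gzip-compressed CSV are all assumptions, and the rows are taken from any iterable rather than from a real Cassandra client call. Streaming row by row keeps memory flat, and compressing on the fly may reduce the number of bytes that hit the disk.

```python
import csv
import gzip
from typing import Iterable, Iterator, Tuple

# Assumed row layout: (row key, timestamp in ms, value). Hypothetical.
Row = Tuple[str, int, str]


def time_windows(start_ms: int, end_ms: int, step_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) windows that together cover [start_ms, end_ms)."""
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)
        yield lo, hi
        lo = hi


def export_window(rows: Iterable[Row], lo: int, hi: int, path: str) -> int:
    """Stream rows with lo <= timestamp < hi into a gzip-compressed CSV file.

    Rows are written one at a time, so the whole window is never held in
    memory. Returns the number of rows written.
    """
    written = 0
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        for key, ts, value in rows:
            if lo <= ts < hi:
                writer.writerow([key, ts, value])
                written += 1
    return written
```

In practice the `rows` iterable would be backed by whatever range query the client library offers, fetched in pages, and each finished file would be pushed out over the outbound-only link.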

It would really help if I could use Cassandra to replicate the data outside automatically. The problem is that they will only allow outbound traffic from the enclosed network (not inbound). Is there any way to configure the cluster, or to have 2 data centers, such that the data center (node or cluster) outside the enclosed network only receives a replica of the data, without ever needing to communicate anything back?

I appreciate the help,