incubator-cassandra-user mailing list archives

From Jeremiah Jordan <jeremiah.jor...@morningstar.com>
Subject Re: unidirectional communication/replication
Date Wed, 29 Feb 2012 17:42:23 GMT
You might check out some of what Netflix does with their Cassandra
backup and ETL tools:
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
http://techblog.netflix.com/2012/02/announcing-priam.html


-Jeremiah

On 02/29/2012 11:04 AM, Alexandru Sicoe wrote:
>
>
>
> On Sun, Feb 26, 2012 at 8:24 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>     All nodes in the cluster need two-way communication. Nodes need to
>     gossip with each other so they know which nodes are alive.
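A toy illustration of why a one-way link breaks this: as I understand it, each gossip round is a three-message SYN / ACK / ACK2 exchange, so the node that receives a SYN has to answer. The Node class and gossip_round function below are hypothetical, not Cassandra code, and each endpoint's state is reduced to a single version number.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # endpoint name -> latest state version this node has seen (simplified)
    state: dict = field(default_factory=dict)

    def digest(self):
        return dict(self.state)


def gossip_round(initiator, peer, peer_can_reply=True):
    """Toy SYN/ACK/ACK2 round; returns True if the initiator learned new state."""
    # SYN: initiator -> peer, carrying digests of what the initiator knows.
    syn = initiator.digest()
    # The peer works out what it has that is newer, and what it wants in return.
    newer_on_peer = {ep: v for ep, v in peer.state.items() if v > syn.get(ep, -1)}
    wanted_by_peer = [ep for ep, v in syn.items() if v > peer.state.get(ep, -1)]
    if not peer_can_reply:
        # Outbound-only firewall: the ACK never arrives, so ACK2 is never sent.
        return False
    # ACK: peer -> initiator, with the newer state.
    initiator.state.update(newer_on_peer)
    # ACK2: initiator -> peer, with the state the peer asked for.
    peer.state.update({ep: initiator.state[ep] for ep in wanted_by_peer})
    return bool(newer_on_peer)
```

With peer_can_reply=False neither side learns the other's state, and rounds started by the outside node would be inbound traffic and dropped outright.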
>
>     If you need to dump a lot of data, consider the Hadoop integration
>     (http://wiki.apache.org/cassandra/HadoopSupport). It can run a bit
>     faster than going through the Thrift API.
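Much of that speedup comes from parallelism: the Hadoop input format gives each map task its own slice of the token ring. The real split calculation asks the cluster for split points; the split_ring helper below is a hypothetical, simplified stand-in that cuts RandomPartitioner's token space (0 to 2**127) into equal contiguous ranges, for example to drive several export workers yourself.

```python
# Assumption: RandomPartitioner, whose tokens lie between 0 and 2**127.
RING_SIZE = 2 ** 127


def split_ring(num_splits, ring_size=RING_SIZE):
    """Cut the token space into num_splits contiguous (start, end) pairs.

    Hypothetical helper: ignores ring wrap-around, node ownership and
    Cassandra's exact range-boundary conventions.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    # Integer arithmetic keeps the 127-bit boundaries exact.
    bounds = [ring_size * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


for start, end in split_ring(4):
    print(start, end)
```

Each pair could be handed to a separate worker so the reads run side by side instead of one long sequential scan.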
>
>
> Thanks for the suggestion, I will look into it.
>
>
>     Copying sstables may be another option depending on the data size.
>
>
> The problem with this is that, from what I understand, SSTables are
> per CF. Since I want semi-real-time replication of just the latest
> data added, this won't work because I would be copying over all the
> data in the CF.
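One thing that may help here: SSTables are immutable once flushed, so new writes show up as new SSTable files rather than as changes to old ones. A hypothetical helper, assuming file names of the form <prefix>-<Component>.db (e.g. Standard1-hc-12-Data.db) and that in-progress files carry -tmp- in their name, could pick out only the SSTables not copied yet:

```python
def sstable_prefix(file_name):
    """'Standard1-hc-12-Data.db' -> 'Standard1-hc-12' (assumed naming scheme)."""
    return file_name.rsplit("-", 1)[0]


def new_sstable_files(file_names, copied_prefixes):
    """Return every component file of SSTables whose prefix is not yet copied."""
    return sorted(
        name
        for name in file_names
        # Assumption: files still being written contain "-tmp-"; skip them.
        if name.endswith(".db")
        and "-tmp-" not in name
        and sstable_prefix(name) not in copied_prefixes
    )
```

Inside the network you would list the CF directory (e.g. with os.listdir), push out what this returns, and record the new prefixes. The catch is compaction: it rewrites already-copied rows into new SSTables, so old data would be shipped again after each compaction.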
>
> Cheers,
> A
>
>
>     Cheers
>
>
>     -----------------
>     Aaron Morton
>     Freelance Developer
>     @aaronmorton
>     http://www.thelastpickle.com
>
>     On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:
>
>>     Hello everyone,
>>
>>     I'm battling with this constraint: I need to regularly ship
>>     time-series data out of a Cassandra cluster that sits within an
>>     enclosed network.
>>
>>     I tried selecting all the data within a certain time window,
>>     writing it to a file, and then copying the file out, but this hurts
>>     I/O performance because even a small time window (say 5 minutes)
>>     returns more than a million rows.
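Rather than one query for the whole window, a common pattern is to walk it in smaller slices and stream each slice to the file, which bounds the size of any single read (the total data read stays the same). A minimal sketch; fetch_rows and write_row are hypothetical callables you would supply, e.g. a range query on your client and a file writer:

```python
from datetime import datetime, timedelta


def time_slices(start, end, step):
    """Yield consecutive (slice_start, slice_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)  # the last slice may be shorter
        yield cursor, nxt
        cursor = nxt


def export_window(fetch_rows, write_row, start, end, step):
    """Fetch and write one slice at a time; return the number of rows written."""
    written = 0
    for lo, hi in time_slices(start, end, step):
        for row in fetch_rows(lo, hi):  # hypothetical query for [lo, hi)
            write_row(row)
            written += 1
    return written


# Example: a 5-minute window walked in 30-second slices.
window_start = datetime(2012, 2, 25, 3, 0)
slices = list(time_slices(window_start, window_start + timedelta(minutes=5),
                          timedelta(seconds=30)))
print(len(slices))  # 10
```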
>>
>>     It would really help if Cassandra could replicate the data
>>     outside automatically. The problem is that only outbound traffic
>>     is allowed from the enclosed network (no inbound).
>>     Is there any way to configure the cluster or have 2 data centers
>>     in such a way that the data center (node or cluster) outside of
>>     the enclosed network only gets a replica of the data, without
>>     ever needing to communicate anything back?
>>
>>     I appreciate the help,
>>     Alex
>
>
