incubator-cassandra-user mailing list archives

From Radim Kolar <...@sendmail.cz>
Subject Re: multi datacenter cluster, without fibre speeds
Date Mon, 14 Nov 2011 09:43:27 GMT

> Well to be honest I was thinking of using that connection in 
> production, not for a backup node.
>
For production there are several problems. WAN latency is inconsistent 
and varies greatly during the day, and sometimes you will hit network 
lags that break the cluster for a while (about 1-2 minutes). Network 
bandwidth is also a problem, especially during peak hours. It might not 
matter if you don't have an interactive workload: an app can wait, a 
human can't. Be sure to use connection pooling to different servers at 
the client. Over a WAN you can see about a 4:1 difference in available 
bandwidth between night hours and peak hours, so you need to schedule 
anti-entropy repairs at night.
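
A minimal sketch of the client-side pooling idea, not tied to any 
particular Cassandra client library. The class name RotatingPool and the 
connect callable are hypothetical; connect is assumed to take a host name 
and either return a connection or raise OSError on failure.

```python
import itertools


class RotatingPool:
    """Round-robin over several servers, skipping ones that fail to connect."""

    def __init__(self, hosts, connect):
        self.hosts = list(hosts)
        self.connect = connect  # hypothetical: callable(host) -> connection
        self._cycle = itertools.cycle(self.hosts)

    def get_connection(self):
        last_error = None
        # Try each host at most once per call, continuing from where
        # the previous call stopped so load is spread across servers.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            try:
                return self.connect(host)
            except OSError as exc:  # refused, unreachable, timed out
                last_error = exc
        raise ConnectionError("no server reachable") from last_error
```

With a pool like this, a short WAN outage toward one server costs the 
client one failed attempt before it moves on to the next host, instead of 
stalling every request.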
>
> My Cassandra deployment works just like an expensive file caching and 
> replication - I mean, all I use it for is to replicate some 5million 
> files of 2M each across few nodes and intensively read/write.
>
For mass replication of large files, Hadoop is really better than 
Cassandra because there are no compactions.
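
To put the data set in perspective for a WAN link, here is a rough 
estimate of how long one full copy takes to cross the wire. It assumes 
"2M" means 2 MB per file, and the 100 Mbit/s night-time link speed is 
purely an illustrative guess; only the file count and the 4:1 night/peak 
ratio come from this thread.

```python
def transfer_hours(total_bytes, link_mbit_per_s):
    """Hours needed to move total_bytes over a link of the given speed."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return total_bytes / bytes_per_second / 3600


files = 5_000_000
file_size = 2 * 1000**2          # assumption: "2M" = 2 MB
data_set = files * file_size     # 10**13 bytes, i.e. 10 TB

night_mbit = 100                 # assumed off-peak bandwidth
peak_mbit = night_mbit / 4       # 4:1 night/peak ratio from the thread

print(f"night: {transfer_hours(data_set, night_mbit):.0f} h")
print(f"peak:  {transfer_hours(data_set, peak_mbit):.0f} h")
```

Under these assumptions a full copy takes roughly 222 hours at night 
speed and four times that at peak speed, which is why bulk transfers and 
repairs belong in the off-peak window.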
>
> Not only the files themselves but I also need to attach some tags to 
> each file (see them as key=value) so I thought of Hadoop but in the end 
> settled for Cassandra because of better consistency, community support, 
> no single point of failure and some!
>
Hadoop is far better than Cassandra for batch processing if your batch 
processing changes the majority of the data set. The SPOF is not a 
problem, but it is much harder to write optimised applications for 
Hadoop; it's rather low level.
