Date: Fri, 27 Jun 2014 16:05:24 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7453) Geo-replication in Cassandra

    [ https://issues.apache.org/jira/browse/CASSANDRA-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046097#comment-14046097 ]

Jonathan Ellis commented on CASSANDRA-7453:
-------------------------------------------

Replicas must be uniquely determined by the partition key and *only* the partition key. Everything breaks if two different machines compute different replica sets.

Using an additional table might work, but at best you'll have really terrible performance.

So variants on (1) are the only sane option.
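
The constraint above can be sketched in code. The following is a minimal, hypothetical illustration of a variant on option (1) from the quoted description, not Cassandra's replication-strategy API: the names DC_BY_PREFIX, NODES_BY_DC, and replicas_for, the datacenter and node names, and the use of MD5 to pick a starting node are all assumptions made for the example. The point it tries to show is that the replica set is a pure function of the partition key plus a topology that every node is assumed to share, so any two machines compute the same answer.

```python
import hashlib
from typing import Dict, List

# Hypothetical mapping from the first key byte to a primary datacenter.
DC_BY_PREFIX: Dict[int, str] = {0: "us-east", 1: "eu-west", 2: "ap-south"}

# Hypothetical static topology, assumed identical on every machine.
NODES_BY_DC: Dict[str, List[str]] = {
    "us-east": ["use-1", "use-2", "use-3"],
    "eu-west": ["euw-1", "euw-2", "euw-3"],
    "ap-south": ["aps-1", "aps-2", "aps-3"],
}


def replicas_for(partition_key: bytes, rf: int = 2) -> List[str]:
    """Compute the replica set from the partition key and nothing else."""
    if not partition_key:
        raise ValueError("partition key must contain at least the location byte")

    # The leading byte names the datacenter; an unknown byte raises KeyError.
    dc = DC_BY_PREFIX[partition_key[0]]
    nodes = NODES_BY_DC[dc]

    # A stable hash of the remaining bytes picks the first replica.
    # hashlib is used instead of hash() because hash() varies per process.
    digest = hashlib.md5(partition_key[1:]).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)

    # Walk the datacenter's node list from that position, wrapping around.
    count = min(rf, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    key = bytes([1]) + b"user42"
    print(replicas_for(key))
```

No lookup in another table and no client location enters the computation, which is why a scheme like (2) or (3) does not fit this shape: there the answer would depend on state that two machines may see differently.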

> Geo-replication in Cassandra
> ----------------------------
>
>                 Key: CASSANDRA-7453
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7453
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Sergio Esteves
>            Priority: Minor
>
> Currently, a Cassandra cluster that spans different datacenters replicates all data to all datacenters when an update is performed. This is a problem for the scalability of Cassandra as the number of datacenters increases.
> It would be desirable to have some way to make Cassandra aware of the location of data requests so that it could place replicas close to users and avoid replicating to remote datacenters that are far away.
> To this end, we thought of implementing a new replication strategy. Some possible solutions to achieve our goals are:
> 1) Using a byte from every row key to identify the location of the primary datacenter where data should be stored (i.e., where it is likely to be accessed).
> 2) Using an additional CF for every row to specify the origin of the data.
> 3) Replicating only to the 2 datacenters closest to the user (for reliability reasons) upon a write update. For reads, a user would try to fetch data from the 2 closest datacenters; if data is not available, it would try the remaining datacenters. If data fails to be retrieved too many times, it means that the client has moved to another part of the planet, and thus data should be migrated accordingly. We could have some problems here, like having the same rows, but with different CFs in different DCs (i.e., if users perform updates to the same rows from different remote places).
> What would be the best way to do this?
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.2#6252)