From: Yudong Gao
To: user@cassandra.apache.org
Subject: Location-aware replication based on objects' access pattern
Date: Tue, 05 Apr 2011 18:05:28 -0400
Message-ID: <4b72d650b0cd8fec918db3ddc4c49d08@umich.edu>

Hi,

I am considering Cassandra for our research project, and there is one feature we are particularly interested in.

Our setup has multiple datacenters in different geographic locations, and data is accessed with predictable patterns. Think of something like Craigslist: data objects corresponding to CA will mostly be accessed by users on the west coast. In this case, if all the replicas are stored on the east coast, access will not be efficient. Other applications, such as Facebook, should have a similar concern.

I am aware of placement strategies such as RackAwareStrategy and NetworkTopologyStrategy, but they place objects based on their hashed token, not their access pattern.

I am thinking about one possible trick: manipulate the key of an object based on its access pattern, so that the key maps to a token that has at least one replica (ideally the primary replica) stored in the desired datacenter, with the other replicas stored in other datacenters for reliability.

I found this post discussing a similar problem:
http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html
but Ben suggested just writing a new replication strategy.

IMO, location-aware replication should be a common problem for Cassandra, especially since it has been widely used in many large-scale commercial applications such as Facebook and Twitter. I am interested in how they handle this problem. Is there any existing solution that I can refer to and get started with?

Thanks!
Yudong
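
To make the key-manipulation trick concrete, here is a rough sketch in Python. It is only a toy model and makes several assumptions: the four-node ring, the node names, and the datacenter labels are hypothetical, the token function is a plain MD5 reduction rather than Cassandra's actual partitioner code, and the helper names (token_for, primary_datacenter, localized_key) are made up for illustration. The idea is to try salted variants of a key until the primary replica lands in the desired datacenter.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127

# Hypothetical toy ring: (token, node, datacenter), sorted by token.
# Each node owns the token range (previous token, its own token].
RING = [
    (0, "node-a", "east"),
    (RING_SIZE // 4, "node-b", "west"),
    (RING_SIZE // 2, "node-c", "east"),
    (3 * RING_SIZE // 4, "node-d", "west"),
]
TOKENS = [token for token, _, _ in RING]


def token_for(key: str) -> int:
    """Toy stand-in for a hash partitioner: MD5 digest reduced to the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def primary_datacenter(key: str) -> str:
    """Datacenter of the node that owns the key's token (the primary replica)."""
    # First node whose token is >= the key's token, wrapping around the ring.
    idx = bisect.bisect_left(TOKENS, token_for(key)) % len(RING)
    return RING[idx][2]


def localized_key(base_key: str, wanted_dc: str, max_salts: int = 1000) -> str:
    """Append a numeric salt until the primary replica falls in wanted_dc."""
    for salt in range(max_salts):
        candidate = f"{base_key}#{salt}"
        if primary_datacenter(candidate) == wanted_dc:
            return candidate
    raise ValueError(f"no salt below {max_salts} maps {base_key!r} to {wanted_dc!r}")


if __name__ == "__main__":
    key = localized_key("listing:ca:42", "west")
    print(key, "->", primary_datacenter(key))
```

One drawback of this approach is that readers need the salted key, so the application has to either store the mapping from the original key to the salted one or recompute it with the same ring layout, and any change to token assignments can invalidate keys that were chosen earlier.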