Subject: Re: Direct control over where data is stored?
From: Khanh Nguyen
To: user@cassandra.apache.org
Date: Sun, 5 Jun 2011 12:18:09 -0400

Hi Maki and Adrian,

Thank you very much for the prompt replies. It's the weekend after all :).

I didn't realize I had left out part of my question until Adrian
mentioned the replication factor. Is it also possible to set where the
replicas are stored? Thanks.

This is a research experiment we're exploring with socially-related
data. If we want to pull the data of A and B out of Cassandra (i.e.,
LastNameColumn['A'] and LastNameColumn['B']), it should be faster if
these values are stored on the same box than if one is stored on a box
in NY and the other in Tokyo, no?

Regards,

-k

On Sun, Jun 5, 2011 at 2:07 AM, Adrian Cockcroft wrote:
> Sounds like Khanh thinks he can do joins... :-)
>
> User oriented data is easy, key by facebook id, let cassandra handle
> location. Set replication factor=3 so you don't lose data and can do
> consistent but slower read after write when you need to using quorum.
> If you are running on AWS you should distribute your replicas over
> availability zones.
>
> Then you can do read A, read B join them in your app code. Single
> digit milliseconds for each read or write.
>
> If you want to do bulk operations over many users, use Brisk with a Hadoop job.
>
> HTH
> Adrian
>
> On Sat, Jun 4, 2011 at 9:32 PM, Maki Watanabe wrote:
>> You may be able to do it with the Order Preserving Partitioner by
>> building a key-to-node mapping before storing data, or you may need
>> your own custom Partitioner. Please note that you are responsible for
>> distributing load between nodes in this case.
>> From an application design perspective, it is not clear to me why you
>> need to store user A and his friends on the same box....
>>
>> maki
>>
>>
>> 2011/6/5 Khanh Nguyen :
>>> Hi everyone,
>>>
>>> Is it possible to have direct control over where objects are stored in
>>> Cassandra? For example, I have a Cassandra cluster of 4 machines and 4
>>> objects A, B, C, D; I want to store A at machine 1, B at machine 2, C
>>> at machine 3 and D at machine 4. My guess is that I need to intervene
>>> in the way Cassandra hashes an object into the keyspace? If so, how
>>> complicated will the task be?
>>>
>>> I'm new to the list and Cassandra. The reason I am asking is that my
>>> current project is related to social locality of data: if A and B are
>>> Facebook friends, I want to store their data as close as possible,
>>> preferably on the same machine in a cluster.
>>>
>>> Thank you.
>>>
>>> Regards,
>>>
>>> -k
>>>
>>
>>
>>
>> --
>> w3m
>
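
The key-to-node mapping Maki mentions for an order-preserving partitioner
can be sketched roughly as below. This is a toy model in plain Python, not
Cassandra code: the token values, machine names, and the helpers node_for
and row_key are all hypothetical. It only assumes that each node owns the
keys that sort at or below its token, so keys sharing a group prefix land
on the same node.

```python
from bisect import bisect_left

# Hypothetical 4-node ring. Each node owns the keys that sort at or below
# its token (plain string order), a simplified stand-in for how an
# order-preserving partitioner assigns ranges. Tokens are assumptions.
TOKENS = ["g1~", "g2~", "g3~", "g4~"]
NODES = ["machine1", "machine2", "machine3", "machine4"]


def node_for(key: str) -> str:
    """Return the node whose range contains key, wrapping past the last token."""
    i = bisect_left(TOKENS, key)  # first token that is >= key
    return NODES[i % len(NODES)]


def row_key(group: str, user: str) -> str:
    """Prefix the user id with a group id so group members sort together."""
    return f"{group}:{user}"


if __name__ == "__main__":
    for group, user in [("g1", "A"), ("g1", "B"), ("g3", "C"), ("g4", "D")]:
        key = row_key(group, user)
        print(key, "->", node_for(key))
```

As Maki notes, with this kind of layout the application is responsible for
choosing groups and tokens so that load stays balanced; a popular group
will make its node a hot spot.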