Subject: Re: Help for creating a custom partitioner
From: Tim Wintle <timwintle@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 01 Oct 2012 10:44:47 +0100

On Mon, 2012-10-01 at 10:45 +0200, Clement Honore wrote:
> We plan to use manual indexing too (with native C* indexing for other
> cases). So, for one index, we will get plenty of FKs, and a MultiGet call
> to fetch all the associated entities would, with RandomPartitioner, be
> spread across the whole cluster.
> As we don't know the cluster size yet, and it's expected to grow at an
> unknown rate, we are now thinking about alternatives for scalability.
>
> To tell the truth, we have not done performance tests so far. But as the
> choice of a partitioner is the first Cassandra cornerstone, we are already
> thinking about a new partitioner. We are planning "random vs custom
> partitioner" tests => hence my questions about first creating another one.
>
> AFAIS, your partitioner (the higher bits of the hash from hashing the
> category, and the lower bits of the hash from hashing the document id)
> will, on average, put all the docs of a category on one node. Quite
> interesting, thanks! I could add such a partitioner to my test suite.
>
> But why not just hash the "category" part of the row key? With such a
> partitioner, as said before, many rows on *one* node are going to have
> the same hash value.
> - If it hurts Cassandra behaviour/performance => I am curious to know
>   why. In that case, I see your partitioner, so far, as the best answer
>   to my wishes!
> - If it does NOT hurt Cassandra behaviour/performance => it sounds like
>   an optimal partitioner for our needs.
>
> Any idea about Cassandra's behaviour with such a (category-only) hash
> partitioner?

I honestly don't know the code well enough - but I have always assumed
(perhaps incorrectly) that the whole SSTable / Memtable system is sorted
on the hash value rather than the key, so that range queries are efficient
- so if all items on a node have the same hash you would get awful
performance for (at least) reading specific rows from disk.

I could be wrong in my assumptions. Certainly, having lots of hash
collisions is unusual behaviour - I don't imagine the timing behaviour has
been tested closely against that situation.

If you haven't tested it yet, then I'm not sure why you assume that
accesses from a single machine would be faster than from documents spread
around the ring - Ethernet is fast, and if you're going to have to do disk
seeks to get any of this data, then you can run the seeks in parallel
across a large number of spindles by spreading the load around the
cluster.

It also adds extra load onto the machines handling popular categories -
assuming the number of categories is significantly smaller than the number
of documents, that could make a major difference to latency.

(I've put a rough, untested sketch of the two token schemes at the bottom
of this mail.)

Tim

> Regards,
> Clément
>
> 2012/9/28 Tim Wintle
>
> > On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> > > Hi,
> > >
> > > I have hierarchical data. I'm storing it in a CF with a row key
> > > somewhat like (category, doc id), and plenty of columns for a doc
> > > definition.
> > >
> > > I have hierarchical data traversal too. The user just chooses one
> > > category, and then interacts only with docs belonging to that
> > > category.
> > >
> > > 1) If I use RandomPartitioner, all docs could be spread across all
> > >    nodes in the cluster => bad performance.
> > >
> > > 2) Using RandomPartitioner, an alternative design could be
> > >    rowkey=category and column name=(doc id, prop name). I don't want
> > >    it because I need fixed column names for indexing purposes, and
> > >    the "category" is quite a long string.
> > >
> > > 3) So I want to define a new partitioner for my row key (category,
> > >    doc id), computing MD5 only on the "category" part.
> > >
> > > The question is: with such a partitioner, many rows on *one* node
> > > are going to have the same MD5 value.
> >
> > If you do decide that having rows on the same node is what you want,
> > then you could take the higher bits of the hash from hashing the
> > category, and the lower bits of the hash from hashing the document id.
> >
> > That would mean documents in a category would be close to each other
> > in the ring - while being unlikely to share the same hash.
> >
> > However, if you're doing this then all reads/writes to the category
> > are going to go to a single machine. That's not going to spread the
> > load across the cluster very well, as I assume a few categories are
> > going to be far more popular than others.
> >
> > Have you tested that you actually get bad performance from
> > RandomPartitioner?
> >
> > Tim
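
P.S. Here's the rough sketch I mentioned above. It only shows the
token-composition idea, not a real org.apache.cassandra.dht.IPartitioner
implementation (the actual interface also needs decorateKey, midpoint,
describeOwnership and friends), and the (category, doc id) key layout and
class names are just assumptions for illustration - untested, so treat it
as pseudocode in Java:

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Sketch of the two token schemes discussed in this thread.
    // "Composite" = high 64 bits of the token from MD5(category), low 64
    // bits from MD5(doc id), so docs in a category land near each other on
    // the ring without colliding. "Category-only" = every doc in a category
    // gets the identical token (the collision case Clement asked about).
    public class CategoryDocTokenSketch {

        private static BigInteger md5(String s) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // 128-bit, non-negative
        }

        // Category-only token: massive collisions within a category.
        static BigInteger categoryOnlyToken(String category) throws Exception {
            return md5(category);
        }

        // Composite token: same neighbourhood per category, distinct per doc.
        static BigInteger compositeToken(String category, String docId) throws Exception {
            BigInteger high = md5(category).shiftRight(64).shiftLeft(64); // top 64 bits
            BigInteger low = md5(docId).mod(BigInteger.ONE.shiftLeft(64)); // bottom 64 bits
            return high.or(low);
        }

        public static void main(String[] args) throws Exception {
            // Same high bits (same ring neighbourhood), different low bits:
            System.out.println(compositeToken("books", "doc-42"));
            System.out.println(compositeToken("books", "doc-43"));
            // Identical for every doc in "books" - the collision case:
            System.out.println(categoryOnlyToken("books"));
        }
    }

The token range here doesn't exactly match RandomPartitioner's, so don't
read too much into the absolute values - the point is just that the
composite scheme keeps roughly the same per-category locality as hashing
only the category, while each row still gets its own position in the
SSTable ordering.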