From: Clement Honore
Date: Fri, 28 Sep 2012 18:20:17 +0200
Subject: Help for creating a custom partitioner
To: user@cassandra.apache.org

Hi,

I have hierarchical data.
I'm storing it in a CF with a rowkey somewhat like (category, doc id), and plenty of columns for a doc definition.

I have hierarchical data traversal too.
The user just chooses one category, and then interacts with docs belonging only to this category.

1) If I use RandomPartitioner, all docs could be spread across all nodes in the cluster => bad performance.

2) Using RandomPartitioner, an alternative design could be rowkey=category and column name=(doc id, prop name).
I don't want it because I need fixed column names for indexing purposes, and the "category" is quite a lonnnng string.

3) Then, I want to define a new partitioner for my rowkey (category, doc id), doing MD5 only for the "category" part.

The question is: with such a partitioner, many rows on *one* node are going to have the same MD5 value, as a result of this new partitioner.
Is it going to hurt Cassandra's behavior? Or its performance?

Thanks.

Regards,
Clément.

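To make point 3 concrete, here is a minimal sketch in Python of the idea, not of Cassandra's actual partitioner interface. The function names, the ":" separator used to join the key parts, and the sample keys are all assumptions made for illustration. It only shows that hashing the "category" part alone gives every doc in a category the same token, whereas hashing the whole row key scatters them.

```python
import hashlib


def md5_token(data: bytes) -> int:
    """Interpret the 16-byte MD5 digest of `data` as a non-negative integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def full_key_token(category: str, doc_id: str) -> int:
    """Token over the whole row key (hash-everything behaviour).

    The ':' separator is an assumption for this sketch only.
    """
    return md5_token(f"{category}:{doc_id}".encode("utf-8"))


def category_token(category: str, doc_id: str) -> int:
    """Hypothetical custom partitioner: hash only the category part.

    `doc_id` is deliberately ignored, so all docs of one category collide.
    """
    return md5_token(category.encode("utf-8"))


# Made-up sample row keys: two docs in one category, one doc in another.
keys = [("books/scifi", "doc1"), ("books/scifi", "doc2"), ("music/jazz", "doc1")]

full_tokens = [full_key_token(c, d) for c, d in keys]
cat_tokens = [category_token(c, d) for c, d in keys]

for key, full, cat in zip(keys, full_tokens, cat_tokens):
    print(key, hex(full), hex(cat))
```

The first two rows print the same category-only token, which is exactly the situation the question asks about. The sketch says nothing about how Cassandra itself treats rows whose tokens are equal.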