Subject: Re: Help for creating a custom partitioner
From: Clement Honore <honore.c@gmail.com>
Date: Mon, 1 Oct 2012 10:45:58 +0200
To: user@cassandra.apache.org
In-Reply-To: <1348849740.5202.4.camel@tim-desktop>

Hi,

thanks for your answer.

We plan to use manual indexing too (with native C* indexing for other cases).

So, for one index, we will get plenty of foreign keys, and a MultiGet call to fetch all the associated entities would, with RandomPartitioner, spread across the whole cluster.
As we don't know the cluster size yet, and as it's expected to grow at an unknown rate, we are thinking about alternatives now, for scalability.
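For context, here is a rough sketch of why RandomPartitioner fans such a MultiGet out. It derives the ring token from an MD5 hash of the *whole* row key, so keys sharing a category prefix still land anywhere on the ring (illustrative Python model, not Cassandra's actual Java token code; the key format is a made-up example):

```python
import hashlib

def random_partitioner_token(row_key: bytes) -> int:
    """Rough model of RandomPartitioner: token = MD5 of the whole row key."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

# Three docs of the same category get unrelated tokens, hence a MultiGet
# for one category fans out across the whole cluster.
tokens = [random_partitioner_token(b"reports:doc-%d" % i) for i in range(1, 4)]
print(len(set(tokens)))  # 3 distinct tokens
```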

But, to tell the truth, we have not done performance tests so far.
Still, as the choice of a partitioner is the first C* cornerstone, we are already thinking about a new partitioner.
We are planning "random vs custom partitioner" tests => hence my questions about creating another one first.

AFAIS, your partitioner (the higher bits of the hash from hashing the category, and the lower bits of the hash from hashing the document id) will put all the docs of a category on (on average) one node. Quite interesting, thanks!
I could add such a partitioner to my test suite.
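If I understand your suggestion correctly, it could be sketched like this (illustrative Python; the 64/64 bit split and key names are my assumptions, and a real partitioner would be a Java IPartitioner implementation):

```python
import hashlib

def combined_token(category: bytes, doc_id: bytes) -> int:
    """128-bit token: high 64 bits from the category hash, low 64 bits
    from the document-id hash, so one category's rows sit close together
    on the ring without sharing the exact same token."""
    hi = int.from_bytes(hashlib.md5(category).digest()[:8], "big")
    lo = int.from_bytes(hashlib.md5(doc_id).digest()[:8], "big")
    return (hi << 64) | lo

t1 = combined_token(b"reports", b"doc-1")
t2 = combined_token(b"reports", b"doc-2")
assert t1 >> 64 == t2 >> 64  # same category => same high bits (neighbours on the ring)
assert t1 != t2              # different docs => still distinct tokens
```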

But why not just hash the "category" part of the row key?
With such a partitioner, as said before, many rows on *one* node are going to have the same hash value.
- if it hurts Cassandra behavior/performance => I am curious to know why. Anyway, in that case, I see your partitioner, so far, as the best answer to my wishes!
- if it does NOT hurt Cassandra behavior/performance => it sounds, then, like an optimal partitioner for our needs.
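The category-only scheme I have in mind would look roughly like this (illustrative Python; the key names are made up): the doc id plays no role, so every row of a category collapses onto a single token, which is exactly the collision situation I am asking about:

```python
import hashlib

def token_for(category: bytes, doc_id: bytes) -> int:
    """Hypothetical category-only partitioner: doc_id is deliberately
    ignored, so all rows of one category map to one token."""
    return int.from_bytes(hashlib.md5(category).digest(), "big")

# Two different docs of the same category collide on a single token,
# while another category lands elsewhere on the ring:
assert token_for(b"reports", b"doc-1") == token_for(b"reports", b"doc-2")
assert token_for(b"reports", b"doc-1") != token_for(b"invoices", b"doc-1")
```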

Any idea about Cassandra behavior with such a hash (category-only) partitioner?

Regards,
Clément

2012/9/28 Tim Wintle <timwintle@gmail.com>
On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> Hi,
>
> I have hierarchical data.
>
> I'm storing them in CF with rowkey somewhat like (category, doc id), and
> plenty of columns for a doc definition.
>
> I have hierarchical data traversal too.
>
> The user just chooses one category, and then, interact with docs belonging
> only to this category.
>
> 1) If I use RandomPartitioner, all docs could be spread within all nodes in
> the cluster => bad performance.
>
> 2) Using RandomPartitioner, an alternative design could be rowkey=category
> and column name=(doc id, prop name)
>
> I don't want it because I need fixed column names for indexing purposes,
> and the "category" is quite a lonnnng string.
>
> 3) Then, I want to define a new partitioner for my rowkey (category, doc
> id), doing MD5 only for the "category" part.
>
> The question is: with such partitioner, many rows on *one* node are going
> to have the same MD5 value, as a result of this new partitioner.

If you do decide that having rows on the same node is what you want,
then you could take the higher bits of the hash from hashing the
category, and the lower bits of the hash from hashing the document id.

That would mean documents in a category would be close to each other in
the ring - while being unlikely to share the same hash.


However, if you're doing this then all reads/writes to the category are
going to be to a single machine. That's not going to spread the load across the cluster very well as I assume a few categories are going to
be far more popular than others.

Have you tested that you actually get bad performance from
RandomPartitioner?

Tim

