Subject: Re: Help for creating a custom partitioner
From: Tim Wintle <timwintle@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 01 Oct 2012 10:44:47 +0100

On Mon, 2012-10-01 at 10:45 +0200, Clement Honore wrote:
> We plan to use manual indexing too (with native C* indexing for other
> cases). So, for one index, we will get plenty of FKs, and a MultiGet call
> to fetch all the associated entities would, with RandomPartitioner, be
> spread across the whole cluster.
> As we don't know the cluster size yet, and it's expected to grow at an
> unknown rate, we are now thinking about alternatives for scalability.
>
> To tell the truth, we have not done performance tests so far. But as the
> choice of a partitioner is the first Cassandra cornerstone, we are already
> thinking about a new partitioner. We are planning "random vs custom
> partitioner" tests => hence my questions about first creating another one.
>
> AFAIS, your partitioner (the higher bits of the hash from hashing the
> category, and the lower bits of the hash from hashing the document id)
> will, on average, put all the docs of a category on one node. Quite
> interesting, thanks! I could add such a partitioner to my test suite.
>
> But why not just hash the "category" part of the row key? With such a
> partitioner, as said before, many rows on *one* node are going to have
> the same hash value.
> - If it hurts Cassandra behaviour/performance => I am curious to know
>   why. In that case, I see your partitioner, so far, as the best answer
>   to my wishes!
> - If it does NOT hurt Cassandra behaviour/performance => it sounds like
>   an optimal partitioner for our needs.
>
> Any idea about Cassandra's behaviour with such a (category-only) hash
> partitioner?

I honestly don't know the code well enough - but I have always assumed
(perhaps incorrectly) that the whole SSTable / Memtable system is sorted
on the hash value rather than the key, so that range queries are efficient
- so if all items on a node have the same hash you would get awful
performance for (at least) reading specific rows from disk.

I could be wrong in my assumptions. Certainly, having lots of hash
collisions is unusual behaviour - I don't imagine the timing behaviour has
been tested closely against that situation.

If you haven't tested it yet, then I'm not sure why you assume that
accesses from a single machine would be faster than from documents spread
around the ring - Ethernet is fast, and if you're going to have to do disk
seeks to get any of this data, then you can run the seeks in parallel
across a large number of spindles by spreading the load around the
cluster.

It also adds extra load onto the machines handling popular categories -
assuming the number of categories is significantly smaller than the number
of documents, that could make a major difference to latency.

(I've put a rough, untested sketch of the two token schemes at the bottom
of this mail.)

Tim

> Regards,
> Clément
>
> 2012/9/28 Tim Wintle
>
> > On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> > > Hi,
> > >
> > > I have hierarchical data. I'm storing it in a CF with a row key
> > > somewhat like (category, doc id), and plenty of columns for a doc
> > > definition.
> > >
> > > I have hierarchical data traversal too. The user just chooses one
> > > category, and then interacts only with docs belonging to that
> > > category.
> > >
> > > 1) If I use RandomPartitioner, all docs could be spread across all
> > >    nodes in the cluster => bad performance.
> > >
> > > 2) Using RandomPartitioner, an alternative design could be
> > >    rowkey=category and column name=(doc id, prop name). I don't want
> > >    it because I need fixed column names for indexing purposes, and
> > >    the "category" is quite a long string.
> > >
> > > 3) So I want to define a new partitioner for my row key (category,
> > >    doc id), computing MD5 only on the "category" part.
> > >
> > > The question is: with such a partitioner, many rows on *one* node
> > > are going to have the same MD5 value.
> >
> > If you do decide that having rows on the same node is what you want,
> > then you could take the higher bits of the hash from hashing the
> > category, and the lower bits of the hash from hashing the document id.
> >
> > That would mean documents in a category would be close to each other
> > in the ring - while being unlikely to share the same hash.
> >
> > However, if you're doing this then all reads/writes to the category
> > are going to go to a single machine. That's not going to spread the
> > load across the cluster very well, as I assume a few categories are
> > going to be far more popular than others.
> >
> > Have you tested that you actually get bad performance from
> > RandomPartitioner?
> >
> > Tim
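
P.S. Here's the rough sketch I mentioned above. It only shows the
token-composition idea, not a real org.apache.cassandra.dht.IPartitioner
implementation (the actual interface also needs decorateKey, midpoint,
describeOwnership and friends), and the (category, doc id) key layout and
class names are just assumptions for illustration - untested, so treat it
as pseudocode in Java:

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Sketch of the two token schemes discussed in this thread.
    // "Composite" = high 64 bits of the token from MD5(category), low 64
    // bits from MD5(doc id), so docs in a category land near each other on
    // the ring without colliding. "Category-only" = every doc in a category
    // gets the identical token (the collision case Clement asked about).
    public class CategoryDocTokenSketch {

        private static BigInteger md5(String s) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // 128-bit, non-negative
        }

        // Category-only token: massive collisions within a category.
        static BigInteger categoryOnlyToken(String category) throws Exception {
            return md5(category);
        }

        // Composite token: same neighbourhood per category, distinct per doc.
        static BigInteger compositeToken(String category, String docId) throws Exception {
            BigInteger high = md5(category).shiftRight(64).shiftLeft(64); // top 64 bits
            BigInteger low = md5(docId).mod(BigInteger.ONE.shiftLeft(64)); // bottom 64 bits
            return high.or(low);
        }

        public static void main(String[] args) throws Exception {
            // Same high bits (same ring neighbourhood), different low bits:
            System.out.println(compositeToken("books", "doc-42"));
            System.out.println(compositeToken("books", "doc-43"));
            // Identical for every doc in "books" - the collision case:
            System.out.println(categoryOnlyToken("books"));
        }
    }

The token range here doesn't exactly match RandomPartitioner's, so don't
read too much into the absolute values - the point is just that the
composite scheme keeps roughly the same per-category locality as hashing
only the category, while each row still gets its own position in the
SSTable ordering.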