cassandra-user mailing list archives

From Jonathan Shook
Subject Re: Doing joins between column families
Date Wed, 26 May 2010 21:00:05 GMT
I wrote some Iterable<*> methods to do this for column families that
share a key structure under OrderPreservingPartitioner (OPP).
They are on the hector examples page. Caveat emptor.

It does iterative chunking of the working set for each column family,
so that you can set the nominal transfer size when you construct the
Iterator/Iterable. I've been very happy with the performance of it,
even over large ranges of keys. This is with
OrderPreservingPartitioner because of other requirements, so it may
not be a good example for comparison with a random partitioner, which
is preferred.
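
As a rough illustration of the chunked-iteration idea described above (not
the actual code from the hector examples page), here is a minimal sketch. It
assumes keys come back in sorted order, as they would under an
order-preserving partitioner. RangeFetcher, InMemoryFetcher, ChunkedKeys and
JoinSketch are hypothetical names made up for this example; a real client
would put its own range query behind RangeFetcher.

```java
import java.util.*;

/** Hypothetical stand-in for a key-range query against one column family. */
interface RangeFetcher {
    /** Returns up to `limit` keys >= startKey, in ascending key order. */
    List<String> fetch(String startKey, int limit);
}

/** In-memory fake used only to make the sketch runnable. */
class InMemoryFetcher implements RangeFetcher {
    private final TreeSet<String> keys;
    InMemoryFetcher(String... keys) { this.keys = new TreeSet<>(Arrays.asList(keys)); }
    public List<String> fetch(String startKey, int limit) {
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(startKey, true)) {
            if (out.size() == limit) break;
            out.add(k);
        }
        return out;
    }
}

/** Walks all keys in order, pulling them one chunk at a time. */
class ChunkedKeys implements Iterable<String> {
    private final RangeFetcher fetcher;
    private final int chunkSize; // the nominal transfer size per request

    ChunkedKeys(RangeFetcher fetcher, int chunkSize) {
        // Each chunk after the first re-reads the last key, so 2 is the minimum.
        if (chunkSize < 2) throw new IllegalArgumentException("chunkSize must be >= 2");
        this.fetcher = fetcher;
        this.chunkSize = chunkSize;
    }

    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private final Deque<String> buffer = new ArrayDeque<>();
            private String lastKey = null;
            private boolean exhausted = false;

            private void fill() {
                if (!buffer.isEmpty() || exhausted) return;
                List<String> chunk = fetcher.fetch(lastKey == null ? "" : lastKey, chunkSize);
                for (String k : chunk) {
                    // The range start is inclusive, so skip the key already returned.
                    if (lastKey != null && k.compareTo(lastKey) <= 0) continue;
                    buffer.add(k);
                }
                if (chunk.size() < chunkSize) exhausted = true; // short chunk: end of range
            }

            public boolean hasNext() { fill(); return !buffer.isEmpty(); }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                lastKey = buffer.poll();
                return lastKey;
            }
        };
    }
}

/** Client-side merge join over two key-ordered iterables. */
class JoinSketch {
    static List<String> innerJoin(Iterable<String> left, Iterable<String> right) {
        List<String> out = new ArrayList<>();
        Iterator<String> l = left.iterator(), r = right.iterator();
        if (!l.hasNext() || !r.hasNext()) return out;
        String a = l.next(), b = r.next();
        while (true) {
            int c = a.compareTo(b);
            if (c == 0) {
                out.add(a); // key present in both column families
                if (!l.hasNext() || !r.hasNext()) break;
                a = l.next(); b = r.next();
            } else if (c < 0) {
                if (!l.hasNext()) break;
                a = l.next(); // left side is behind, advance it
            } else {
                if (!r.hasNext()) break;
                b = r.next(); // right side is behind, advance it
            }
        }
        return out;
    }
}
```

Each side can use its own chunk size, and only one chunk per column family
is held in memory at a time. The comparison here is plain String.compareTo,
which is an assumption; it would have to match whatever ordering the
partitioner actually uses.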

Doing joins as such on the server works against the basic design of
Cassandra. The server does a few things very well only because it
isn't overloaded with extra faucets and kitchen sinks. However, I'd
like to be able to load auxiliary classes into the server runtime in a
modular way, just for things like this. Maybe we'll get that someday.

My impression is that there is much more common key structure in a
workable Cassandra storage layout than in a conventional ER model.
This is the nature of the beast when you are organizing your
information more according to access patterns than fully normal
relationships. That is one of the fundamental design trade-offs of
using a hash structure over a schema.

Having something that lets you deploy a fully normal schema on a hash
store can be handy, but it can also obscure the way that your
application indirectly exercises the storage layer. The end result may
be that the layout is less friendly to the underlying mechanisms of
Cassandra. I'm not saying that it is bad to have a tool to do this,
only that it can make it easy to avoid thinking about Cassandra
storage in terms of what it really is.

There may be ways to optimize the OCM queries, but that takes you down
the road of query optimization, which can be quite nebulous. My gut
instinct is to focus more on the layout, using aggregate keys and
common key structure where you can, so that you can take advantage of
the parallel queries more of the time.
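
To make "aggregate keys and common key structure" a little more concrete,
here is a small sketch of one possible convention. The ':' delimiter, the
AggregateKey class and the sample keys are all assumptions made up for
illustration. The idea is that rows in different column families share a
key prefix, so the same range bounds can be issued against each of them.

```java
import java.util.*;

/** Hypothetical helper for composite row keys of the form "part:part:...". */
class AggregateKey {
    static final char DELIM = ':';

    /** Joins the parts into one key; parts must not contain the delimiter. */
    static String of(String... parts) {
        for (String p : parts) {
            if (p.indexOf(DELIM) >= 0) {
                throw new IllegalArgumentException("part contains delimiter: " + p);
            }
        }
        return String.join(String.valueOf(DELIM), parts);
    }

    /** Inclusive lower bound for all keys that start with prefix + ':'. */
    static String rangeStart(String prefix) { return prefix + DELIM; }

    /** Exclusive upper bound: ';' is the character right after ':'. */
    static String rangeEnd(String prefix) { return prefix + (char) (DELIM + 1); }

    /** Selects the keys under a prefix from any sorted key set. */
    static NavigableSet<String> withPrefix(NavigableSet<String> keys, String prefix) {
        return keys.subSet(rangeStart(prefix), true, rangeEnd(prefix), false);
    }
}
```

With keys built this way, the same start and end bounds can be sent to every
column family that follows the convention, and the ordered results can then
be walked side by side on the client.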

On Wed, May 26, 2010 at 3:13 PM, Charlie Mason wrote:
> On Wed, May 26, 2010 at 7:45 PM, Dodong Juan wrote:
>> So I am not sure if you guys are familiar with OCM. Basically it is an ORM
>> for Cassandra. Been testing it
> In case anyone is interested I have posted a reply on the OCM issue
> tracker where this was also raised.
> Charlie M
