incubator-cassandra-user mailing list archives

From Jiaan Zeng <ji...@bloomreach.com>
Subject Re: how to handle join properly in this case
Date Wed, 29 May 2013 20:53:45 GMT
Thanks for all the comments and thoughts! I think Hiller points out a
promising direction. I wonder whether the partitioning and filtering are
features shipped with Cassandra or features that come from PlayOrm. Any
pointers to resources on that would be appreciated. Thanks!

On Tue, May 28, 2013 at 11:39 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> Another option is to join on partitions, which keeps the number of rows
> that need to be joined relatively small.  PlayOrm actually supports
> joining partition 1 of table A with partition X of table B.  You then just
> keep the number of rows in each partition below the millions, and you can
> filter with the where clause.  It is yet another option out there.  By
> doing so, you still remain scalable as long as you design it such that
> partitions don't grow too large (you can have as many partitions as you
> want, though).
>
> Later,
> Dean
>
> On 5/28/13 12:33 PM, "aaron morton" <aaron@thelastpickle.com> wrote:
>
>>A common pattern is to materialise views, that is, to store the join at
>>the same time you are writing to CFs A and B.
>>
>>In this case it sounds like the two CFs are written to at different
>>times. If that is the case, you may need to do the join client side (do
>>two reads).
>>
>>Hope that helps.
>>
>>-----------------
>>Aaron Morton
>>Freelance Cassandra Consultant
>>New Zealand
>>
>>@aaronmorton
>>http://www.thelastpickle.com
>>
>>On 27/05/2013, at 6:56 PM, Vegard Berget <post@fantasista.no> wrote:
>>
>>> Hi,
>>>
>>> I am no expert, but a couple of suggestions:
>>> 1) Remember that writes are very fast in Cassandra, so don't be afraid
>>> to store more information than you would in a SQL-ish server.
>>> 2) It would be better with an example, but again - by storing more
>>> than you would in a SQL schema, would you still need to compute a table
>>> C?  Is it possible to have just the CF A, and have all data in that CF?
>>> Would it be possible/easier to have the rules applied on the client, so
>>> that you don't have to change the schema/recalculate CF C?
>>>
>>> .vegard,
>>>
>>> ----- Original Message -----
>>> From: user@cassandra.apache.org
>>> To: <user@cassandra.apache.org>
>>> Cc:
>>> Sent: Sat, 25 May 2013 15:22:00 -0700
>>> Subject: how to handle join properly in this case
>>>
>>>
>>> Hi Experts,
>>>
>>> We have tables (a.k.a. column families) A and B. Each row of these
>>> tables is simply a key-value pair. Tables A and B are written by
>>> clients all the time. We need to transform the row keys of tables A
>>> and B according to a set of rules, join the two tables, and save the
>>> results to table C for reads.
>>>
>>> Questions
>>> 1) A schema like this is very close to a SQL schema. We mainly want
>>> to separate read and write operations into different column families.
>>> Is this a correct way to model the join in Cassandra?
>>>
>>> 2) If the rules change a little bit, which is quite possible, we will
>>> have to drop table C and recompute the join result, which seems
>>> cumbersome because not all the join results need to change, just
>>> some of them. Are there any better ways to handle such a change?
>>>
>>> Thanks a lot.
>>>
>>>
>>> --
>>> Regards,
>>> Jiaan
>>
>



--
Regards,
Jiaan
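
A minimal sketch of the client-side join discussed in this thread (do two
reads, apply the key rules on the client, then join), plus one possible way
to limit recomputation when a rule changes. Everything here is an
assumption for illustration: plain Python dicts stand in for column
families A and B, and normalize_key, client_side_join, and
keys_needing_recompute are hypothetical names, not Cassandra or PlayOrm
APIs.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-ins for column families: row key -> value.
Table = Dict[str, str]
KeyRule = Callable[[str], str]


def normalize_key(raw_key: str) -> str:
    """Example transformation rule (an assumption): trim and lowercase."""
    return raw_key.strip().lower()


def client_side_join(
    table_a: Table, table_b: Table, rule: KeyRule
) -> Dict[str, Tuple[str, str]]:
    """Transform the row keys of both tables, then inner-join on the result.

    If two raw keys in one table map to the same join key, the row seen
    last wins; a real design would need an explicit policy for that case.
    """
    a_by_join_key = {rule(key): value for key, value in table_a.items()}
    b_by_join_key = {rule(key): value for key, value in table_b.items()}
    shared_keys = a_by_join_key.keys() & b_by_join_key.keys()
    return {
        key: (a_by_join_key[key], b_by_join_key[key])
        for key in sorted(shared_keys)
    }


def keys_needing_recompute(
    raw_keys: Iterable[str], old_rule: KeyRule, new_rule: KeyRule
) -> List[str]:
    """Return the raw keys whose join key differs between two rule versions."""
    return [key for key in raw_keys if old_rule(key) != new_rule(key)]


if __name__ == "__main__":
    table_a = {" User1 ": "a1", "user2": "a2"}
    table_b = {"USER1": "b1", "user3": "b3"}
    print(client_side_join(table_a, table_b, normalize_key))
```

With this approach the rules live in client code, so a rule change only
touches the rows returned by keys_needing_recompute rather than forcing
table C to be dropped and rebuilt in full. Whether that is cheaper in
practice depends on how many keys a given rule change affects.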
