cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: N to N relationships
Date Sun, 12 Dec 2010 18:25:58 GMT
On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn <> wrote:
> You want to store every value twice? That would be a pain to maintain, and
> possibly lead to inconsistent data.
> On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey <> wrote:
>> I would also recommend two column families. Storing the key as NxN would
>> require you to hit multiple machines to query for an entire row or column
>> with RandomPartitioner. Even with OPP you would need to pick row or columns
>> to order by and the other would require hitting multiple machines.  Two
>> column families avoids this and avoids any problems with choosing OPP.
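To make the RandomPartitioner point concrete, here is a minimal Python sketch, assuming a hypothetical four-node cluster whose token range is split evenly. It hashes each `i@j` key with MD5, as RandomPartitioner does, and counts how many nodes a single matrix row touches. The cluster size, the even split, the `row:7` key and the helper names are illustrative assumptions, not Cassandra code.

```python
import hashlib

NUM_NODES = 4          # hypothetical cluster size
MAX_TOKEN = 2 ** 127   # RandomPartitioner tokens fall in [0, 2**127]


def token(key: str) -> int:
    """MD5-based token: abs() of the digest read as a signed big-endian int."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def node_for(key: str) -> int:
    """Simplified placement: the token range is divided evenly among nodes."""
    return min(token(key) * NUM_NODES // MAX_TOKEN, NUM_NODES - 1)


def nodes_touched_by_matrix_row(i: int, n: int) -> set[int]:
    """Nodes hit when matrix row i is stored as n separate 'i@j' keys."""
    return {node_for(f"{i}@{j}") for j in range(n)}


if __name__ == "__main__":
    # With N@N keys, one matrix row is scattered over several nodes ...
    print("N@N keys, row 7:", sorted(nodes_touched_by_matrix_row(7, 100)))
    # ... while a single row key (one CF row per matrix row) lives on one node.
    print("single row key :", node_for("row:7"))
```

Because the hash destroys key order, even an ordered scan over `7@0 .. 7@99` would not help here; only storing the whole matrix row under one key keeps it on one replica set.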
>> On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton <>
>> wrote:
>>> Am assuming you have one matrix and you know the dimensions. Also as you
>>> say the most important queries are to get an entire column or an entire row.
>>> I would consider using a standard CF for the Columns and one for the
>>> Rows.  The key for each would be the col / row number, each cassandra column
>>> name would be the id of the other dimension and the value whatever you want.
>>> - when storing the data update both the Column and Row CF
>>> - reading a whole row/col would be simply reading from the appropriate
>>> CF.
>>> - reading an intersection is a get_slice to either col or row CF using
>>> the column_names field to identify the other dimension.
>>> You would not need secondary indexes to serve these queries.
>>> Hope that helps.
>>> Aaron
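The two-CF layout described above can be sketched in Python as follows. This is a minimal in-memory model, assuming each column family is a plain dict rather than a real Cassandra client; `MatrixStore` and its methods are hypothetical names, and `get_slice` only imitates the idea of a get_slice call with a `column_names` predicate.

```python
from collections import defaultdict


class MatrixStore:
    """In-memory stand-in for two standard column families.

    rows_cf: matrix row id -> {column id: value}
    cols_cf: matrix column id -> {row id: value}
    Only non-empty cells are stored, so a sparse matrix stays small.
    """

    def __init__(self) -> None:
        self.rows_cf: dict = defaultdict(dict)
        self.cols_cf: dict = defaultdict(dict)

    def insert(self, row: int, col: int, value: str) -> None:
        # Write the cell into both CFs so either dimension is one key lookup.
        self.rows_cf[row][col] = value
        self.cols_cf[col][row] = value

    def get_row(self, row: int) -> dict:
        # Whole matrix row: a single read from the Row CF, sorted by column id.
        return dict(sorted(self.rows_cf.get(row, {}).items()))

    def get_col(self, col: int) -> dict:
        # Whole matrix column: a single read from the Column CF.
        return dict(sorted(self.cols_cf.get(col, {}).items()))

    def get_slice(self, row: int, column_names: list) -> dict:
        # Intersection: read only the named columns of one row.
        cells = self.rows_cf.get(row, {})
        return {c: cells[c] for c in column_names if c in cells}


if __name__ == "__main__":
    store = MatrixStore()
    store.insert(1, 2, "a")
    store.insert(1, 5, "b")
    store.insert(3, 2, "c")
    print(store.get_row(1))            # {2: 'a', 5: 'b'}
    print(store.get_col(2))            # {1: 'a', 3: 'c'}
    print(store.get_slice(1, [2, 4]))  # {2: 'a'}
```

The price is that every write is doubled; the gain is that row reads, column reads and intersections are all direct key lookups with no index involved.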
>>> On 10 Dec, 2010, at 07:02 AM, Sébastien Druon <> wrote:
>>> I mean if I have secondary indexes. Apparently they are calculated in the
>>> background...
>>> On 9 December 2010 18:33, David Boxenhorn <> wrote:
>>>> What do you mean by indexing?
>>>> On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon <>
>>>> wrote:
>>>>> Thanks a lot for the answer
>>>>> What about the indexing when adding a new element? Is it incremental?
>>>>> Thanks again
>>>>> On 9 December 2010 14:38, David Boxenhorn <> wrote:
>>>>>> How about a regular CF where keys are N@N ?
>>>>>> Then, getting a matrix row would be the same cost as getting a matrix
>>>>>> column (N gets), and it would be very easy to add element N+1.
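A minimal Python sketch of the single-CF `N@N` layout follows, assuming the CF is modeled as a dict and that the caller knows the current dimension `n`. `CellStore` is a hypothetical name; the `gets` counter exists only to show that a row read and a column read both cost n point lookups, and that growing to N+1 just means passing a larger `n`.

```python
class CellStore:
    """One standard CF where each key is 'row@col' and holds a single cell."""

    def __init__(self) -> None:
        self.cf: dict[str, str] = {}
        self.gets = 0  # number of key lookups issued, to compare costs

    @staticmethod
    def key(row: int, col: int) -> str:
        return f"{row}@{col}"

    def insert(self, row: int, col: int, value: str) -> None:
        self.cf[self.key(row, col)] = value

    def _get(self, k: str):
        self.gets += 1
        return self.cf.get(k)

    def get_row(self, row: int, n: int) -> dict[int, str]:
        # n point lookups row@0 .. row@(n-1); empty cells come back as None.
        cells = {col: self._get(self.key(row, col)) for col in range(n)}
        return {col: v for col, v in cells.items() if v is not None}

    def get_col(self, col: int, n: int) -> dict[int, str]:
        # Same cost as a row read: n point lookups 0@col .. (n-1)@col.
        cells = {row: self._get(self.key(row, col)) for row in range(n)}
        return {row: v for row, v in cells.items() if v is not None}


if __name__ == "__main__":
    s = CellStore()
    s.insert(0, 1, "x")
    s.insert(2, 1, "y")
    print(s.get_col(1, 3), s.gets)  # {0: 'x', 2: 'y'} 3
```

The symmetry is real, but each of those n lookups is a separate key that may live on a different node.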
>>>>>> On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon <>
>>>>>> wrote:
>>>>>>> Hello,
>>>>>>> For a specific case, we are thinking about representing an N to N
>>>>>>> relationship with an NxN Matrix in Cassandra.
>>>>>>> The relations will be only between a subset of elements, so the
>>>>>>> Matrix will mostly contain empty elements.
>>>>>>> We have a set of questions concerning this:
>>>>>>> - what is the best way to represent this matrix? What would have the
>>>>>>> best performance in reading? In writing?
>>>>>>>   . a super column family with n column families, with n columns
>>>>>>>   . a column family with n columns and n lines
>>>>>>> In the second case, we would need to extract 2 kinds of information:
>>>>>>> - all the relations for a line: this should be no specific problem;
>>>>>>> - all the relations for a column: in that case we would need an index
>>>>>>> for the columns, right? And then get all the lines where the value of
>>>>>>> the column in question is not null... is that the correct way to do it?
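What the column query costs in a single CF without any index can be sketched in Python, assuming the CF is a dict of lines, each holding only its non-null cells. The function names are hypothetical. A line read is one lookup by key; a column read has to examine every line, which is exactly the work an index (or a second CF keyed by column) is meant to avoid.

```python
def relations_for_line(cf: dict, line: int) -> dict:
    """All relations of one line: a single lookup by row key."""
    return dict(cf.get(line, {}))


def relations_for_column(cf: dict, column: int) -> tuple[dict, int]:
    """All relations of one column without an index: every line is scanned.

    Returns the matching cells and the number of lines examined.
    """
    matches = {}
    examined = 0
    for line, cells in cf.items():
        examined += 1
        if column in cells:  # only non-null cells are stored
            matches[line] = cells[column]
    return matches, examined


if __name__ == "__main__":
    cf = {0: {1: "r"}, 2: {1: "s", 4: "t"}, 3: {4: "u"}}
    print(relations_for_line(cf, 2))    # {1: 's', 4: 't'}
    print(relations_for_column(cf, 4))  # ({2: 't', 3: 'u'}, 3)
```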
>>>>>>> When using indexes, say we want to add another element N+1. What
>>>>>>> impact in terms of time would it have on the indexation job?
>>>>>>> Thanks a lot for the answers,
>>>>>>> Best regards,
>>>>>>> Sébastien Druon
Before secondary indexes, the only option was to store the data twice.
Yes, you have to maintain this yourself. The data model only provides
fast searches on the key. An index is normally a separate entity with a
different ordering, and it is almost the same here.
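A minimal Python sketch of what "maintain this yourself" can involve, assuming the Row and Column CFs from Aaron's layout are modeled as plain dicts. Every write and delete touches both copies, and a periodic check (the hypothetical `find_mismatches`) flags cells where a partial failure left the two copies out of step. None of this is a Cassandra feature; it only illustrates the bookkeeping the application takes on.

```python
def put(rows_cf: dict, cols_cf: dict, row: int, col: int, value: str) -> None:
    # Each logical write is two physical writes, one per CF.
    rows_cf.setdefault(row, {})[col] = value
    cols_cf.setdefault(col, {})[row] = value


def delete(rows_cf: dict, cols_cf: dict, row: int, col: int) -> None:
    # Deletes must also hit both copies, or the CFs drift apart.
    rows_cf.get(row, {}).pop(col, None)
    cols_cf.get(col, {}).pop(row, None)


def find_mismatches(rows_cf: dict, cols_cf: dict) -> set:
    """Return (row, col) cells whose two copies disagree or where one is missing."""
    by_row = {(r, c): v for r, cells in rows_cf.items() for c, v in cells.items()}
    by_col = {(r, c): v for c, cells in cols_cf.items() for r, v in cells.items()}
    return {k for k in by_row.keys() | by_col.keys() if by_row.get(k) != by_col.get(k)}


if __name__ == "__main__":
    rows, cols = {}, {}
    put(rows, cols, 1, 2, "a")
    put(rows, cols, 3, 4, "b")
    rows.setdefault(5, {})[6] = "c"  # simulate a second write that never landed
    print(find_mismatches(rows, cols))  # {(5, 6)}
```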
