incubator-cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: N to N relationships
Date Sun, 12 Dec 2010 08:20:38 GMT
You want to store every value twice? That would be a pain to maintain, and
possibly lead to inconsistent data.

On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey <nick@riptano.com> wrote:

> I would also recommend two column families. Storing the key as NxN would
> require you to hit multiple machines to query for an entire row or column
> with RandomPartitioner. Even with OPP you would need to pick rows or columns
> to order by, and the other would require hitting multiple machines. Two
> column families avoid this and avoid any problems with choosing OPP.
>
>
> On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>> I am assuming you have one matrix and you know the dimensions. Also, as you
>> say, the most important queries are to get an entire column or an entire row.
>>
>> I would consider using a standard CF for the Columns and one for the Rows.
>> The key for each would be the col / row number, each Cassandra column name
>> would be the id of the other dimension, and the value whatever you want.
>>
>> - when storing the data, update both the Column and Row CF.
>> - reading a whole row/col would simply be reading from the appropriate CF.
>> - reading an intersection is a get_slice on either the col or row CF, using
>> the column_names field to identify the other dimension.
>>
>> You would not need secondary indexes to serve these queries.
>>
>> Hope that helps.
>> Aaron
>>
>> On 10 Dec, 2010, at 07:02 AM, Sébastien Druon <sdruon@spotuse.com> wrote:
>>
>> I mean if I have secondary indexes. Apparently they are calculated in the
>> background...
>>
>> On 9 December 2010 18:33, David Boxenhorn <david@lookin2.com> wrote:
>>
>>> What do you mean by indexing?
>>>
>>>
>>> On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon <sdruon@spotuse.com> wrote:
>>>
>>>> Thanks a lot for the answer
>>>>
>>>> What about the indexing when adding a new element? Is it incremental?
>>>>
>>>> Thanks again
>>>>
>>>>
>>>>
>>>> On 9 December 2010 14:38, David Boxenhorn <david@lookin2.com> wrote:
>>>>
>>>>> How about a regular CF where keys are N@N ?
>>>>>
>>>>> Then, getting a matrix row would be the same cost as getting a matrix
>>>>> column (N gets), and it would be very easy to add element N+1.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon <sdruon@spotuse.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> For a specific case, we are thinking about representing a N to N
>>>>>> relationship with a NxN Matrix in Cassandra.
>>>>>> The relations will be only between a subset of elements, so the Matrix
>>>>>> will mostly contain empty elements.
>>>>>>
>>>>>> We have a set of questions concerning this:
>>>>>> - what is the best way to represent this matrix? what would have the
>>>>>> best performance in reading? in writing?
>>>>>>   . a super column family with n column families, with n columns each
>>>>>>   . a column family with n columns and n lines
>>>>>>
>>>>>> In the second case, we would need to extract 2 kinds of information:
>>>>>> - all the relations for a line: this should be no specific problem;
>>>>>> - all the relations for a column: in that case we would need an index
>>>>>> for the columns, right? and then get all the lines where the value of the
>>>>>> column in question is not null... is that the correct way to do it?
>>>>>> When using indexes, say we want to add another element N+1. What
>>>>>> impact in terms of time would it have on the indexing job?
>>>>>>
>>>>>> Thanks a lot for the answers,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Sébastien Druon
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
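
For readers following the thread, here is a minimal sketch of the two-column-family
layout Aaron Morton describes: one family keyed by matrix row, one keyed by matrix
column, both updated on every write. It uses plain Python dictionaries as stand-ins
for the column families and makes no calls to any Cassandra client; the class and
method names (TwoFamilyMatrix, put, get_row, get_col, get_cell) are hypothetical
and chosen only for illustration.

```python
from collections import defaultdict


class TwoFamilyMatrix:
    """In-memory stand-in for a Rows CF and a Columns CF (hypothetical names)."""

    def __init__(self):
        # rows_cf[row_id][col_id] = value, cols_cf[col_id][row_id] = value.
        # Only non-empty cells are stored, so a sparse matrix stays small.
        self.rows_cf = defaultdict(dict)
        self.cols_cf = defaultdict(dict)

    def put(self, row_id, col_id, value):
        # Every value is written twice, once per family. In a real cluster
        # these would be two separate writes, which is the maintenance and
        # consistency concern raised in the reply at the top of this message.
        self.rows_cf[row_id][col_id] = value
        self.cols_cf[col_id][row_id] = value

    def get_row(self, row_id):
        # Whole matrix row: a single lookup by key in the Rows family.
        return dict(self.rows_cf.get(row_id, {}))

    def get_col(self, col_id):
        # Whole matrix column: a single lookup by key in the Columns family.
        return dict(self.cols_cf.get(col_id, {}))

    def get_cell(self, row_id, col_id):
        # Intersection: read one named column from either family.
        return self.rows_cf.get(row_id, {}).get(col_id)


m = TwoFamilyMatrix()
m.put("a", "b", 1)
m.put("a", "c", 2)
m.put("d", "b", 3)
print(m.get_row("a"))
print(m.get_col("b"))
print(m.get_cell("d", "b"))
```

The trade-off discussed in the thread is visible in put(): reads of a full row or
full column each touch one key, at the cost of storing and maintaining every value
in two places.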
