I am assuming you have a single matrix and know its dimensions. Also, as you say, the most important queries fetch an entire column or an entire row.
I would consider using one standard CF for the matrix columns and one for the matrix rows. The key in each would be the column or row number, each Cassandra column name would be the id in the other dimension, and the value would be whatever you want to store.
- when storing data, update both the Column CF and the Row CF
- reading a whole row or column is simply a read from the appropriate CF
- reading an intersection is a get_slice against either the Column or the Row CF, using the column_names field to identify the other dimension
You would not need secondary indexes to serve these queries.
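To make the layout concrete, here is a minimal sketch in Python. It is an in-memory model only, not Cassandra client code: the class name TwoCFMatrix and its methods are made up for illustration, and plain dicts stand in for the two CFs. It assumes the matrix ids are hashable values such as integers.

```python
from collections import defaultdict


class TwoCFMatrix:
    """Hypothetical in-memory stand-in for the Row CF and the Column CF."""

    def __init__(self):
        # rows_cf[row_id][col_id] = value  (key = matrix row number)
        # cols_cf[col_id][row_id] = value  (key = matrix column number)
        self.rows_cf = defaultdict(dict)
        self.cols_cf = defaultdict(dict)

    def put(self, row, col, value):
        # Every write goes to both CFs so either dimension can be read directly.
        self.rows_cf[row][col] = value
        self.cols_cf[col][row] = value

    def get_row(self, row):
        # One key read from the Row CF returns the whole (sparse) matrix row.
        return dict(self.rows_cf.get(row, {}))

    def get_col(self, col):
        # One key read from the Column CF returns the whole (sparse) matrix column.
        return dict(self.cols_cf.get(col, {}))

    def get_cell(self, row, col):
        # Models a slice on one key restricted to a single column name.
        return self.rows_cf.get(row, {}).get(col)


m = TwoCFMatrix()
m.put(1, 7, "a")
m.put(3, 7, "b")
column_seven = m.get_col(7)  # {1: 'a', 3: 'b'}
```

The cost of this design is that every write is done twice; in exchange, both a full row and a full column are a single key lookup, and empty cells take no space.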
Hope that helps.
On 10 Dec 2010, at 07:02 AM, Sébastien Druon <email@example.com> wrote:
I mean the case where I have secondary indexes. Apparently they are calculated in the background...
On 9 December 2010 18:33, David Boxenhorn <firstname.lastname@example.org> wrote:
What do you mean by indexing?
On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon <email@example.com> wrote:
Thanks a lot for the answer.
What about the indexing when adding a new element? Is it incremental?
On 9 December 2010 14:38, David Boxenhorn <firstname.lastname@example.org> wrote:
How about a regular CF where the keys are N@N?
Then getting a matrix row would cost the same as getting a matrix column (N gets), and it would be very easy to add element N+1.
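A rough sketch of that idea in Python, assuming "N@N" is read as a "<row>@<col>" string key. This is an in-memory model, not Cassandra client code; the class name CellKeyedMatrix and its methods are hypothetical, and a dict stands in for the CF.

```python
class CellKeyedMatrix:
    """Hypothetical in-memory stand-in for one CF keyed by "<row>@<col>"."""

    def __init__(self, n):
        self.n = n    # current number of elements (ids 0 .. n-1)
        self.cf = {}  # "<row>@<col>" -> value; empty cells are simply absent

    @staticmethod
    def key(row, col):
        return f"{row}@{col}"

    def put(self, row, col, value):
        self.cf[self.key(row, col)] = value

    def get_row(self, row):
        # N point lookups, one per possible column id.
        cells = ((col, self.cf.get(self.key(row, col))) for col in range(self.n))
        return {col: v for col, v in cells if v is not None}

    def get_col(self, col):
        # Also N point lookups, so rows and columns cost the same.
        cells = ((row, self.cf.get(self.key(row, col))) for row in range(self.n))
        return {row: v for row, v in cells if v is not None}

    def add_element(self):
        # Adding element N+1 only widens the id range; no existing key changes.
        self.n += 1
        return self.n - 1


m = CellKeyedMatrix(3)
m.put(0, 2, "x")
m.put(1, 2, "y")
column_two = m.get_col(2)  # {0: 'x', 1: 'y'}
```

Note that each read issues N lookups even when the matrix is mostly empty, which is the trade-off against keeping a second copy of the data.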
On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon <email@example.com> wrote:
For a specific case, we are thinking about representing an N-to-N relationship as an NxN matrix in Cassandra.
The relations will exist only between a subset of the elements, so the matrix will be mostly empty.
We have a set of questions concerning this:
- what is the best way to represent this matrix? Which option would have the best read performance? The best write performance?
. a super column family with n column families, with n columns each
. a column family with n columns and n rows
In the second case, we would need to extract 2 kinds of information:
- all the relations for a row: this should pose no particular problem;
- all the relations for a column: in that case we would need an index on the columns, right? And then get all the rows where the value of the column in question is not null... Is that the correct way to do it?
When using indexes, say we want to add another element N+1. What impact would that have on the time taken by the indexing job?
Thanks a lot for the answers,