cassandra-commits mailing list archives

From "Sylvain Lebresne (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1684) Entity groups
Date Wed, 23 Nov 2011 17:01:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155965#comment-13155965
] 

Sylvain Lebresne commented on CASSANDRA-1684:
---------------------------------------------

It is a good question, and I suppose it depends on what the motivation for row groups was
in the first place (after all, we've always kind of been able to nest arbitrarily, we just have
a (slightly) simpler way now).

For instance, if the goal is to make sure rows are collocated, having to do it with composites
may not be very convenient, in particular if you want to collocate rows across multiple CFs.
Of course it is always possible to redesign the model so that you use the same row key and
use composites, but that could be really weird. To "solve" that last part, we could provide
the row group API but encode it server side with composites.
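
As a rough illustration of the server-side encoding mentioned above, the sketch below stores
every row of a group under one physical row keyed by the group key, and prefixes each column
name with the logical row key. The names GroupStore, put and read_row are hypothetical and
not part of any Cassandra API; plain Python tuples stand in for composite column names.

```python
class GroupStore:
    """Toy model: one physical row per group, composite (row_key, column) names."""

    def __init__(self):
        # group_key -> {(row_key, column_name): value}
        self.physical_rows = {}

    def put(self, group_key, row_key, column_name, value):
        # The logical row key becomes the first component of the column name.
        columns = self.physical_rows.setdefault(group_key, {})
        columns[(row_key, column_name)] = value

    def read_row(self, group_key, row_key):
        # A logical row is the slice of columns sharing the row_key prefix.
        columns = self.physical_rows.get(group_key, {})
        return {
            name[1]: value
            for name, value in sorted(columns.items())
            if name[0] == row_key
        }


store = GroupStore()
store.put("group-1", "alice", "age", "30")
store.put("group-1", "bob", "age", "25")
store.put("group-1", "alice", "city", "Paris")
print(store.read_row("group-1", "alice"))
```

Both logical rows end up in the single physical row "group-1", which is what makes them
collocated, and also what makes that physical row grow with the size of the group.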

However, I think we should be aware that pushing such an encoding has limitations today:
* there is the same problem as when encoding super columns with composites, i.e. we'd need range
tombstones.
* rows have a number of subtle limitations that are fine, but may be a bit less fine if you
start to push for collocating lots and lots of data under one row:
** There is the 2B columns limit
** If a row is > 2GB, it won't be mmapped
** compaction is slower on big rows
** performance can globally be worse on huge rows
** leveled compaction has at least one row per sstable, which goes a bit against fixed-size sstables.
Don't get me wrong, for most cases this is probably fine and we likely want to improve on
all of this, but those are still obstacles to co-locating large amounts of data under the same
row.
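
To make the range tombstone point a bit more concrete: with a composite encoding, deleting
one logical row means deleting every column whose name starts with that row key. The toy
sketch below contrasts one tombstone per known column with a single prefix marker. All
function names are hypothetical, and this only models the idea, not Cassandra's storage
engine.

```python
def per_column_tombstones(columns, row_key):
    """Without range tombstones: one tombstone per column we already know about."""
    return [name for name in sorted(columns) if name[0] == row_key]


def range_tombstone(row_key):
    """With a range tombstone: one marker covering every name prefixed by row_key."""
    return (row_key,)


def is_shadowed(name, prefix):
    """True if the composite column name falls under the tombstone prefix."""
    return name[:len(prefix)] == prefix


columns = {
    ("alice", "age"): "30",
    ("alice", "city"): "Paris",
    ("bob", "age"): "25",
}
print(per_column_tombstones(columns, "alice"))
print(is_shadowed(("alice", "city"), range_tombstone("alice")))
```

In this model the per-column approach needs the list of existing column names before it
can delete anything, while the prefix marker does not.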

Now maybe pushing the co-location of data is not a good idea for a distributed store (it obviously
raises the question of load balancing in particular), but there are cases where careful co-location
is paramount to getting the best performance, so providing a good tool for that could have value.
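
A small sketch of the load balancing trade-off, under simplifying assumptions: the token is
an MD5 hash of the key (in the spirit of RandomPartitioner), and a modulo over a
hypothetical node list stands in for the real token ring. Routing on the row key lets rows
spread over nodes; routing on the group key sends every row of the group to one node.

```python
import hashlib


def token(key):
    # MD5-based token; a simplification of what a hash partitioner computes.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def node_for(key, nodes):
    # Modulo placement is an assumption made for brevity, not the real ring lookup.
    return nodes[token(key) % len(nodes)]


nodes = ["n1", "n2", "n3", "n4"]
row_keys = ["alice", "bob", "carol", "dave"]

by_row = {row: node_for(row, nodes) for row in row_keys}
by_group = {row: node_for("group-1", nodes) for row in row_keys}
print(by_row)
print(by_group)
```

With group routing, a single hot or very large group lands entirely on one node (and its
replicas), which is the load balancing concern raised above.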

Doing row groups 'natively' would avoid the gotchas above, but note that it has at least one
drawback: if/once we do CASSANDRA-2893, isolation for row groups encoded with composite types
would be a given, whereas with 'native' row groups we would have to work a bit.

So overall, I think row groups could be of interest API-wise, allowing for a number of more
natural models. And if we think this is indeed useful, I kind of think doing it natively
could be less of a headache overall than an encoding with composites.
                
> Entity groups
> -------------
>
>                 Key: CASSANDRA-1684
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1684
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Supporting entity groups similar to App Engine's (that is, allow rows to be part of a
> parent "entity group," whose key is used for routing instead of the row itself) allows several
> improvements:
>  - batches within an EG can be atomic across multiple rows
>  - order-by-value queries within an EG only have to touch a single replica even with
>    RandomPartitioner

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
