cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9425) Make node-local schema fully immutable
Date Tue, 10 Jan 2017 10:03:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814547#comment-15814547
] 

Sylvain Lebresne commented on CASSANDRA-9425:
---------------------------------------------

I very much like the changes of this patch in general. I also agree with [~snazy] small follow-up,
getting rid of {{keyspaceAndTablePair}} sounds good. The patch is big though so I have a bunch
of remarks/suggestions, some of which may or may not be very useful. Some of the comments
may also not be entirely new to this patch, I haven't double-checked everything, but as the
main goal of this patch is to clean stuff up, let say I don't feel too bad if there is a few
such remarks.

I have also a few things, mostly nits, that I just wrote while reviewing and pushed in branch
below. One noticeable commit that is not entirely nitty is the introduction of a {{TableId}}
class (that just "wrap" a UUID). It felts like this make the code read better with no real
downside. It's a trivial patch, just with a big-ish diff.
| [9425-review|https://github.com/pcmanus/cassandra/commits/9425-review] | [utests|http://cassci.datastax.com/job/pcmanus-9425-review-testall]
| [dtests|http://cassci.datastax.com/job/pcmanus-9425-review-dtest] |


Other points:
* {{TableMetadataRef.to()}} feels like it could be misused easily so I'd suggest making things
a bit more explicit. In practice, it has 3 main usages:
*# in {{Schema}}, which is kind of the one true use, where I'd directly use the ctor (which
would be package-protected and have the same comment than on {{set()}}).
*# for offline tools: I'd rename {{to()}} to something like {{forOfflineTool()}} that make
it clear it's ok only in that context. A comment would be nice too.
*# for indexes, where we kind of hack around them not having their own ID. I think it's worth
pointing out we're doing something specific by having a {{forIndex()}} method with proper
comments (and probably an assertion that validate it's indeed called on an index). Or maybe
go with my next point.
* Talking of index handling, {{TableMetadataRef.inPlaceUpdateIndexedTableMetadata}} feels
dangerous. The {{set()}} method is very explicit about not being for public consumption, but
that {{inPlaceUpdateIndexedTableMetadata}} method calls {{set()}} directly and is public.
It's also particularly hard to convince oneself that it's used in a safe maner (you check
first that {{SecondaryIndexManager.reload()}} is indeed only called protected by migration
synchronization, but the call to {{inPlaceUpdateIndexedTableMetadata}} is on a task executed
by an {{ExecutorService}} so you have to double-check it's not really asynchronous (despite
what the javadoc in {{Index.java}} claims!) and thus is indeed protected).
  Anyway, at a minimum some comments are imo missing, but I wonder if we could make this more
obviously correct by pushing it to {{Schema}}, having a map of 'index name' -> TableMetadataRef
specific to indexes. Updating that in {{Schema.load()}} shouldn't be terribly complex. Or
am I missing a reason why that wouldn't work?
* I'm unclear on the changes to {{AlterTypeStatement}}. The patch removes the code that propagate
type changes to other table/view/types, but I neither see where that has moved, nor why that
wouldn't be needed anymore (as an aside, if it does is not needed anymore, than we could remove
the {{updateTypes()}} and {{updateWith()}} methods that are unused in the patch).
* I'd rename {{addRegularColumn}} and {{alterColumnType}} in {{ViewMetadata}} as they are
not modifying the {{ViewMetadata}} but creating copies (say to {{withAddedRegularColumn()}}
and {{withAlteredColumnType}}).
* The patch changes some system table options (compared to current trunk), namely:
** {{AuthKeyspace}} tables don't get his {{memtableFlushPeriod}} to 1h.
** {{SystemDistributedKeyspace}} gets a default gcGrace, but it was of 10 days.
** None of the system table gets {{readRepairChance == 0}} (only {{dcLocaReadRepairChance}}
is set to 0).
* In {{Indexes.validate()}}, not sure it's worth doing the duplicate name check since we do
it at the keyspace level already (and only doing it withing a single table is a false sense
of safety).
* {{Keyspaces}} has both a {{get(String)}} that return {{Optional}} and a {{getNullable(String)}}.
We should probably remove one.
* There is {{TODO: FIXME}} in ColumnFamilyStore.setCompressionParameters.
* There is a {{TODO FIXME}} in SchemaKeyspace (and not 100% sure what it's about).
* Nit: In {{SSTableTxnWriter}}, it feels slightly inconsistent that {{createRangeAware}} takes
a {{TableMetadata}}, but {{create}} takes a {{TableMetadataRef}}.
* In {{Schema}}:
** Realized afterwards that this wasn't new to this patch, but mentioning it nonetheless:
loading the system keyspaces in Schema ctor feels dodgy, especially since it's conditional
on some initialization methods (and it's a singleton). What if Schema somehow happens to be
loaded too early for instance? Anyway, would be nice to move that code to some {{loadSystemTables()}}
function called in the proper place. Also, the {{Schema}} ctor should probably be private.
** slightly uncomfortable with {{load()}} as it is. It allow both new and updated keyspace,
but doesn't do everything it should on an update (it doesn't reload {{ColumnFamilyStore}},
...). And while those other parts are done properly by {{merge()}} and true schema migrations,
{{load()}} is public so it feels easy to misuse. Even for brand new keyspaces, it's a bit
confusing that {{load()}} is only a sub-part of what {{Schema.createKeyspace()}} (it works
due to 1) my next point and 2) the fact we happen to not care about notifications when load()
is called, but it's a bit messy).
** Some of {{Schema.createKeyspace()}} is actually redundant: {{Keyspace}} ctor already initialize
all tables and views, so it ends up being duplicated. It's largely harmless though we do end
up reloading each {{ColumnFamilyStore}} for no reason. Taken with my previous point, I wonder
if things could be refactored a bit so it's clearer when initialization happens.
** {{Schema.load}} and {{Schema.unload}} should probably be synchronized for their modification
of {{keyspaces}}. I acknowledge that those method are called by other methods that _are_ synchronized,
but 1) it feels adding synchronization directly on those method at least adds clarify and
prevent future misues and 2) not all path are actually synchronized since {{load()}} is public
(here again, the concrete usage are actually fine, so not claiming a bug, just suggesting
making things more future proof).
** {{validateTable}} seems to replace {{Validation.validateColumnFamily}} but the latter is
still there and used in a few instances. Would be nice to clean up. {{Validation.validateKeyspaceNotSystem}}
could probably be migrated too for consistency.
** I'd maybe suggest adding a private {{handleDiff(mapDiff, Function onDropped, Function onAdded,
Function onUpdated)}} method to simplify a bit.
** Is there a rational for notifying everything at the end, versus calling each notification
in the "appropriate" sub-method (notifying table alter in {{alterTable()}}, ....)? The later
would feels a tad more readable to me as it's slightly easier to check we haven't forgotten
something.
* In {{TableMetadata}}, validateCompatibility() should probably check more stuffs (we can't
alter non-PK column types as we want, nor change the partition key size for instance).

Also, as the patch does quite a bit of renaming, allow me a few additional suggestions. I
won't get mad if you'd rather not do those though:
* I'd rename {{TableMetadata.table}} to {{TableMetadata.name}} for consistency with other
classes ({{KeyspaceMetadata}} and {{IndexMetadata}} at least). While at it, I'd also rename
{{ksName}} to {{keyspace}} and {{viewName}} to {{name}} in {{ViewMetadata}}.
* I'd also probably rename {{PartitionColumns}} to {{RegularAndStaticColumns}} now.

> Make node-local schema fully immutable
> --------------------------------------
>
>                 Key: CASSANDRA-9425
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9425
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>             Fix For: 4.0
>
>
> The way we handle schema changes currently is inherently racy.
> All of our {{SchemaAlteringStatement}} s perform validation on a schema state that's
won't necessarily be there when the statement gets executed and mutates schema.
> We should make all the *Metadata classes ({{KeyspaceMetadata, TableMetadata}}, {{ColumnMetadata}},
immutable, and local schema persistently snapshottable, with a single top-level {{AtomicReference}}
to the current snapshot. Have DDL statements perform validation and transformation on the
same state.
> In pseudo-code, think
> {code}
> public interface DDLStatement
> {
>     /**
>      * Validates that the DDL statement can be applied to the provided schema snapshot.
>      *
>      * @param schema snapshot of schema before executing CREATE KEYSPACE
>      */
>     void validate(SchemaSnapshot schema);
>  
>     /**
>      * Applies the DDL statement to the provided schema snapshot.
>      * Implies that validate() has already been called on the provided snapshot.
>      *
>      * @param schema snapshot of schema before executing the statement
>      * @return snapshot of schema as it would be after executing the statement
>      */
>     SchemaSnapshot transform(SchemaSnapshot schema);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message