cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6717) Modernize schema tables
Date Thu, 09 Jul 2015 12:19:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620375#comment-14620375
] 

Aleksey Yeschenko edited comment on CASSANDRA-6717 at 7/9/15 12:18 PM:
-----------------------------------------------------------------------

bq. typo in javadoc: infromation

Fixed.

bq. migrate(): should log when migration has completed successfully

Fixed.

bq. This comment should be updated (or maybe just removed): "Note that because Keyspaces is
a COMPACT TABLE, we're really only setting static columns internally and shouldn't set any
clustering"

Fixed.

bq. In parseAggregateFunctionName(), there's the following comment: // function name can be
abbreviated (pre 2.2rc2) - it is in the same keyspace as the aggregate. It sounds like we
don't allow that any more, so we don't need to handle it.

If we drop it here, we should drop it from 2.2 as well - depends on whether or not we care
about 2.2.0 being compatible with rc1-created aggregates. That said, it doesn't hurt to have
it there, and it's already gone from {{SchemaKeyspace}}.

bq. serializeKind() is a dupe of SchemaKeyspaces.deserializeKind()
bq. decodeTableMetadata() is a near dupe of SchemaKeyspaces.createTableFromTableRowAndColumnRows()
– can we unify those somewhat?

We can, but there is zero point in doing so. With the next few commits they will diverge,
as the new {{system_schema.tables}} tables will take on an entirely different structure.

bq. Perhaps rename LegacySchemaTables to LEGACY_SCHEMA_TABLES? (Looks like a class name, currently.)

That's the old convention w/ these things, I don't know how it started, but for tables names
we use {{TABLE_NAMES}} in constants, and for {{CFMetaData}} collections we use {{TableNames}}.
I'm just being consistent.

bq. decodeTableMetadata(): when would cf_id not be present?

I think after 2.1 startup upgrade it should never be the case, but it doesn't hurt to leave
it there either. I did remove it from {{SchemaKeyspace.createTableFromTableRowAndColumnRows()}}
though

bq. checkNeedsUpgrade(): the comment explaining the check for the "supercolumn map" isn't
immediately clear – maybe point to the CompactTables javadoc or explicitly mention that
the supercolumn map is the only way column_name can be empty

This is just a copy-paste of previous Sylvain's code. Some of it will be gone soon - in {{SchemaKeyspace}},
but some will remain in the migrate code.

bq. Migrating tables in general

That *should* be covered by {{SchemaLoader.schemaDefinition()}}. Some of the tables there
are currently commented out though, for one reason or the other.

bq. New schema tables are written to with the correct timestamp

Planned for this initially, but won't do it until I'm done with the ticket. The tables will
be different, and so will be the code to check for timestamps - doing it at this point is
somewhat wasteful.

bq. Legacy schema tables don't exist in new schema tables

Fixed.

The rest of the tests - yes. Will add later, but before closing the ticket, if you don't mind
- I need this, and more code depending on this, in trunk, soon.

bq. You added a call to cfs.indexManager.setIndexRemoved() in dropTable(). Why do we need
that there now?

We used to add deletion to drop table/drop keyspace mutations - see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/LegacySchemaTables.java#L917
and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/LegacySchemaTables.java#L676
. It's no longer feasible now that schema tables are not in the same keyspace as {{IndexInfo}}
table, so I'm doing this instead - which is the same path that we use when we drop an individual
index on a column, or drop a column.


was (Author: iamaleksey):
bq. typo in javadoc: infromation

Fixed.

bq. migrate(): should log when migration has completed successfully

Fixed.

bq. This comment should be updated (or maybe just removed): "Note that because Keyspaces is
a COMPACT TABLE, we're really only setting static columns internally and shouldn't set any
clustering"

Fixed.

bq. In parseAggregateFunctionName(), there's the following comment: // function name can be
abbreviated (pre 2.2rc2) - it is in the same keyspace as the aggregate. It sounds like we
don't allow that any more, so we don't need to handle it.

If we drop it here, we should drop it from 2.2 as well - depends on whether or not we care
about 2.2.0 being compatible with rc1-created aggregates. That said, it doesn't hurt to have
it there, and it's already gone from {{SchemaKeyspace}}.

bq. serializeKind() is a dupe of SchemaKeyspaces.deserializeKind()
bq. decodeTableMetadata() is a near dupe of SchemaKeyspaces.createTableFromTableRowAndColumnRows()
– can we unify those somewhat?

We can, but there is zero point in doing so. With the next few commits they will diverge,
as the new {{system_schema.tables}} tables will take on an entirely different structure.

bq. Perhaps rename LegacySchemaTables to LEGACY_SCHEMA_TABLES? (Looks like a class name, currently.)

That's the old convention w/ these things, I don't know how it started, but for tables names
we use {{TABLE_NAMES}} in constants, and for {{CFMetaData}} collections we use {{TableNames}}.
I'm just being consistent.

bq. decodeTableMetadata(): when would cf_id not be present?

I think after 2.1 startup upgrade it should never be the case, but it doesn't hurt to leave
it there either. I did remove it from {{SchemaKeyspace.createTableFromTableRowAndColumnRows()}}
though

bq. checkNeedsUpgrade(): the comment explaining the check for the "supercolumn map" isn't
immediately clear – maybe point to the CompactTables javadoc or explicitly mention that
the supercolumn map is the only way column_name can be empty

This is just a copy-paste of previous Sylvain's code. Some of it will be gone soon - in {{SchemaKeyspace}},
but some will remain in the migrate code.

bq. Migrating tables in general

That *should* be covered by {{SchemaLoader.schemaDefinition()}}. Some of the tables there
are currently commented out though, for one reason or the other.

bq. New schema tables are written to with the correct timestamp

Planned for this initially, but won't do it until I'm done with the ticket. The tables will
be different, and so will be the code to check for timestamps - doing it at this point is
somewhat wasteful.

bq. Legacy schema tables don't exist in new schema tables

Will do.

The rest of the tests - yes. Will add later, but before closing the ticket, if you don't mind
- I need this, and more code depending on this, in trunk, soon.

bq. You added a call to cfs.indexManager.setIndexRemoved() in dropTable(). Why do we need
that there now?

We used to add deletion to drop table/drop keyspace mutations - see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/LegacySchemaTables.java#L917
and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/LegacySchemaTables.java#L676
. It's no longer feasible now that schema tables are not in the same keyspace as {{IndexInfo}}
table, so I'm doing this instead - which is the same path that we use when we drop an individual
index on a column, or drop a column.

> Modernize schema tables
> -----------------------
>
>                 Key: CASSANDRA-6717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6717
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Sylvain Lebresne
>            Assignee: Aleksey Yeschenko
>              Labels: client-impacting
>             Fix For: 3.0 beta 1
>
>
> There is a few problems/improvements that can be done with the way we store schema:
> # CASSANDRA-4988: as explained on the ticket, storing the comparator is now redundant
(or almost, we'd need to store whether the table is COMPACT or not too, which we don't currently
is easy and probably a good idea anyway), it can be entirely reconstructed from the infos
in schema_columns (the same is true of key_validator and subcomparator, and replacing default_validator
by a COMPACT_VALUE column in all case is relatively simple). And storing the comparator as
an opaque string broke concurrent updates of sub-part of said comparator (concurrent collection
addition or altering 2 separate clustering columns typically) so it's really worth removing
it.
> # CASSANDRA-4603: it's time to get rid of those ugly json maps. I'll note that schema_keyspaces
is a problem due to its use of COMPACT STORAGE, but I think we should fix it once and for-all
nonetheless (see below).
> # For CASSANDRA-6382 and to allow indexing both map keys and values at the same time,
we'd need to be able to have more than one index definition for a given column.
> # There is a few mismatches in table options between the one stored in the schema and
the one used when declaring/altering a table which would be nice to fix. The compaction, compression
and replication maps are one already mentioned from CASSANDRA-4603, but also for some reason
'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' in the schema
table, and 'min/max_compaction_threshold' are column families option in the schema but just
compaction options for CQL (which makes more sense).
> None of those issues are major, and we could probably deal with them independently but
it might be simpler to just fix them all in one shot so I wanted to sum them all up here.
In particular, the fact that 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the
replication map, but it may limit future stuff too) which suggest we should migrate it to
a new, non COMPACT table. And while that's arguably a detail, it wouldn't hurt to rename schema_columnfamilies
to schema_tables for the years to come since that's the prefered vernacular for CQL.
> Overall, what I would suggest is to move all schema tables to a new keyspace, named 'schema'
for instance (or 'system_schema' but I prefer the shorter version), and fix all the issues
above at once. Since we currently don't exchange schema between nodes of different versions,
all we'd need to do that is a one shot startup migration, and overall, I think it could be
simpler for clients to deal with one clear migration than to have to handle minor individual
changes all over the place. I also think it's somewhat cleaner conceptually to have schema
tables in their own keyspace since they are replicated through a different mechanism than
other system tables.
> If we do that, we could, for instance, migrate to the following schema tables (details
up for discussion of course):
> {noformat}
> CREATE TYPE user_type (
>   name text,
>   column_names list<text>,
>   column_types list<text>
> )
> CREATE TABLE keyspaces (
>   name text PRIMARY KEY,
>   durable_writes boolean,
>   replication map<string, string>,
>   user_types map<string, user_type>
> )
> CREATE TYPE trigger_definition (
>   name text,
>   options map<tex, text>
> )
> CREATE TABLE tables (
>   keyspace text,
>   name text,
>   id uuid,
>   table_type text, // COMPACT, CQL or SUPER
>   dropped_columns map<text, bigint>,
>   triggers map<text, trigger_definition>,
>   // options
>   comment text,
>   compaction map<text, text>,
>   compression map<text, text>,
>   read_repair_chance double,
>   dclocal_read_repair_chance double,
>   gc_grace_seconds int,
>   caching text,
>   rows_per_partition_to_cache text,
>   default_time_to_live int,
>   min_index_interval int,
>   max_index_interval int,
>   speculative_retry text,
>   populate_io_cache_on_flush boolean,
>   bloom_filter_fp_chance double
>   memtable_flush_period_in_ms int,
>   PRIMARY KEY (keyspace, name)
> )
> CREATE TYPE index_definition (
>   name text,
>   index_type text,
>   options map<text, text>
> )
> CREATE TABLE columns (
>   keyspace text,
>   table text,
>   name text,
>   kind text, // PARTITION_KEY, CLUSTERING_COLUMN, REGULAR or COMPACT_VALUE
>   component_index int;
>   type text,
>   indexes map<text, index_definition>,
>   PRIMARY KEY (keyspace, table, name)
> )
> {noformat}
> Nit: wouldn't hurt to create a simple enum that is reuse by both CFMetaData and CFPropDefs
for table options names while we're at it once they are the same instead of repeating string
constants which is fragile.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message