cassandra-commits mailing list archives

From "Brian Hess (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7622) Implement virtual tables
Date Mon, 06 Jun 2016 17:54:21 GMT


 Brian Hess commented on CASSANDRA-7622:

It appears there are a few different things that could be accomplished with Virtual Tables,
and I'm curious which one this ticket will focus on. I'd suggest we consider breaking
the full set of things below into separate tickets, as I think some deserve more discussion/debate
than others.
1. CQL access to internal metrics and management.  This would be read access to metrics (such as
those exposed via JMX) and to management settings (such as cassandra.yaml settings).  These could be
covered by read-only access to built-in virtual tables (update/write access is item 2 below).
2. CQL access for updating management settings.  This is different from management actions
(such as `nodetool flush` or repair).
3. CQL access to management actions.  This is the category of things such as `nodetool flush`,
repair, drain, etc.
4. CQL access to arbitrary classes that implement user-defined behavior.

I think that 1 is the least controversial, and clearly of high value.  My suggestion is that
these be built-in Virtual Tables, perhaps in system._vt_*.  For example, system._vt_metrics
for SELECT access to the JMX metrics, or system._vt_settings for SELECT access to yaml/command-line/etc
settings.  These would be predefined virtual tables.  I am concerned about how we will
specify which node's metrics we are querying, and how we route the query to that node
(more on this below - it applies to all 4 scenarios, and to any more we come up with).
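To make scenario 1 concrete, a built-in metrics table might look something like the following. This is a sketch only: the table name, columns, and scope values are all hypothetical, and it deliberately sidesteps the routing question discussed later by assuming the node is the partition key.

```sql
-- Hypothetical sketch: neither this table nor its columns exist.
-- A built-in, read-only virtual table exposing JMX metrics, keyed
-- by node so a query can target a single member of the cluster.
CREATE TABLE system._vt_metrics (
    node        inet,    -- which node's metrics we are reading
    scope       text,    -- e.g. 'ClientRequest', 'ThreadPools'
    metric_name text,    -- e.g. 'ReadLatency', 'PendingTasks'
    value       double,
    PRIMARY KEY ((node), scope, metric_name)
);

-- Read one node's client-request metrics:
SELECT metric_name, value
  FROM system._vt_metrics
 WHERE node = '10.0.0.1' AND scope = 'ClientRequest';
```

With this shape, the WHERE clause on the partition key is what names the target node - which is exactly the routing question raised below.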

I also think that 2 is not very controversial.  This would allow updating settings, for
example via a CQL UPDATE command (e.g., `UPDATE system._vt_settings SET max_hint_window_in_ms
= 10800000 WHERE node=''` - for illustrative purposes).  The reason this is different
from 3 is that after doing the update, a read will return the value just written, since it was changed.
 Would we want to support `UPDATE system._vt_settings SET max_hint_window_in_ms = 10800000`
to update all nodes, even though that is not valid CQL since we don't specify a partition key?
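If settings were modeled with the node as the partition key, the per-node versus cluster-wide distinction falls out of the normal CQL rules. Again a sketch only - every name here is illustrative:

```sql
-- Hypothetical sketch: this table does not exist.
CREATE TABLE system._vt_settings (
    node  inet,
    name  text,
    value text,
    PRIMARY KEY ((node), name)
);

-- Per-node update: valid, since the full primary key is specified.
UPDATE system._vt_settings
   SET value = '10800000'
 WHERE node = '10.0.0.1' AND name = 'max_hint_window_in_ms';

-- Cluster-wide update: not valid CQL today, since no partition key
-- is specified.
-- UPDATE system._vt_settings SET value = '10800000'
--  WHERE name = 'max_hint_window_in_ms';
```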

I'm not sure about 3, and I think it should be discussed further.  The problem is that for something
like `ALTER system_virtual.actions SET drain=true` (or really `UPDATE system_virtual.actions
SET drain=true`), what would the value be if we SELECT it after the ALTER/UPDATE?  It feels
like it would be set back to FALSE after the drain completes, which semantically feels strange.  I like
[~snazy]'s suggestion of completely separate CQL syntax for these actions.  Maybe something
like `EXECUTE ACTION drain ON ''` - again, syntax just for illustrative purposes.
The semantics here are that the system is being altered, but outside a table, so a SELECT doesn't
carry the confusion described above.  As [~snazy] pointed out, we would also need to make sure we
are careful about access to this capability (perhaps user/role permissions are sufficient,
and we just need to add new permission types).
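Purely to illustrate what that separate syntax and the new permission types might look like - none of this exists, and the keywords, the options map, and the role name are all invented:

```sql
-- Hypothetical action syntax, per the discussion above.
EXECUTE ACTION drain ON '10.0.0.1';
EXECUTE ACTION repair ON '10.0.0.1' WITH options = { 'full' : 'true' };

-- Access control could reuse the existing role model with a new
-- permission type (again, invented syntax):
GRANT EXECUTE ACTION ON ALL NODES TO ops_role;
```

Keeping actions out of the table model means a SELECT never has to answer "what is the current value of drain?", which is the semantic problem with the UPDATE formulation.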

4 is probably the one that gives the most pause, IMO, as there would be arbitrary user code
running in the read and write path.  We need to take care with that, as we have with UDFs.
 It isn't exactly clear to me what the goal is for arbitrary virtual tables.  What are we
trying to accomplish with this extension point?

As I mentioned above, I'm curious how we will route the query (SELECT, UPDATE, EXECUTE, whatever)
to the right node.  In the simple case of SELECTing JMX metrics, how would we query the JMX
metrics of one particular node?  These queries/operations are per-node, not cluster-wide, so we need
to think about that.  Is the node the partition key of the virtual table (`.... PRIMARY KEY ((node),
...)`)?  How would we route that internally in CQL (and with awareness in the driver)?

It sounds like it would require a special partitioner (like the system tables have - or maybe
ByteOrderedPartitioner could work with a custom replication strategy, but with Murmur3Partitioner
things are hard, as it's essentially hash inversion) and possibly a special replication strategy (I don't
think a replication strategy alone would cover this - right, [~jjordan]?).  If these are
fully user-defined (as [~jjirsa] suggests), then we need a way to specify partitioners
at the table (or at least keyspace) level, as opposed to cluster-wide (which would actually
be a nice change, IMO).  If they are built-in, then we can cover this case like we do with
the system keyspace tables.
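For the user-defined case, a per-keyspace partitioner might be declared something like this. To be clear, this syntax does not parse today - partitioners are cluster-wide - so this is only a sketch of the kind of change being suggested:

```sql
-- Hypothetical syntax: 'partitioner' is not a keyspace option today.
CREATE KEYSPACE user_vt
  WITH replication = { 'class' : 'LocalStrategy' }
   AND partitioner = 'org.apache.cassandra.dht.ByteOrderedPartitioner';
```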

Another approach may be to have the drivers deal with the routing.  The Virtual Tables could
be special-cased for routing purposes, with an API call to specify which node to query
- or the LoadBalancingPolicy could inspect the Statement and route appropriately, which could
work with any/all partitioners and/or replication strategies.  If it is a LoadBalancingPolicy,
then what happens if we use the "wrong" one (not the VirtualTableLoadBalancingPolicy, say)
for a Virtual Table query?  I guess that would warrant a new Error type.

> Implement virtual tables
> ------------------------
>                 Key: CASSANDRA-7622
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tupshin Harper
>            Assignee: Jeff Jirsa
>             Fix For: 3.x
> There are a variety of reasons to want virtual tables, which would be any table that
would be backed by an API, rather than data explicitly managed and stored as sstables.
> One possible use case would be to expose JMX data through CQL as a resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml configuration
information. So it would be an alternate approach to CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not presupposing.

This message was sent by Atlassian JIRA
