cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8519) Mechanism to determine which nodes own which token ranges without Thrift
Date Fri, 19 Dec 2014 16:48:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253628#comment-14253628 ]

Sylvain Lebresne commented on CASSANDRA-8519:
---------------------------------------------

Just to clarify, the information needed to compute this *is* available through CQL, since you
have access to the tokens and the replication strategy. In fact, the java driver (since it was
mentioned) already does this; it just doesn't expose the token ranges yet, but that will be
fixed by https://datastax-oss.atlassian.net/browse/JAVA-312.

Now, we could provide the token ranges pre-computed to 1) save every driver from having to
compute them and 2) make it so that said drivers don't need to be updated if we ever add a new
replication strategy. But since 1) it's not terribly hard for a driver to do it (and I say that
as the one who did it for the java driver) and 2) we're far from releasing new replication
strategies every other day, I'm going to mark this as low priority.
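
As an illustration of the computation described above, here is a minimal sketch in Python.
It is not the java driver's implementation. It assumes the token-to-host mapping has already
been gathered (for example from the tokens columns of system.local and system.peers) and that
the keyspace uses SimpleStrategy, where a node with token T owns the range (previous token, T]
and further replicas are the next distinct hosts clockwise on the ring. The function name and
the ring dictionary are hypothetical.

```python
def simple_strategy_replicas(ring, rf):
    """Map each token range (start, end] to its replica hosts.

    ring: dict of token -> host (assumed to be collected beforehand).
    rf:   replication factor of the keyspace.
    Assumes SimpleStrategy placement: the owner of `end` comes first,
    followed by the next distinct hosts walking clockwise.
    """
    tokens = sorted(ring)
    result = {}
    for i, end in enumerate(tokens):
        # For i == 0 the index -1 wraps around to the last token on the ring.
        start = tokens[i - 1]
        replicas = []
        # Walk the ring at most once, skipping hosts already selected
        # (a host can hold several tokens when vnodes are enabled).
        for step in range(len(tokens)):
            host = ring[tokens[(i + step) % len(tokens)]]
            if host not in replicas:
                replicas.append(host)
            if len(replicas) == rf:
                break
        result[(start, end)] = replicas
    return result
```

A rack- or datacenter-aware strategy such as NetworkTopologyStrategy needs extra placement
rules on top of this walk, which is the per-strategy driver work the comment refers to.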

> Mechanism to determine which nodes own which token ranges without Thrift
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter:  Brian Hess
>
> Right now the only way to determine which nodes own which token ranges is via the Thrift
> interface. There is not a Java/CQL driver mechanism to determine this. Applications that
> make multiple connections to Cassandra to extract data in parallel need this ability so they
> can split the data into pieces, and it is reasonable to want those splits to be on token range
> boundaries. Of course, once you split this way, you would want to route those queries to
> nodes that own that token range / split, for efficiency.
> This applies for both Hadoop and Spark, but other applications, too. Hadoop and Spark
> currently use Thrift to determine this topology.
> Additionally, different replication strategies and replication factors result in different
> token range ownership, so there will have to be a different answer based on which keyspace
> is used.
> It would be useful if this data was stored in a CQL table and could be simply queried.
> A suggestion would be to add a column to the SYSTEM.SCHEMA_KEYSPACES table (maybe a complex
> Map of Host to a UDT that has a List of (beginRange, endRange) pairs - as an example). This
> table would need to be updated on an ALTER KEYSPACE command or on a topology change event.
> This would allow the server(s) to hold this information and the drivers could simply query
> it (as opposed to having each driver manage this separately).
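
To make the suggested shape concrete, here is a small Python sketch of a map from host to a
list of (beginRange, endRange) pairs. It only illustrates the data structure. It is not an
existing Cassandra table or driver API, and the function name and the input mapping (token
range to replica hosts, computed elsewhere for one keyspace) are hypothetical.

```python
from collections import defaultdict


def ranges_by_host(replicas_by_range):
    """Invert {(begin, end): [hosts]} into {host: [(begin, end), ...]}.

    replicas_by_range is assumed to describe a single keyspace, since
    ownership depends on that keyspace's replication settings.
    """
    owned = defaultdict(list)
    # Sort by range so each host's list comes out in a stable order.
    for (begin, end), hosts in sorted(replicas_by_range.items()):
        for host in hosts:
            owned[host].append((begin, end))
    return dict(owned)
```

An application extracting data in parallel could take one host's list from such a map as its
set of splits and send the matching token-range queries to that host.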



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
