cassandra-user mailing list archives

From Tyler Hobbs <ty...@datastax.com>
Subject Re: Using Per-Table Keyspaces for Tunable Replication
Date Fri, 12 Dec 2014 23:20:53 GMT
On Fri, Dec 12, 2014 at 4:50 PM, Eric Stevens <mightye@gmail.com> wrote:

>
> I know that Thrift includes keyspace as part of the connection details, so
> if you're reading or writing to many keyspaces, you'll end up having to
> make a lot of additional round trips, and it will hurt your throughput.  I
> may be wrong, but I don't think this is true for the native protocol.  If
> we're using fully qualified names for all of our queries, I don't think
> this incurs the same overhead.
>

That's correct.  While you can set a default keyspace for a native protocol
connection, fully qualified names in each query mean the connection's
keyspace doesn't matter the way it did for Thrift.
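
As a rough illustration of what "fully qualified" means here, the sketch
below builds keyspace.table names for CQL statements using only the Java
standard library.  The QualifiedNames helper and the keyspace/table names
(events_rf3, audit_rf5) are hypothetical and not part of any driver API;
the resulting strings could be passed to whatever execute call your driver
provides.

```java
// Hypothetical helper for illustration only; not part of the DataStax driver.
final class QualifiedNames {
    private QualifiedNames() {}

    // Wrap an identifier in double quotes, doubling any embedded double quote
    // (CQL's escaping rule for quoted identifiers).
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Build a fully qualified name of the form "keyspace"."table".
    static String table(String keyspace, String table) {
        return quote(keyspace) + "." + quote(table);
    }

    public static void main(String[] args) {
        // Assumed example keyspaces, one per replication setting.
        System.out.println("SELECT * FROM " + table("events_rf3", "events") + " WHERE id = ?");
        System.out.println("SELECT * FROM " + table("audit_rf5", "audit_log") + " WHERE id = ?");
    }
}
```

Because every statement names its keyspace, a single session can address
any number of keyspaces without switching the connection's default.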


>
> I've had a look through the DataStax Java Driver's execution path and I'm
> seeing that it attempts to discover the keyspace used by each query, but
> that's to help determine the candidate hosts for token aware policy.  It
> does that discovery at the time the session is initted (see Metadata.java
> <http://grepcode.com/file/repo1.maven.org/maven2/com.datastax.cassandra/cassandra-driver-core/2.1.2/com/datastax/driver/core/Metadata.java/#381>)
> as well as when a topology change is detected, so it seems like it may
> slightly slow down connect time, but the cost per query at execution time
> should be relatively static regardless of the number of keyspaces.
>

This is also correct.  On startup the driver will build a token ring (or
replica map) representation for each keyspace to assist TokenAwarePolicy.
There's no additional overhead per-query for extra keyspaces.
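
To make the cost model concrete, here is a deliberately simplified sketch
of a per-keyspace replica map.  It is not the driver's Metadata code: the
ReplicaMapSketch class, the SimpleStrategy-like placement (walk the ring
clockwise until RF distinct hosts are found), and the host names are all
assumptions made for illustration.  The point is that the map is built
once, and each lookup is one hash lookup by keyspace plus one search of
the ring, no matter how many keyspaces exist.

```java
import java.util.*;

// Simplified model for illustration; not the driver's actual implementation.
final class ReplicaMapSketch {
    // keyspace -> (ring token -> replica hosts), computed once up front.
    private final Map<String, NavigableMap<Long, List<String>>> ringsByKeyspace = new HashMap<>();

    // ring: token -> host owning that token; rfByKeyspace: keyspace -> replication factor.
    ReplicaMapSketch(NavigableMap<Long, String> ring, Map<String, Integer> rfByKeyspace) {
        List<Long> tokens = new ArrayList<>(ring.keySet());
        for (Map.Entry<String, Integer> e : rfByKeyspace.entrySet()) {
            NavigableMap<Long, List<String>> replicas = new TreeMap<>();
            for (int i = 0; i < tokens.size(); i++) {
                // Walk clockwise from this token, collecting distinct hosts up to RF.
                LinkedHashSet<String> hosts = new LinkedHashSet<>();
                for (int j = 0; j < tokens.size() && hosts.size() < e.getValue(); j++) {
                    hosts.add(ring.get(tokens.get((i + j) % tokens.size())));
                }
                replicas.put(tokens.get(i), new ArrayList<>(hosts));
            }
            ringsByKeyspace.put(e.getKey(), replicas);
        }
    }

    // Per-query lookup: independent of the number of keyspaces.
    List<String> replicasFor(String keyspace, long token) {
        NavigableMap<Long, List<String>> replicas = ringsByKeyspace.get(keyspace);
        if (replicas == null || replicas.isEmpty()) {
            return Collections.emptyList();
        }
        // The first ring token >= the key's token owns it; wrap to the start otherwise.
        Map.Entry<Long, List<String>> owner = replicas.ceilingEntry(token);
        if (owner == null) {
            owner = replicas.firstEntry();
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "hostA");
        ring.put(100L, "hostB");
        ring.put(200L, "hostC");
        Map<String, Integer> rf = new HashMap<>();
        rf.put("ks_rf1", 1);
        rf.put("ks_rf3", 3);
        ReplicaMapSketch map = new ReplicaMapSketch(ring, rf);
        System.out.println(map.replicasFor("ks_rf1", 50L));
        System.out.println(map.replicasFor("ks_rf3", 50L));
    }
}
```

In this model, adding keyspaces makes construction (and rebuilds after a
topology change) proportionally more expensive, which matches the
observation about connect time, while replicasFor stays the same cost.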


>
> I know there is nontrivial overhead for each column family, but I have not
> read or heard that there is nontrivial overhead for each keyspace.  Do you
> have more information about that?
>

The overhead for each keyspace is minor.  There will be some additional
objects in the heap, some more entries in the system tables, and the driver
will generally track more metadata, but that's all pretty lightweight.

The per-column family overhead primarily comes from the way memory is
allocated for memtables.  However, CASSANDRA-7882 should significantly
improve that: https://issues.apache.org/jira/browse/CASSANDRA-7882

-- 
Tyler Hobbs
DataStax <http://datastax.com/>
