cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mck SembWever (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1831) NetworkTopologyStrategy allows mismatched RF resulting in obscure failures
Date Sat, 22 Jan 2011 16:44:44 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985152#action_12985152
] 

Mck SembWever commented on CASSANDRA-1831:
------------------------------------------

I hit this issue. It wasn't immediately obvious to me what was required configuration to get
NetworkTopologyStrategy working. The best docs i can find is in http://svn.apache.org/repos/asf/cassandra/trunk/conf/cassandra.yaml
A wiki page on NetworkTopologyStrategy would help a lot.

(http://wiki.apache.org/cassandra/Operations#Network_topology still refers to the old RackAwareStrategy)

> NetworkTopologyStrategy allows mismatched RF resulting in obscure failures
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1831
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0 rc 1
>            Reporter: Peter Schuller
>
> On today's 0.7 branch:
> Creating a keyspace like this (not how to do it in production, but that's not the point):
>    create keyspace MyKeySpace with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy';
> This is accepted by Cassandra in spite of there being no strategy options. Describing
the keyspace will then give output similar to:
> Keyspace: MyKeySpace:
>  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
> null
> Attempts to write and read respectively gives the errors included at the bottom of this
comment.
> What happens is that the NTS's getReplicationFactor() returns the sum of RF for each
DC. But lacking any replicate placement options for DC:s, the sum will always be 0. The result
is that NTS.calculateNaturalEndpoints() yields 0 endpoints thus triggering the assertion failures
apparent in the strack traces.
> This was caused by misconfiguration during testing but should be handled better. What
are people's thoughts on the set of changes that would constitute a proper fix?
> Is there a reason for NTS to ever conclude that RF is different than that of the CF def?
If not, I would say that one fix is to make the NTS bail early if the calculated RF adding
up the DC placements does not match the configured RF for the column family. (I'll submit
a patch if people agree.)
> Beyond that, what else, if anything should be done? Should the creation fail due to the
RF being inconsistent with strategy options? Is it correct that code assumes that naturalEndPoints
will never return fewer nodes than RF? It seems natural to me that the natural endpoint count
should always match RF, unless the total number of nodes in the cluster is lacking. But this
gets complicated with NTS since the requirement is suddenly that you have enough in each DC.
This probably relates to previous discussions on whether or not to allow an RF which is higher
than the number of nodes in a cluster.
> In this case, we failed hard because we got exactly 0 endpoints and triggered assertions.
In other cases we might have gotten say 1, in which case we may have successfully been able
to read and write as if we had a lower RF even though the column family RF was set to 2. This
seems dangerous.
> ERROR [pool-1-thread-2] 2010-12-07 11:18:40,638 Cassandra.java (line
> 3044) Internal error processing batch_mutate
> java.lang.AssertionError: invalid response count 1 for replication factor 0
>        at org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:98)
>        at org.apache.cassandra.service.WriteResponseHandler.<init>(WriteResponseHandler.java:48)
>        at org.apache.cassandra.service.WriteResponseHandler.create(WriteResponseHandler.java:61)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:125)
>        at org.apache.cassandra.locator.NetworkTopologyStrategy.getWriteResponseHandler(NetworkTopologyStrategy.java:166)
>        at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:114)
>        at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:446)
>        at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:419)
>        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3036)
>        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR [pool-1-thread-3] 2010-12-07 11:18:50,474 Cassandra.java (line
> 2876) Internal error processing get_range_slices
> java.lang.AssertionError
>        at org.apache.cassandra.service.RangeSliceResponseResolver.<init>(RangeSliceResponseResolver.java:53)
>        at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:450)
>        at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:507)
>        at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
>        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
>  INFO [MigrationStage:1] 2010-12-07 11:24:09,220

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message