cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3533) TimeoutException when there is a firewall issue.
Date Tue, 31 Jul 2012 00:11:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425410#comment-13425410 ]

Brandon Williams commented on CASSANDRA-3533:
---------------------------------------------

bq. Is there anything forcing a next attempt though, besides gossip (1/N chance per round)?

Hmm, actually, no, I was mistaken there.

bq. But you still have things like GC-based "flapping" that can cause FD to mark a node down
over-pessimistically. So I don't think I buy that this is an argument for not making FD more
robust – since we already have to deal with "FD is too pessimistic" for this case.

I actually don't think, at least for this example, being overly pessimistic is an issue. 
On a healthy network (0.3ms ping) it takes 18-19s for the FD to mark a host down with the
default phi.  If the GC flapping is so bad it can't get a gossip change out in that time,
the node probably _should_ be marked down.
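
As a rough sanity check on the 18-19s figure, here is a minimal sketch of a phi calculation. It assumes an exponential model of heartbeat inter-arrival times, a mean heartbeat interval of about 1 second, and a conviction threshold of 8; these three values and all function names are assumptions for illustration, not taken from this thread.

```python
import math

# Assumptions for illustration only (not stated in this thread):
PHI_CONVICT_THRESHOLD = 8.0       # assumed default conviction threshold
MEAN_HEARTBEAT_INTERVAL_S = 1.0   # assumed: about one heartbeat seen per second


def phi(seconds_since_last_heartbeat, mean_interval_s):
    """Suspicion level under an exponential inter-arrival model.

    P(silence lasts at least t) = exp(-t / mean), and phi is -log10 of that
    probability, which simplifies to t / (mean * ln 10).
    """
    return seconds_since_last_heartbeat / (mean_interval_s * math.log(10))


def seconds_until_convicted(threshold, mean_interval_s):
    """Length of silence at which phi reaches the threshold."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    t = seconds_until_convicted(PHI_CONVICT_THRESHOLD, MEAN_HEARTBEAT_INTERVAL_S)
    print(f"phi reaches {PHI_CONVICT_THRESHOLD} after about {t:.1f}s of silence")
```

Under those assumptions the silence needed is 8 * ln(10), roughly 18.4s, which is in line with the observed 18-19s.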

bq. (Fundamentally though I don't think we'll get much mileage out of trying to second-guess
FD, so I'd rather make FD as accurate as we can. And I suspect that "StorageProxy uses FD-supplemented-by-X
and the rest of the system uses normal FD" is going to cause weirdness.)

You're probably right.  Let's take a step back and examine what we're trying to solve.  Node
X can talk to Y, and Y can talk to Z, but X and Z are partitioned and can't communicate.  Surrogate
gossip traffic via Y nevertheless makes them both think they can.  The fallout from this is that they'll
keep attempting to send messages (and thus connect) to each other.  In practice though, from
a client perspective:

* writes will get ack'd by whichever replicas respond the fastest.  Assuming RF=3 and X being
the coordinator, the fact that it wrote a local copy and Y responded is enough for everything
but ALL.

* reads will get attempted against Z from X, and will have to time out.
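
The write case above can be sketched as a toy model. It assumes RF=3 with replicas X, Y and Z, X as coordinator, and counts an ack from every replica the coordinator can reach directly. The names `LINKS`, `reachable` and `write_succeeds` are hypothetical, and nothing here uses Cassandra's own code.

```python
# Toy model of the X/Y/Z partition; all names are hypothetical.
RF = 3
REQUIRED_ACKS = {
    "ONE": 1,
    "TWO": 2,
    "QUORUM": RF // 2 + 1,  # 2 when RF=3
    "ALL": RF,
}

# Direct connectivity: X-Y and Y-Z work, X-Z is partitioned.
LINKS = {frozenset(("X", "Y")), frozenset(("Y", "Z"))}


def reachable(a, b):
    """A node always reaches itself; otherwise a direct link is needed."""
    return a == b or frozenset((a, b)) in LINKS


def write_succeeds(coordinator, replicas, consistency_level):
    """True if enough replicas can ack the coordinator's write."""
    acks = sum(1 for replica in replicas if reachable(coordinator, replica))
    return acks >= REQUIRED_ACKS[consistency_level]


if __name__ == "__main__":
    for level in ("ONE", "TWO", "QUORUM", "ALL"):
        print(level, write_succeeds("X", ["X", "Y", "Z"], level))
```

With X coordinating, the local copy plus Y's ack gives two acks, so every level in the table succeeds except ALL.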

Now let's look at the read scenario in a post-1.2 world.  The dsnitch, after CASSANDRA-3722,
will penalize Z in X's eyes much faster (and thus prevent dogpiling requests while waiting
for rpc timeout) than pre-1.2 and quit trying to use it (at least until the reset interval,
then the process begins again.)  But this is really no different than if Z _does_ suddenly
die at such a level that the network route is a black hole (like force suspending the JVM,
which is how the dsnitch change was tested and worked well.)
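
To illustrate the idea of penalizing a silent replica, here is a deliberately simplified sketch. It is not the CASSANDRA-3722 algorithm: the scoring formula, the penalty cap, and the names `score` and `rank_replicas` are all assumptions made up for this example. The only point it makes is that a replica that has stopped replying sorts behind a healthy one well before any rpc timeout fires.

```python
# Toy replica ranking; NOT Cassandra's dynamic snitch. All names and the
# formula are hypothetical and exist only to illustrate the idea.

def score(avg_latency_ms, ms_since_last_reply, penalty_cap_ms):
    """Lower is better: observed latency plus a capped penalty for silence."""
    return avg_latency_ms + min(ms_since_last_reply, penalty_cap_ms)


def rank_replicas(stats, penalty_cap_ms=10_000):
    """Order replica names from most to least preferred.

    stats maps replica name -> (avg_latency_ms, ms_since_last_reply).
    """
    return sorted(
        stats,
        key=lambda name: score(stats[name][0], stats[name][1], penalty_cap_ms),
    )


if __name__ == "__main__":
    # From X's point of view: Y replied 50 ms ago, Z has been silent for 4 s.
    stats = {"Z": (2.0, 4000.0), "Y": (2.0, 50.0)}
    print(rank_replicas(stats))
```

In this toy model Z drops to the back of the list as soon as its silence exceeds Y's, even though both had the same latency while healthy.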

So I suppose my question is, what is the problem here we still need to solve?
                
> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 3533.txt
>
>
> When one node in the cluster is not able to talk to the other DC/RAC due to a firewall
> or network-related issue (StorageProxy calls fail), and the nodes are NOT marked down because
> at least one node in the cluster can talk to the other DC/RAC, we get a TimeoutException instead
> of an UnavailableException.
> The problem with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard for the client to differentiate a bad node from a bad query.
> 3) When this issue happens we have to wait for at least the RPC timeout to know
> that the query won't succeed.
> Possible solution: when marking a node down we might want to check if the node is actually
> alive by trying to communicate with it? That way we can be sure that the node is actually alive.


       
