cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance
Date Mon, 28 Mar 2011 19:22:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012187#comment-13012187 ]

Jonathan Ellis commented on CASSANDRA-2394:
-------------------------------------------

bq. until the snitch on all coordinators decided to quit using the node

But shouldn't that be negligibly slower than in a small cluster, assuming there is enough
query volume that each coordinator is routing some queries for the data in question?

> Faulty hd kills cluster performance
> -----------------------------------
>
>                 Key: CASSANDRA-2394
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>            Priority: Minor
>             Fix For: 0.7.5
>
>
> Hi,
> About every week, a node in our main cluster (>100 nodes) develops a faulty hard disk (listing
> the Cassandra data storage directory triggers an input/output error).
> Whenever this occurs, I see many TimeoutExceptions in our application on various nodes,
> which cause everything to run very slowly. Key range scans just time out and sometimes
> never succeed. If I stop Cassandra on the faulty node, everything runs normally again.
> It would be great to have some kind of monitoring thread in Cassandra that marks a node
> as "down" if there are multiple read/write errors on the data directories. A single faulty
> hard disk on one node shouldn't affect global cluster performance.
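
The monitoring idea in the issue description can be illustrated with a rough sketch. This is not Cassandra code and not the fix that was adopted for this ticket. The class name DiskErrorMonitor, the function probe_directory, and the threshold and window values are all hypothetical, chosen only to show the idea of counting recent I/O errors and flagging the node once they exceed a limit.

```python
import os
import time
from collections import deque


class DiskErrorMonitor:
    """Hypothetical sketch: flags a node once too many I/O errors fall in a time window."""

    def __init__(self, max_errors=3, window_seconds=60.0, clock=time.monotonic):
        self.max_errors = max_errors          # assumed threshold, not a Cassandra setting
        self.window_seconds = window_seconds  # assumed window, not a Cassandra setting
        self.clock = clock                    # injectable so the logic can be tested
        self.errors = deque()                 # timestamps of recent I/O errors
        self.marked_down = False              # latched: stays True until reset by hand

    def record_error(self):
        """Record one read/write error and return whether the node is now marked down."""
        now = self.clock()
        self.errors.append(now)
        # Drop errors that are older than the window.
        while self.errors and now - self.errors[0] > self.window_seconds:
            self.errors.popleft()
        if len(self.errors) >= self.max_errors:
            self.marked_down = True
        return self.marked_down


def probe_directory(path, monitor):
    """List a data directory; any OSError (EIO, but also a missing path) counts as an error."""
    try:
        os.listdir(path)
        return True
    except OSError:
        monitor.record_error()
        return False
```

A background thread could call probe_directory on each data directory at a fixed interval and stop serving requests once monitor.marked_down becomes true. How the node would then announce itself as down to the rest of the cluster is outside the scope of this sketch.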

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
