cassandra-commits mailing list archives

From "Thibaut (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-2394) Faulty hd kills cluster performance
Date Sun, 27 Mar 2011 21:24:05 GMT
Faulty hd kills cluster performance
-----------------------------------

                 Key: CASSANDRA-2394
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.4
            Reporter: Thibaut
            Priority: Minor


Hi,

About once a week, a node in our main cluster (>100 nodes) develops a faulty hard disk (listing
the Cassandra data storage directory triggers an input/output error).

Whenever this occurs, I see many timeout exceptions in our application on various nodes, which
cause everything to run very slowly. Key range scans time out and sometimes never
succeed. If I stop Cassandra on the faulty node, everything runs normally again.

It would be great to have some kind of monitoring thread in Cassandra which marks a node as
"down" if there are multiple read/write errors to the data directories. A single faulty hard disk
on one node shouldn't affect global cluster performance.
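
To make the request concrete, here is a rough sketch of the error-counting part of such a monitor. It is only an illustration: DiskErrorTracker, maxErrors and windowMillis are hypothetical names and are not part of Cassandra, and how the node would actually be marked "down" (for example through gossip) is left open.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not a Cassandra class): counts recent disk I/O errors. */
class DiskErrorTracker {
    private final int maxErrors;      // errors tolerated inside the window (assumed tunable)
    private final long windowMillis;  // length of the sliding window (assumed tunable)
    private final Deque<Long> errorTimes = new ArrayDeque<>();

    DiskErrorTracker(int maxErrors, long windowMillis) {
        this.maxErrors = maxErrors;
        this.windowMillis = windowMillis;
    }

    /** Records one read/write error on a data directory, observed at nowMillis. */
    synchronized void recordError(long nowMillis) {
        errorTimes.addLast(nowMillis);
        evictOld(nowMillis);
    }

    /** Returns true once more than maxErrors errors fall inside the window. */
    synchronized boolean shouldMarkDown(long nowMillis) {
        evictOld(nowMillis);
        return errorTimes.size() > maxErrors;
    }

    /** Drops errors that are older than the window. */
    private void evictOld(long nowMillis) {
        while (!errorTimes.isEmpty() && nowMillis - errorTimes.peekFirst() > windowMillis) {
            errorTimes.removeFirst();
        }
    }
}
```

The read/write paths would call recordError whenever an IOException hits a data directory, and a periodic monitoring thread would poll shouldMarkDown and take the node out of service when it returns true. Using a window rather than a lifetime counter means a few isolated errors spread over weeks would not trip it.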



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
