From: John Pyeatt
To: user@cassandra.apache.org
Subject: Stack trace from a node during a repair
Date: Mon, 2 Dec 2013 16:59:38 -0600

We are running a 6-node AWS EC2 (m1.large) cluster of Cassandra 1.2.9 across three availability zones with Ec2Snitch and NetworkTopologyStrategy.

One of our nodes was apparently sharing a physical box with another customer who was really hogging the IO, so we needed to bring the node up on a new EC2 instance.

We decommissioned the offending node, killed the instance, and brought a new instance into the cluster. Everything went fine up to that point.

After it came up I ran nodetool repair -pr on each of the nodes in the cluster, one node at a time. When it got to the repair on the new node, the gossip service shut down; this happened three times. At the bottom of this email is a copy of the stack trace we received.

It says it couldn't create a backups directory. I have no idea why this would be: the /data-1 partition is 400 GB in size and currently 1% utilized. Does anyone have any idea what could be causing this?

My /etc/security/limits.conf file currently has:

# resource settings added based on
# http://www.datastax.com/docs/1.2/install/recommended_settings
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
* soft as unlimited
* hard as unlimited
root soft as unlimited
root hard as unlimited

ERROR 2013-12-02 21:02:25,711 [Thread-3050] CassandraDaemon Exception in thread Thread[Thread-3050,5,main]
FSWriteError in /data-1/cassandra/data/SinglewireSupport/Binaries/backups
        at org.apache.cassandra.db.Directories.getOrCreate(Directories.java:483)
        at org.apache.cassandra.db.Directories.getBackupsDirectory(Directories.java:242)
        at org.apache.cassandra.db.DataTracker.maybeIncrementallyBackup(DataTracker.java:165)
        at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
        at org.apache.cassandra.db.ColumnFamilyStore.addSSTables(ColumnFamilyStore.java:911)
        at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:186)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:138)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
        at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
Caused by: java.io.IOException: Unable to create directory /data-1/cassandra/data/SinglewireSupport/Binaries/backups

-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
------------------
608.661.1184
john.pyeatt@singlewire.com
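The "Unable to create directory" message in the trace gives no underlying reason, so one way to narrow things down is to check the usual suspects by hand. The following is only a rough sketch, not anything taken from Cassandra itself: the function names `nearest_existing_ancestor` and `diagnose_mkdir` are made up for illustration, and the list of checks (a file where a directory is expected, missing write permission, a read-only mount, exhausted inodes or blocks) is an assumption about common causes, not a confirmed diagnosis of this failure. It should be run as the same OS user that runs the Cassandra process.

```python
import os


def nearest_existing_ancestor(path):
    """Walk upward from `path` until a component that exists is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def diagnose_mkdir(path):
    """Return human-readable reasons why creating `path` might fail.

    An empty list means none of the checks below found a problem;
    it does not guarantee that directory creation will succeed.
    """
    findings = []
    if os.path.isdir(path):
        return findings  # already a directory, nothing to create
    if os.path.exists(path):
        findings.append(f"{path} exists but is not a directory")
        return findings

    ancestor = nearest_existing_ancestor(path)
    if not os.path.isdir(ancestor):
        findings.append(f"{ancestor} is not a directory")
        return findings

    # Creating an entry needs write and search permission on the parent.
    if not os.access(ancestor, os.W_OK | os.X_OK):
        findings.append(f"current user cannot write to {ancestor}")

    stats = os.statvfs(ancestor)
    if stats.f_flag & os.ST_RDONLY:
        findings.append(f"filesystem holding {ancestor} is mounted read-only")
    # Some filesystems report zero total inodes; skip the check there.
    if stats.f_files > 0 and stats.f_favail == 0:
        findings.append(f"no free inodes on filesystem holding {ancestor}")
    if stats.f_bavail == 0:
        findings.append(f"no free blocks on filesystem holding {ancestor}")
    return findings


if __name__ == "__main__":
    # Path copied from the stack trace above.
    target = "/data-1/cassandra/data/SinglewireSupport/Binaries/backups"
    for line in diagnose_mkdir(target) or ["no obvious problem found"]:
        print(line)
```

The inode check is included because a partition can be 1% utilized by size and still be unable to hold new files or directories if its inodes are used up; whether that applies here is not known from the trace.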