From: Alain RODRIGUEZ
Date: Wed, 19 Dec 2012 23:30:22 +0100
Subject: Re: High disk read throughput on only one node.
To: user@cassandra.apache.org
In-Reply-To: <9EE593D3-2A84-4E3E-A76A-B8CD25AF4E97@thelastpickle.com>
References: <9EE593D3-2A84-4E3E-A76A-B8CD25AF4E97@thelastpickle.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d043d64b329ef2404d13c2a6b

--f46d043d64b329ef2404d13c2a6b
Content-Type: text/plain; charset=ISO-8859-1

@Aaron

"Is there a sustained difference or did it settle back ? "

Sustained, clearly. During the day all nodes read at about 6 MB/s while this one reads at 30-40 MB/s. At night, while the others read at 2 MB/s, the "broken" node reads at 8-10 MB/s.

"Could this have been compaction or repair or upgrade tables working ? "

That was my first thought, but definitely not. This occurs continuously.

"Do the read / write counts available in nodetool cfstats show anything different ? "

The cfstats output shows different counts (far fewer reads/writes for the "bad" node), but the nodes didn't join the ring at the same time. I'm attaching the cfstats output in case it helps.

Node 38: http://pastebin.com/ViS1MR8d (bad one)
Node 32: http://pastebin.com/MrSTHH9F
Node 154: http://pastebin.com/7p0Usvwd

@Bryan

"clients always connect to that server"

I didn't include it in the screenshot from the AWS console, but AWS reports (almost) equal network traffic across the nodes (same for output and CPU). The CPU load is a lot higher on the broken node as shown by OpsCenter, but that's due to the high iowait.
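
On the cfstats comparison above: since the nodes joined the ring at different times, raw read/write counts are hard to compare directly. One rough way to normalize them is to compare the read-to-write ratio per node instead. The sketch below is only an illustration. It assumes the cfstats text contains lines of the form "Read Count: N" and "Write Count: N" and that those counters are cumulative. The function names and the sample text are hypothetical and are not taken from the pastebin outputs.

```python
import re

# Assumption: cfstats-style lines look like "Read Count: 123" / "Write Count: 45".
COUNT_LINE = re.compile(r"\s*(Read Count|Write Count):\s*(\d+)")


def parse_counts(cfstats_text):
    """Sum every 'Read Count' and 'Write Count' value found in the text."""
    totals = {"Read Count": 0, "Write Count": 0}
    for line in cfstats_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


def read_write_ratio(cfstats_text):
    """Return reads per write, or None when no writes were recorded."""
    totals = parse_counts(cfstats_text)
    if totals["Write Count"] == 0:
        return None
    return totals["Read Count"] / totals["Write Count"]


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = "Keyspace: example\n\tRead Count: 100\n\tWrite Count: 50\n"
    print(read_write_ratio(sample))
```

A node whose ratio is far from that of its peers would be worth a closer look. Similar ratios across nodes would suggest the extra disk reads are not explained by the request mix alone.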
--f46d043d64b329ef2404d13c2a6b--