From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Secondary index loss on node restart
Date: Tue, 25 Sep 2012 13:06:19 +1200

Can you contribute your experience to this ticket https://issues.apache.org/jira/browse/CASSANDRA-4670 ?

Thanks

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/09/2012, at 6:22 AM, Michael Theroux <mtheroux2@yahoo.com> wrote:

> Hello,
>
> We have been noticing an issue where, about 50% of the time in which a node fails or is restarted, secondary indexes appear to be partially lost or corrupted.  A drop and re-add of the index appears to correct the issue.  There are no errors in the cassandra logs that I see.  Part of the index seems to be simply missing.  Sometimes this corruption/loss doesn't happen immediately, but sometime after the node is restarted.  In addition, the index never appears to have an issue when the node comes down, it is only after the node comes back up and recovers in which we experience an issue.
>
> We developed some code that goes through all the rows in the table, by key, in which the index is present.  It then attempts to look up the information via secondary index, in an attempt to detect when the issue occurs.  Another odd observation is that the number of members present in the index when we have the issue varies up and down (the index and the tables don't change that often).
>
> We are running a 6 node Cassandra cluster with a replication factor of 3, consistency level for all queries is LOCAL_QUORUM.  We are running Cassandra 1.1.2.
>
> Anyone have any insights?
>
> -Mike

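For anyone wanting to reproduce the check Mike describes (walk every row by key, then look the same row up through the secondary index), a minimal sketch follows. It is not the code from the thread: `find_missing_index_entries`, `rows` and `lookup_by_index` are hypothetical names, and the two inputs are stand-ins for whatever client calls actually scan the column family and run the indexed query.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

# (row key, value of the indexed column) -- an assumed shape, for illustration only
Row = Tuple[Hashable, Hashable]


def find_missing_index_entries(
    rows: Iterable[Row],
    lookup_by_index: Callable[[Hashable], Iterable[Hashable]],
) -> List[Row]:
    """Return rows that exist in the table but are not returned by the index.

    rows            -- every (key, indexed value) pair, read by key
    lookup_by_index -- hypothetical callable returning the keys the secondary
                       index reports for a given value
    """
    keys_by_value: Dict[Hashable, Set[Hashable]] = {}
    missing: List[Row] = []
    for key, value in rows:
        # Query the index once per distinct value and remember the answer.
        if value not in keys_by_value:
            keys_by_value[value] = set(lookup_by_index(value))
        # A row the index does not return for its own value is a lost entry.
        if key not in keys_by_value[value]:
            missing.append((key, value))
    return missing


# In-memory stand-ins for the table and a partially lost index.
table = {"k1": "red", "k2": "blue", "k3": "red"}
index = {"red": ["k1"], "blue": ["k2"]}  # the entry for "k3" is gone

missing = find_missing_index_entries(table.items(), lambda v: index.get(v, []))
print(missing)  # [('k3', 'red')]
```

Because the number of entries in the index is reported to vary over time, a check like this would probably need to be run repeatedly after a restart rather than once, with each run's result recorded alongside a timestamp.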