From: Jeff Williams
Subject: Re: Keyspace lost after restart
Date: Fri, 11 May 2012 11:18:46 +0200
To: user@cassandra.apache.org

Conan,

Good to see I'm not alone in this! I just set up a fresh test cluster. I first did a fresh install of 1.1.0 and was able to replicate the issue. I then did a fresh install using 1.0.10 and didn't see the issue. So it looks like rolling back to 1.0.10 could be the answer for now.
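(A rough sketch of the sequence that triggers it, based on the drop/re-create pattern described below; the host, port, and exact cassandra-cli invocations are assumptions, not a transcript of my session:

   # Assumes a test node with Thrift on localhost:9160 and cassandra-cli on the PATH.
   # 1. Create, drop, and re-create the keyspace (the trigger described in CASSANDRA-4219).
   echo "create keyspace m7 with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {us-east: 3};" | cassandra-cli -h localhost -p 9160
   echo "drop keyspace m7;" | cassandra-cli -h localhost -p 9160
   echo "create keyspace m7 with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {us-east: 3};" | cassandra-cli -h localhost -p 9160

   # 2. Restart the node, then check whether the keyspace survived.
   echo "show keyspaces;" | cassandra-cli -h localhost -p 9160 | grep -i m7

On 1.1.0 the keyspace was gone after the restart; on 1.0.10 it was still there.)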

Jeff

On May 11, 2012, at 10:40 AM, Conan Cook wrote:

Hi,

OK, we're pretty sure we dropped and re-created the keyspace before restarting the Cassandra nodes during some testing (we've been migrating to a new cluster).  The keyspace was created via the CLI:

create keyspace m7
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {us-east: 3}
  and durable_writes = true;
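(A quick sanity check we could have run after creating it, assuming the node's Thrift port is the default 9160, is to list the keyspaces and their definitions:

   echo "show keyspaces;" | cassandra-cli -h localhost -p 9160

This prints each keyspace's placement strategy and strategy options, so the settings above can be verified before any restart.)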

I'm pretty confident that it's a result of the issue I spotted before:

https://issues.apache.org/jira/browse/CASSANDRA-4219
Does anyone know whether this also affected versions before 1.1.0?  If not, we can just roll back until there's a fix; we're not using our cluster in production, so we can afford to just bin it all and load it again.  +1 for this being a major issue though: the fact that you can't see it until you restart a node makes it quite dangerous, and that node is lost when it occurs (I also haven't been able to restore the schema in any way).
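(As a stop-gap, a sketch of backing up the schema before any restart, assuming cassandra-cli's "show schema" output is complete enough to replay; this won't stop the bug, but it preserves the definitions:

   # Dump the schema as re-runnable CLI statements.
   echo "show schema;" | cassandra-cli -h localhost -p 9160 > schema-backup.cli

   # After a restart that loses the keyspace, try replaying it:
   cassandra-cli -h localhost -p 9160 -f schema-backup.cli

Given that I haven't managed to restore the schema on an affected node, replaying may only work on a fixed or rolled-back version.)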

Thanks very much,


Conan



On 10 May 2012 17:15, Conan Cook <conan.cook@amee.com> wrote:
Hi Aaron,

Thanks for getting back to me!  Yes, I believe our keyspace was created prior to 1.1, and I think I also understand why you're asking that, having found this:

https://issues.apache.org/jira/browse/CASSANDRA-4219
Here's our startup log:

https://gist.github.com/2654155
There isn't much of interest in there, however.  It may well be the case that we created our keyspace, dropped it, then created it again.  The dev responsible for setting it up is ill today, but I'll get back to you tomorrow with exact details of how it was originally created and whether we definitely dropped and re-created it.

Ta,

Conan


On 10 May 2012 11:43, aaron morton <aaron@thelastpickle.com> wrote:
Was this a schema that was created prior to 1.1?

What process are you using to create the schema?

Can you share the logs from system startup, up until it logs "Listening for thrift clients"? (If they are long, please link to them.)
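(One way to grab exactly that slice, assuming the packaged default log location of /var/log/cassandra/system.log:

   # Print the log from the top up to the first "Listening for thrift clients" line.
   sed -n '1,/Listening for thrift clients/p' /var/log/cassandra/system.log > startup.log

)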

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/05/2012, at 1:04 AM, Conan Cook wrote:

Sorry, forgot to mention we're running Cassandra 1.1.

Conan

On 8 May 2012 17:51, Conan Cook <conan.cook@amee.com> wrote:
Hi Cassandra Folk,

We've experienced a problem a couple of times where Cassandra nodes lose a keyspace after a restart.  We've restarted 2 out of 3 nodes, and both have experienced this problem; clearly we're doing something wrong, but we don't know what.  The data files are all still there, as before, but the node can't see the keyspace (we only have one).  Nodetool still says that each node is responsible for 33% of the keys, but disk usage has dropped to a tiny amount on the nodes that we've restarted.  I saw this:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201202.mbox/%3C4F3582E7.20907@conga.com%3E
That seems to be exactly our problem, but we have not modified cassandra.yaml. We did overwrite it through an automated process just before restarting, but the contents did not change.

Any ideas as to what might cause this, or how the keyspace can be restored? (Like I say, the data is all still in the data directory.)
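(A sketch of confirming the data really is still on disk; the path assumes the default data_file_directories setting:

   # 1.1 keeps SSTables in per-column-family subdirectories under the keyspace directory.
   ls -lhR /var/lib/cassandra/data/m7/

If the SSTables are there, one avenue might be re-creating the keyspace and column families with identical definitions and restarting so the node picks the files back up, though given this bug that may only be safe on a fixed or rolled-back version.)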

We're running in AWS.

Thanks,


Conan




