Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9A3CC69F7 for ; Thu, 21 Jul 2011 23:31:26 +0000 (UTC) Received: (qmail 72298 invoked by uid 500); 21 Jul 2011 23:31:24 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 72125 invoked by uid 500); 21 Jul 2011 23:31:23 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 72117 invoked by uid 99); 21 Jul 2011 23:31:23 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 21 Jul 2011 23:31:23 +0000 X-ASF-Spam-Status: No, hits=2.9 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [66.33.216.122] (HELO hapkido.dreamhost.com) (66.33.216.122) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 21 Jul 2011 23:31:17 +0000 Received: from homiemail-a55.g.dreamhost.com (caibbdcaaaaf.dreamhost.com [208.113.200.5]) by hapkido.dreamhost.com (Postfix) with ESMTP id E740B17EA8E for ; Thu, 21 Jul 2011 16:30:56 -0700 (PDT) Received: from homiemail-a55.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a55.g.dreamhost.com (Postfix) with ESMTP id 1E2DC12C0A2 for ; Thu, 21 Jul 2011 16:30:50 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; q=dns; s=thelastpickle.com; b=Esc2e56vkW WkUxOLcUzVh8PQgK81Dr0NiObPDHvy+3iVQHJwDGZDRz+NEQjnHCRCuFjmflFyeU q50gC2BEIRUJYqIMO0prrYtg6w02+80oIacFr6MFw8liw9Es27GcPyVJDn4QiTZ+ zW4n8421hoVS/sPLD++2fvEY/ykfZBgOo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; s=thelastpickle.com; bh=G9Ad7CXLFbgvdITo W5JbF84fNIQ=; b=ikslVvV2B577j+XKTnZKEDk14295GuVxijGG2W7oos9hIgQW Otrr4Seu5OWLiabGKH3wCzwmzkOcc6F9NmoEWXHGX0IRse1T4roHj9dDIRHVq351 sUTMSZXsFGuUAWMeqJQ2I67sgA5Rd6d0nEgQMc1GY7zy0jOzCT35GEb0N7k= Received: from [10.0.1.152] (121-73-157-230.cable.telstraclear.net [121.73.157.230]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a55.g.dreamhost.com (Postfix) with ESMTPSA id 7A30712C064 for ; Thu, 21 Jul 2011 16:30:49 -0700 (PDT) From: aaron morton Mime-Version: 1.0 (Apple Message framework v1244.3) Content-Type: multipart/alternative; boundary="Apple-Mail=_B8015F58-6ED4-4F5A-A44E-DAC7E6AD85D7" Subject: Re: Repair fails with java.io.IOError: java.io.EOFException Date: Fri, 22 Jul 2011 11:30:47 +1200 In-Reply-To: To: user@cassandra.apache.org References: <1F68BAA7-8E23-4218-B4E3-86DC16B23B8A@thelastpickle.com> Message-Id: X-Mailer: Apple Mail (2.1244.3) --Apple-Mail=_B8015F58-6ED4-4F5A-A44E-DAC7E6AD85D7 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 The default init.d script will direct std out/err to that file, how are = you starting brisk / cassandra ? Check the syslog and other logs in /var/log to see if the OS killed = cassandra.=20 Also, what was the last thing in the casandra log before INFO [main] = 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging = initialised ? Cheers =20 ----------------- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 10:50, Sameer Farooqui wrote: > Hey Aaron, >=20 > I don't have any output.log files in that folder: >=20 > ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra > ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls > system.log system.log.11 system.log.4 system.log.7 > system.log.1 system.log.2 system.log.5 system.log.8 > system.log.10 system.log.3 system.log.6 system.log.9 >=20 >=20 >=20 > On Thu, Jul 21, 2011 at 3:40 PM, aaron morton = wrote: > Check /var/log/cassandra/output.log (assuming the default init = scripts) >=20 > A > ----------------- > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com >=20 > On 22 Jul 2011, at 10:13, Sameer Farooqui wrote: >=20 >> Hmm. Just looked at the log more closely. >>=20 >> So, what actually happened is while Repair was running on this = specific node, the Cassandra java process terminated itself = automatically. The last entries in the log are: >>=20 >> INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java = (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 = used; max is 4030726144 >> INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java = (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 = used; max is 4030726144 >> INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java = (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 = used; max is 4030726144 >> INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java = (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 = used; max is 4030726144 >> INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java = (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 = used; max is 4030726144 >> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java = (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 = used; max is 4030726144 >> =20 >> When we came in this morning, nodetool ring from another node showed = the 1st node as down and OpsCenter also reported it as down. >>=20 >> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see = the status of the Cassandra PID and it was not running. >>=20 >> We then started Cassandra: >>=20 >> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java = (line 78) Logging initialized >> INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java = (line 96) Heap size: 3894411264/3894411264 >> INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA = mlockall successful >> INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line = 121) Loading settings from = file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml >>=20 >>=20 >> It was during this start process that the java.io.EOFException was = seen, but yes, like you said Jonathan, the Cassandra process started = back up and joined the ring.=20 >>=20 >> We're now wondering why the Repair failed and why Cassandra crashed = in the first place. We only had default level logging enabled. Is there = something else I can check or that you suspect? >>=20 >> Should we turn the logging up to debug and retry the Repair? >>=20 >>=20 >> - Sameer >>=20 >>=20 >> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis = wrote: >> Looks harmless to me. >>=20 >> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui >> wrote: >> > While running Repair on a 0.8.1 node, we got this error in the = system.log: >> > >> > ERROR [Thread-23] 2011-07-21 15:48:43,868 = AbstractCassandraDaemon.java (line >> > 113) Fatal exception in thread Thread[Thread-23,5,main] >> > java.io.IOError: java.io.EOFException >> > at >> > = org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.j= ava:78) >> > Caused by: java.io.EOFException >> > at java.io.DataInputStream.readInt(DataInputStream.java:375) >> > at >> > = org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.j= ava:66) >> > >> > There's just a bunch of informational messages about Gossip before = this. >> > >> > Looks like the file or stream unexpectedly ended? >> > = http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html= >> > >> > Is this a bug or something wrong in our environment? >> > >> > >> > - Sameer >> > >>=20 >>=20 >>=20 >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >>=20 >=20 >=20 --Apple-Mail=_B8015F58-6ED4-4F5A-A44E-DAC7E6AD85D7 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=iso-8859-1 The = default init.d script will direct std out/err to that file, how are you = starting brisk / cassandra ?

Check the syslog and = other logs in /var/log to see if the OS killed = cassandra. 

Also, what was the last thing = in the casandra log before INFO [main] 2011-07-21 15:48:07,233 = AbstractCassandraDaemon.java (line 78) Logging initialised = ?


Cheers
 
=
http://www.thelastpickle.com

On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:

Hey = Aaron,

I don't have any output.log files in that = folder:

ubuntu@ip-10-2-x-x:~$ cd = /var/log/cassandra
ubuntu@ip-10-2-x-x:/var/log/cassandra$ = ls
system.log     system.log.11  = system.log.4  system.log.7
system.log.1   system.log.2   system.log.5  = system.log.8
system.log.10  system.log.3   = system.log.6  system.log.9



On Thu, Jul 21, 2011 at 3:40 PM, aaron morton = <aaron@thelastpickle.com> wrote:
Check = /var/log/cassandra/output.log (assuming the default init = scripts)

A
-----------------
Aaron Morton
Freelance = Cassandra Developer
@aaronmorton

On 22 Jul 2011, at 10:13, Sameer Farooqui = wrote:

Hmm. Just looked at the log = more closely.

So, what actually happened is while Repair was = running on this specific node, the Cassandra java process terminated = itself automatically. The last entries in the log are:

 INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 = GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed = leaving 1845274888 used; max is 4030726144
 INFO = [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) = GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max = is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java = (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 = used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 = 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, = 157638152 reclaimed leaving 1955746368 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java = (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 = used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 = 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, = 153219160 reclaimed leaving 2040879600 used; max is 4030726144
 
When we came in this morning, nodetool ring from another node = showed the 1st node as down and OpsCenter also reported it as = down.

Next we ran "sudo netstat -anp | grep 7199" from the 1st = node to see the status of the Cassandra PID and it was not running.

We then started Cassandra:

INFO [main] 2011-07-21 = 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging = initialized
 INFO [main] 2011-07-21 15:48:07,266 = AbstractCassandraDaemon.java (line 96) Heap size: = 3894411264/3894411264
 INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA = mlockall successful
 INFO [main] 2011-07-21 15:48:11,702 = DatabaseDescriptor.java (line 121) Loading settings from = file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml


It was during this start process that the java.io.EOFException = was seen, but yes, like you said Jonathan, the Cassandra process started = back up and joined the ring.

We're now wondering why the Repair = failed and why Cassandra crashed in the first place. We only had default = level logging enabled. Is there something else I can check or that you = suspect?

Should we turn the logging up to debug and retry the = Repair?


- Sameer


On = Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
Looks harmless to me.

On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
<cassandralabs@gmail.com> wrote:
> While running Repair on a 0.8.1 node, we got this error in the = system.log:
>
> ERROR [Thread-23] 2011-07-21 15:48:43,868 = AbstractCassandraDaemon.java (line
> 113) Fatal exception in thread Thread[Thread-23,5,main]
> java.io.IOError: java.io.EOFException
> at
> = org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.j= ava:78)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at
> = org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.j= ava:66)
>
> There's just a bunch of informational messages about Gossip before = this.
>
> Looks like the file or stream unexpectedly ended?
> http://download.oracle.com/javase/1.4.2/docs/api/java/io= /EOFException.html
>
> Is this a bug or something wrong in our environment?
>
>
> - Sameer
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra = support
http://www.datastax.com




= --Apple-Mail=_B8015F58-6ED4-4F5A-A44E-DAC7E6AD85D7--