Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 17832611C for ; Thu, 21 Jul 2011 22:51:26 +0000 (UTC) Received: (qmail 41653 invoked by uid 500); 21 Jul 2011 22:51:24 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 41554 invoked by uid 500); 21 Jul 2011 22:51:23 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 41546 invoked by uid 99); 21 Jul 2011 22:51:23 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 21 Jul 2011 22:51:23 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of cassandralabs@gmail.com designates 209.85.210.172 as permitted sender) Received: from [209.85.210.172] (HELO mail-iy0-f172.google.com) (209.85.210.172) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 21 Jul 2011 22:51:19 +0000 Received: by iye7 with SMTP id 7so1491510iye.31 for ; Thu, 21 Jul 2011 15:50:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=tCrpiJox0KD2x2kyuBzwAAObvZrICUXyCo3r91iTlIc=; b=IIU6a8otDdXAbBBprb/k16Vag3LYh5mSGdyLaKRJ7P6aezju5XJHMyILvOPu8OgqVd XVGj2UTA050pThHwJ11ydHW6hrpkNRtovZYXD5ZR2e1enryvbM8alErfHoOVqeL5zeme gl0DMHrYO52VKd8UlemhEE45XsquoE5B5biUM= MIME-Version: 1.0 Received: by 10.231.95.143 with SMTP id d15mr690339ibn.149.1311288658354; Thu, 21 Jul 2011 15:50:58 -0700 (PDT) Received: by 10.231.3.83 with HTTP; Thu, 21 Jul 2011 15:50:58 -0700 (PDT) In-Reply-To: <1F68BAA7-8E23-4218-B4E3-86DC16B23B8A@thelastpickle.com> References: <1F68BAA7-8E23-4218-B4E3-86DC16B23B8A@thelastpickle.com> Date: Thu, 21 Jul 2011 15:50:58 -0700 Message-ID: Subject: Re: Repair fails with java.io.IOError: java.io.EOFException From: Sameer Farooqui To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=bcaec5430a08ade2ff04a89c2ed0 --bcaec5430a08ade2ff04a89c2ed0 Content-Type: text/plain; charset=ISO-8859-1 Hey Aaron, I don't have any output.log files in that folder: ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls system.log system.log.11 system.log.4 system.log.7 system.log.1 system.log.2 system.log.5 system.log.8 system.log.10 system.log.3 system.log.6 system.log.9 On Thu, Jul 21, 2011 at 3:40 PM, aaron morton wrote: > Check /var/log/cassandra/output.log (assuming the default init scripts) > > A > ----------------- > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 22 Jul 2011, at 10:13, Sameer Farooqui wrote: > > Hmm. Just looked at the log more closely. > > So, what actually happened is while Repair was running on this specific > node, the Cassandra java process terminated itself automatically. The last > entries in the log are: > > INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line > 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max > is 4030726144 > INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line > 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max > is 4030726144 > INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line > 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max > is 4030726144 > INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line > 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max > is 4030726144 > INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line > 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max > is 4030726144 > INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line > 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max > is 4030726144 > > When we came in this morning, nodetool ring from another node showed the > 1st node as down and OpsCenter also reported it as down. > > Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see the > status of the Cassandra PID and it was not running. > > We then started Cassandra: > > INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) > Logging initialized > INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) > Heap size: 3894411264/3894411264 > INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall > successful > INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) > Loading settings from > file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml > > > It was during this start process that the java.io.EOFException was seen, > but yes, like you said Jonathan, the Cassandra process started back up and > joined the ring. > > We're now wondering why the Repair failed and why Cassandra crashed in the > first place. We only had default level logging enabled. Is there something > else I can check or that you suspect? > > Should we turn the logging up to debug and retry the Repair? > > > - Sameer > > > On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis wrote: > >> Looks harmless to me. >> >> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui >> wrote: >> > While running Repair on a 0.8.1 node, we got this error in the >> system.log: >> > >> > ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java >> (line >> > 113) Fatal exception in thread Thread[Thread-23,5,main] >> > java.io.IOError: java.io.EOFException >> > at >> > >> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78) >> > Caused by: java.io.EOFException >> > at java.io.DataInputStream.readInt(DataInputStream.java:375) >> > at >> > >> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) >> > >> > There's just a bunch of informational messages about Gossip before this. >> > >> > Looks like the file or stream unexpectedly ended? >> > >> http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html >> > >> > Is this a bug or something wrong in our environment? >> > >> > >> > - Sameer >> > >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > > --bcaec5430a08ade2ff04a89c2ed0 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hey Aaron,

I don't have any output.log files in that folder:
=
ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
ubuntu@ip-10-2-x-x:/var/= log/cassandra$ ls
system.log=A0=A0=A0=A0 system.log.11=A0 system.log.4= =A0 system.log.7
system.log.1=A0=A0 system.log.2=A0=A0 system.log.5=A0 system.log.8
syste= m.log.10=A0 system.log.3=A0=A0 system.log.6=A0 system.log.9



=
On Thu, Jul 21, 2011 at 3:40 PM, aaron morton <aaron@thelas= tpickle.com> wrote:
Check /= var/log/cassandra/output.log (assuming the default init scripts)

A
-----------------
Aaron Morton
Freelance Cass= andra Developer
@aaronmorton

On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:

Hmm. Just looked at the log more closely.

So,= what actually happened is while Repair was running on this specific node, = the Cassandra java process terminated itself automatically. The last entrie= s in the log are:

=A0INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (li= ne 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used;= max is 4030726144
=A0INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GC= Inspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leavin= g 1864471688 used; max is 4030726144
=A0INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 1= 28) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max= is 4030726144
=A0INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInsp= ector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 19= 55746368 used; max is 4030726144
=A0INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 1= 28) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max= is 4030726144
=A0INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInsp= ector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 20= 40879600 used; max is 4030726144
=A0
When we came in this morning, nodetool ring from another node showed= the 1st node as down and OpsCenter also reported it as down.

Next w= e ran "sudo netstat -anp | grep 7199" from the 1st node to see th= e status of the Cassandra PID and it was not running.

We then started Cassandra:

INFO [main] 2011-07-21 15:48:07,233 A= bstractCassandraDaemon.java (line 78) Logging initialized
=A0INFO [main]= 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) Heap size: = 3894411264/3894411264
=A0INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockal= l successful
=A0INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.j= ava (line 121) Loading settings from file:/home/ubuntu/brisk/resources/cass= andra/conf/cassandra.yaml


It was during this start process that the java.io.EOFException was = seen, but yes, like you said Jonathan, the Cassandra process started back u= p and joined the ring.

We're now wondering why the Repair faile= d and why Cassandra crashed in the first place. We only had default level l= ogging enabled. Is there something else I can check or that you suspect?
Should we turn the logging up to debug and retry the Repair?

- Sameer


On Thu, Jul 21, 2011 at 12:= 37 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
Looks harmless to me.

On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
<cassandral= abs@gmail.com> wrote:
> While running Repair on a 0.8.1 node, we got this error in the system.= log:
>
> ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java= (line
> 113) Fatal exception in thread Thread[Thread-23,5,main]
> java.io.IOError: java.io.EOFException
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnecti= on.java:78)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnecti= on.java:66)
>
> There's just a bunch of informational messages about Gossip before= this.
>
> Looks like the file or stream unexpectedly ended?
> http://download.oracle.com/javase/1.4.2/= docs/api/java/io/EOFException.html
>
> Is this a bug or something wrong in our environment?
>
>
> - Sameer
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.= com



--bcaec5430a08ade2ff04a89c2ed0--