From: Ted Dunning
Date: Thu, 23 Jun 2011 09:32:44 -0700
Subject: Re: disk full
To: user@zookeeper.apache.org

On Wed, Jun 22, 2011 at 7:15 PM, Donna Li wrote:

> After I make space, the zookeeper server also can not start
> normally. Following log is printed. It shows "Last transaction was partial".
> I must clear the data (/data/version-2), then the server can restart
> normally.
>

If clearing the data is acceptable, then it sounds like you have an answer.

Can somebody else comment on this error? Will the server refuse to start
here?

Donna, is this the only server in the cluster that had disk full? Or did
all of them have disk full? If only one had the disk full, then wiping the
data or even just the last log file should be fine since the current data
will be recovered from other members of the cluster.

> Under the disk full situation, if the cluster is unavailable, the TCP
> connection between the server and the client would break. So when the
> cluster comes back normally, the client can not change to connected status.
> I want to know if this is the limitation that zookeeper can not meet
> disk full.
>

I don't understand the question. Are you talking about one server out of
several failing? If so, then having that server come back should cause no
problems at all.

Also, your sentence that starts "I want to know ..." is something I don't
quite understand. Can you say it in different words?

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: June 11, 2011 14:25
> To: user@zookeeper.apache.org
> Subject: Re: disk full
>
> Zookeeper provides guarantees of durability and reliability. When the
> disk is full, transactions cannot be persisted to disk and thus these
> guarantees cannot be made.
>
> If you restart ZK after making space, then it should continue
> reliably. One way to do this is to get rid of old transaction logs
> that are no longer needed. Another is to plan your needs more
> precisely so conflicts on space do not occur. Sometimes this is quite
> difficult which is one reason that some people opt for dedicated
> machines for critical services like this.
>
> On Sat, Jun 11, 2011 at 7:04 AM, Donna Li wrote:
> > Hi,all:
> >
> > What is the impact of disk full to zookeeper? Why restart zookeeper can
> > not resolve the problem? I must clear the data of the zookeeper and
> > restart. The print error is:
> >
> > 2011-06-09 11:44:29,204 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /usr/local/rss/zookeeper/tool/zookeeper-3.3.2/bin/../conf/zoo.cfg
> > 2011-06-09 11:44:29,211 - WARN [main:QuorumPeerConfig@266] - Non-optimial configuration, consider an odd number of servers.
> > 2011-06-09 11:44:29,211 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums
> > 2011-06-09 11:44:29,223 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
> > 2011-06-09 11:44:29,239 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
> > 2011-06-09 11:44:29,251 - INFO [main:QuorumPeer@818] - tickTime set to 2000
> > 2011-06-09 11:44:29,252 - INFO [main:QuorumPeer@829] - minSessionTimeout set to -1
> > 2011-06-09 11:44:29,252 - INFO [main:QuorumPeer@840] - maxSessionTimeout set to -1
> > 2011-06-09 11:44:29,252 - INFO [main:QuorumPeer@855] - initLimit set to 5
> > 2011-06-09 11:44:29,266 - INFO [main:FileSnap@82] - Reading snapshot /usr/local/rss/zookeeper/data/version-2/snapshot.7000011ff
> > 2011-06-09 11:44:29,301 - ERROR [main:Util@238] - Last transaction was partial.
> > 2011-06-09 11:44:29,302 - FATAL [main:QuorumPeer@399] - Unable to load database on disk
> > java.io.EOFException
> >     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:65)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:508)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:575)
> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:145)
> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:197)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:397)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
> > 2011-06-09 11:44:29,303 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally
> > java.lang.RuntimeException: Unable to run quorum server
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:400)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
> > Caused by: java.io.EOFException
> >     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:65)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:508)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
> >     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:575)
> >     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:145)
> >     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:197)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:397)
> >     ... 3 more
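
For anyone hitting the same trace: the EOFException is thrown from
FileHeader.deserialize, which suggests (the thread does not confirm it) that
one of the log.* files under version-2 is too short to hold a header, as might
happen if the file was created just as the disk filled up. Below is a minimal
Python sketch for spotting such files before deciding what to wipe. It assumes
a 16-byte header (4-byte "ZKLG" magic, 4-byte version, 8-byte dbid,
big-endian); that layout is an assumption on my part, not something stated in
this thread. find_suspect_logs is a hypothetical helper, not part of
ZooKeeper, and it only reads files.

```python
import os
import struct

# ASSUMPTION (not from this thread): a transaction log starts with a 16-byte
# header: 4-byte magic "ZKLG", 4-byte version, 8-byte dbid, all big-endian.
HEADER = struct.Struct(">4siq")
MAGIC = b"ZKLG"


def find_suspect_logs(data_dir):
    """Return the names of log.* files whose header is missing or malformed.

    Read-only: nothing is deleted or modified.
    """
    suspects = []
    for name in sorted(os.listdir(data_dir)):
        # Only transaction logs are checked; snapshot.* files are skipped.
        if not name.startswith("log."):
            continue
        path = os.path.join(data_dir, name)
        with open(path, "rb") as f:
            raw = f.read(HEADER.size)
        # A file shorter than the header cannot be deserialized at all.
        if len(raw) < HEADER.size:
            suspects.append(name)
            continue
        magic, _version, _dbid = HEADER.unpack(raw)
        # A full-length header with the wrong magic is also flagged.
        if magic != MAGIC:
            suspects.append(name)
    return suspects
```

Calling find_suspect_logs("/usr/local/rss/zookeeper/data/version-2") would
list candidate files. If only one server in the ensemble was affected, moving
just those files aside (after a backup) is one way to read "just the last log
file" above.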