zookeeper-bookkeeper-user mailing list archives

From John Nagro <jna...@hubspot.com>
Subject BK servers in a funky state
Date Thu, 05 Apr 2012 13:19:34 GMT
Hello -

I've been hitting Ivan up for advice about a bookkeeper project of mine. I
recently ran into another issue and he suggested I inquire here since he is
traveling.

We've got a pool of 5 BK servers running in EC2. Last night they got into a
funky state and/or crashed; unfortunately, the log with the original event
got rotated (that has since been fixed). I was running a cut of 4.1.0-SNAPSHOT,
sha 6d56d60831a63fe9520ce156686d0cb1142e44f5 from Wed Mar 28 21:57:40 2012
+0000, which brought everything up to BOOKKEEPER-195. That build included some
bug fixes over 4.0.0, which I was originally running (and before that, an
earlier version).

When I restarted the servers after the incident, the logs looked like
this:

https://gist.github.com/f2b9c8c76943b057546e

The log contains a lot of errors, although the servers do appear to come up
(I have not tried to use them yet). I don't have the stack trace from the
original crash, but the logs from shortly after it contain many repetitions
of this one:

2012-04-04 21:04:58,833 - INFO  [GarbageCollectorThread:GarbageCollectorThread@266] - Deleting entryLogId 4 as it has no active ledgers!
2012-04-04 21:04:58,834 - ERROR [GarbageCollectorThread:EntryLogger@188] - Trying to delete an entryLog file that could not be found: 4.log
2012-04-04 21:04:59,783 - WARN  [NIOServerFactory-3181:NIOServerFactory@129] - Exception in server socket loop: /0.0.0.0

java.util.NoSuchElementException
        at java.util.LinkedList.getFirst(LinkedList.java:109)
        at org.apache.bookkeeper.bookie.LedgerCacheImpl.grabCleanPage(LedgerCacheImpl.java:458)
        at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:165)
        at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.addEntry(LedgerDescriptorImpl.java:93)
        at org.apache.bookkeeper.bookie.Bookie.addEntryInternal(Bookie.java:999)
        at org.apache.bookkeeper.bookie.Bookie.addEntry(Bookie.java:1034)
        at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:359)
        at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
        at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
        at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124)
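For context on the bottom frame: java.util.LinkedList.getFirst() throws NoSuchElementException when the list is empty, so the trace indicates that grabCleanPage() found nothing in the LinkedList it reads from. The logs alone do not show why that list was empty. The standalone sketch below only demonstrates the JDK behavior; the class GetFirstDemo and the list name cleanPages are hypothetical and are not BookKeeper code. It also contrasts getFirst() with peekFirst(), which returns null on an empty list instead of throwing.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical demo class; not part of BookKeeper.
public class GetFirstDemo {
    // Returns true if LinkedList.getFirst() throws on the given list.
    static boolean getFirstThrows(LinkedList<Long> pages) {
        try {
            pages.getFirst();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "cleanPages" is an illustrative name only.
        LinkedList<Long> cleanPages = new LinkedList<>();
        System.out.println(getFirstThrows(cleanPages)); // true: list is empty
        System.out.println(cleanPages.peekFirst());     // null: peekFirst() does not throw
        cleanPages.add(42L);
        System.out.println(getFirstThrows(cleanPages)); // false
        System.out.println(cleanPages.peekFirst());     // 42
    }
}
```

Whether a guard like peekFirst() is the right change on the bookie side is a question for the BookKeeper developers; an empty list there may point to a deeper problem that a null check would only hide.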

This morning I upgraded to the most recent cut (sha
f694716e289c448ab89cab5fa81ea0946f9d9193, made on Tue Apr 3 16:02:44 2012
+0000) and restarted. That did not seem to fix things, although the log now
shows slightly different error messages:

https://gist.github.com/aea874d89b28d4cfef31

Does anyone know what's going on? How can I correct these errors? Are the
machines in an okay state to use?

I really appreciate any advice/help.

-John Nagro
