chukwa-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-487) Collector left in a bad state after temporary NN outage
Date Mon, 10 May 2010 20:50:31 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865918#action_12865918 ]

Bill Graham commented on CHUKWA-487:
------------------------------------

Here's what I saw in the logs when I had to restart my NN. It took a little while to exit
safe mode. I had to restore from the secondary name node, so there might have been some data
loss upon restore.

2010-05-06 17:32:19,515 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=318716 dataRate=10622
2010-05-06 17:32:49,518 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=196741 dataRate=6557
2010-05-06 17:33:06,367 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:129,numberchunks:217
2010-05-06 17:33:19,521 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
2010-05-06 17:33:49,523 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
2010-05-06 17:34:01,142 WARN org.apache.hadoop.dfs.DFSClient$LeaseChecker@36b60b93 DFSClient - Problem renewing lease for DFSClient_-1088933168: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot renew lease for DFSClient_-1088933168. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
        at org.apache.hadoop.dfs.FSNamesystem.renewLease(FSNamesystem.java:1823)
        at org.apache.hadoop.dfs.NameNode.renewLease(NameNode.java:458)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
        at org.apache.hadoop.ipc.Client.call(Client.java:716)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.renewLease(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.renewLease(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:781)
        at java.lang.Thread.run(Thread.java:619)

2010-05-06 17:34:01,608 WARN Timer-2094 SeqFileWriter - Got an exception in rotate
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot complete file /chukwa/logs/201006172737418_xxxxxxxxxcom_71ea99261284ab9f0566faa.chukwa. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
        at org.apache.hadoop.dfs.FSNamesystem.completeFileInternal(FSNamesystem.java:1209)
        at org.apache.hadoop.dfs.FSNamesystem.completeFile(FSNamesystem.java:1200)
        at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
        at org.apache.hadoop.ipc.Client.call(Client.java:716)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.complete(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.complete(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2736)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2657)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
        at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.rotate(SeqFileWriter.java:194)
        at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter$1.run(SeqFileWriter.java:235)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
2010-05-06 17:34:01,647 FATAL Timer-2094 SeqFileWriter - IO Exception in rotate. Exiting!
2010-05-06 17:34:01,661 FATAL btpool0-6248 SeqFileWriter - IOException when trying to write a chunk, Collector is going to exit!
java.io.IOException: Stream closed.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
        at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
        at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
        at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.add(SeqFileWriter.java:281)
        at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:152)
        at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:190)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:324)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
2010-05-06 17:34:06,370 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:28,numberchunks:0
2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
...
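
The pattern in the log above (the ServletCollector stats line keeps printing, but with zero connections and zero chunks) can be spotted mechanically. The following is a minimal sketch in Python and is not part of Chukwa. The function name find_stall and the three-interval threshold are assumptions made for illustration. A collector that is merely idle would produce the same lines, so a match is a hint rather than proof.

```python
import re

# Matches the collector's periodic stats line in the format shown in the log above.
STATS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+ root - "
    r"stats:ServletCollector,numberHTTPConnection:(?P<conns>\d+),"
    r"numberchunks:(?P<chunks>\d+)$"
)


def find_stall(lines, min_idle_intervals=3):
    """Return the timestamp that starts the first run of `min_idle_intervals`
    consecutive stats lines with zero connections and zero chunks, or None.

    Hypothetical helper: the threshold is an arbitrary illustrative choice.
    """
    run_start = None
    run_len = 0
    for line in lines:
        match = STATS_RE.match(line.strip())
        if match is None:
            continue  # not a ServletCollector stats line; ignore it
        if int(match["conns"]) == 0 and int(match["chunks"]) == 0:
            if run_len == 0:
                run_start = match["ts"]
            run_len += 1
            if run_len >= min_idle_intervals:
                return run_start
        else:
            run_len = 0  # traffic seen, so the idle run is broken
    return None


# Stats lines copied from the log in this message.
sample = [
    "2010-05-06 17:33:06,367 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:129,numberchunks:217",
    "2010-05-06 17:34:06,370 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:28,numberchunks:0",
    "2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0",
    "2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0",
    "2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0",
]
stall_started = find_stall(sample)
print(stall_started)
```

Combining such a check with a search for the preceding FATAL SeqFileWriter line would reduce false positives from collectors that are simply receiving no traffic.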

> Collector left in a bad state after temporary NN outage
> --------------------------------------------------------
>
>                 Key: CHUKWA-487
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-487
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: data collection
>    Affects Versions: 0.4.0
>            Reporter: Bill Graham
>
> When the name node returns errors to the collector, at some point the collector dies
> halfway. This behavior should be changed either to resemble the agents and keep trying, or
> to shut down completely. Instead, what I'm seeing is that the collector logs that it's shutting
> down, and the var/pidDir/Collector.pid file gets removed, but the collector continues to run,
> albeit not handling new data. Instead, this log entry is repeated ad infinitum:
> 2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
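
The two acceptable behaviors named in the description (keep trying, or shut down completely) can be sketched as follows. This is a language-neutral illustration written in Python, not Chukwa's Java code. Every name in it (rotate_with_retry, WriterFailed, FlakyNameNode) and the backoff parameters are hypothetical. The point is only that a failed rotate either gets retried or surfaces as one error the caller can turn into a full process exit, rather than leaving a half-alive process.

```python
import time


class WriterFailed(Exception):
    """Raised when the writer gives up; the caller should exit the whole process."""


def rotate_with_retry(rotate, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `rotate` until it succeeds, backing off exponentially between tries.

    Returns the attempt number that succeeded. Raises WriterFailed after
    `max_attempts` failures so the caller can shut down completely.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            rotate()
            return attempt
        except IOError as exc:  # e.g. the name node is still in safe mode
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise WriterFailed(f"rotate failed {max_attempts} times") from last_error


class FlakyNameNode:
    """Test double: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures

    def rotate(self):
        if self.failures > 0:
            self.failures -= 1
            raise IOError("Name node is in safe mode.")


# Option 1: keep trying. Two failures, then success on the third attempt.
delays = []
attempts = rotate_with_retry(FlakyNameNode(2).rotate, sleep=delays.append)

# Option 2: give up cleanly. A real caller would stop every timer and
# server thread here and then exit the process, instead of just setting a flag.
gave_up = False
try:
    rotate_with_retry(FlakyNameNode(10).rotate, max_attempts=3, sleep=lambda s: None)
except WriterFailed:
    gave_up = True
```

The sleep function is passed in as a parameter only so the sketch can be exercised without real delays.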

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

