incubator-chukwa-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: chukwa collector
Date Tue, 25 Oct 2011 15:58:11 GMT
The Chukwa collector was attempting to rename an HDFS file whose handle did not exist on the namenode.
There are three possible causes for this exception:

1. Someone deleted /chukwa/logs/2011102008_s0281.hostsud.net.chukwa.
2. The namenode was restarted but the collector was not, so the collector's file handle no longer matched the namenode's state.
3. The connection to the namenode was lost, so the namenode does not know about the
/chukwa/logs/2011102008_s0281.hostsud.net.chukwa file. (Unlikely)

In the past, the Chukwa Collector bailed out on any HDFS error.  We removed that logic from
the trunk version of SeqFileWriter.java, so this bug is fixed in trunk.  I recommend taking
a look at the trunk version; it is more stable than Chukwa 0.4.
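
To illustrate the difference between bailing out and carrying on, here is a minimal sketch of a rotate step that records an HDFS-style IOException and moves on to the next sink file instead of exiting. This is not the actual trunk SeqFileWriter code: RotateSketch, Sink, SinkFactory, and the sink-N.chukwa names are all hypothetical stand-ins, and no Hadoop classes are used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Chukwa or Hadoop.
class RotateSketch {

    /** Stand-in for an open HDFS output stream (assumption). */
    interface Sink {
        void close() throws IOException;
    }

    /** Stand-in for whatever creates the next sink file (assumption). */
    interface SinkFactory {
        Sink open(String name) throws IOException;
    }

    private final SinkFactory factory;
    private final List<String> warnings = new ArrayList<>();
    private Sink current;   // null when no sink file is open
    private int sequence;   // used only to build illustrative file names

    RotateSketch(SinkFactory factory) {
        this.factory = factory;
        openNext();
    }

    /** Try to open the next sink; record a warning instead of exiting on failure. */
    private boolean openNext() {
        try {
            current = factory.open("sink-" + (sequence++) + ".chukwa");
            return true;
        } catch (IOException e) {
            current = null;
            warnings.add("open failed: " + e.getMessage());
            return false;
        }
    }

    /**
     * Close the current sink and open a new one.
     * Returns true only if both steps succeeded; a failure is recorded
     * as a warning and the process keeps running.
     */
    synchronized boolean rotate() {
        boolean closed = true;
        if (current != null) {
            try {
                current.close();
            } catch (IOException e) {
                warnings.add("close failed: " + e.getMessage());
                closed = false;
            }
        }
        boolean opened = openNext();
        return closed && opened;
    }

    /** Snapshot of the warnings recorded so far. */
    synchronized List<String> warnings() {
        return new ArrayList<>(warnings);
    }

    public static void main(String[] args) {
        int[] closeCalls = {0};
        // The first close() fails, imitating a lease the namenode no longer holds.
        RotateSketch writer = new RotateSketch(name -> () -> {
            if (closeCalls[0]++ == 0) {
                throw new IOException("No lease on " + name);
            }
        });
        System.out.println("first rotate clean: " + writer.rotate());
        System.out.println("second rotate clean: " + writer.rotate());
        System.out.println("warnings: " + writer.warnings());
    }
}
```

The trade-off in a design like this is that data buffered for the failed file may be lost, but the collector stays up and keeps accepting data from agents.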

regards,
Eric

On Oct 25, 2011, at 1:46 AM, IvyTang wrote:

> Our Chukwa collector crashed.
> The log showed:
> 
> 
> 2011-10-20 04:08:27,847 WARN Timer-817 SeqFileWriter - Got an exception in rotate
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /chukwa/logs/2011102008_s0281.hostsud.net.chukwa File does not exist. [Lease.  Holder: DFSClient_395554495, pendingcreates: 1]
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1490)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1481)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1536)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1524)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:665)
>         at sun.reflect.GeneratedMethodAccessor1374.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1416)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1412)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1410)
> 
>         at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>         at $Proxy0.complete(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy0.complete(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3558)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3472)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>         at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.rotate(SeqFileWriter.java:199)
>         at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter$1.run(SeqFileWriter.java:240)
>         at java.util.TimerThread.mainLoop(Timer.java:512)
>         at java.util.TimerThread.run(Timer.java:462)
> 2011-10-20 04:08:27,848 FATAL Timer-817 SeqFileWriter - IO Exception in rotate. Exiting!
> 2011-10-20 04:08:27,851 WARN Shutdown SeqFileWriter - cannot rename dataSink file:/chukwa/logs/2011102008_s0281.hostsud.net.chukwa
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:232)
>         at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:606)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:224)
>         at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.close(SeqFileWriter.java:327)
>         at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.close(SocketTeeWriter.java:268)
>         at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.close(PipelineStageWriter.java:46)
>         at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.destroy(ServletCollector.java:227)
>         at org.mortbay.jetty.servlet.ServletHolder.destroyInstance(ServletHolder.java:315)
>         at org.mortbay.jetty.servlet.ServletHolder.doStop(ServletHolder.java:286)
>         at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:64)
>         at org.mortbay.jetty.servlet.ServletHandler.doStop(ServletHandler.java:170)
>         at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:64)
>         at org.mortbay.jetty.handler.HandlerWrapper.doStop(HandlerWrapper.java:142)
>         at org.mortbay.jetty.servlet.SessionHandler.doStop(SessionHandler.java:124)
>         at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:64)
>         at org.mortbay.jetty.handler.HandlerWrapper.doStop(HandlerWrapper.java:142)
>         at org.mortbay.jetty.handler.ContextHandler.doStop(ContextHandler.java:569)
>         at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:64)
>         at org.mortbay.jetty.handler.HandlerWrapper.doStop(HandlerWrapper.java:142)
>         at org.mortbay.jetty.Server.doStop(Server.java:281)
>         at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:64)
>         at org.mortbay.jetty.Server$ShutdownHookThread.run(Server.java:559)
> 
> What does this mean?
> 
> -- 
> Best regards,
> 
> Ivy Tang
> 
> 
> 

