hadoop-user mailing list archives

From Yifan Du <duyifa...@gmail.com>
Subject Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Date Fri, 08 Mar 2013 10:08:09 GMT
I have run into this exception too.
The new fsimage produced by the SNN could not be transferred to the NN.
My HDFS version is 2.0.0.
Does anyone know how to fix it?

@Elmar:
The new fsimage has been created successfully, but it could not be
transferred to the NN, so the old fsimage.ckpt files were not deleted.
I have tested the new fsimage: starting the cluster with the new fsimage
and the new in-progress edits worked, and no data was lost.
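If you just want to find the leftover fsimage.ckpt_* files, here is a rough
sketch (the helper name and directory layout are my own assumptions; point it
at your namesecondary/current). The idea: a checkpoint file whose txid is at
or below the newest finalized fsimage has been superseded, so it is only a
leftover from a failed upload.

```shell
# list_stale_ckpts <storage-dir>
# Prints fsimage.ckpt_* files whose txid is <= the newest finalized
# fsimage_<txid> in <storage-dir>; those are leftovers from failed uploads.
list_stale_ckpts() {
  dir=$1
  # Highest txid among finalized images (fsimage_<txid> names only,
  # so fsimage.ckpt_* and fsimage_*.md5 are excluded by the pattern).
  latest=$(ls "$dir" 2>/dev/null | sed -n 's/^fsimage_\([0-9][0-9]*\)$/\1/p' \
           | sort -n | tail -1)
  latest=$(printf '%s' "$latest" | sed 's/^0*//'); latest=${latest:-0}
  [ "$latest" -gt 0 ] || return 1       # no finalized image found
  for f in "$dir"/fsimage.ckpt_*; do
    [ -e "$f" ] || continue
    # Strip leading zeros so the shell compares the txids as decimal numbers.
    txid=$(printf '%s' "${f##*fsimage.ckpt_}" | sed 's/^0*//'); txid=${txid:-0}
    [ "$txid" -le "$latest" ] && echo "$f"
  done
  return 0
}
# e.g. list_stale_ckpts /var/lib/hdfs_namenode/meta/dfs/namesecondary/current
```

I would only delete what it prints after double-checking that a later
checkpoint really completed on the NN side.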


2013/3/6, Elmar Grote <elmar.grote@optivo.de>:
> Hi,
>
> we are writing our fsimage and edits files on the namenode and secondary
> namenode, and additionally on an NFS share.
>
> In these folders we found a lot of fsimage.ckpt_000000000........
> files, the oldest of which is from 9 Aug 2012.
> As far as I know these files should be deleted after the secondary namenode
> creates the new fsimage file.
> I looked in the log files of the namenode and secondary namenode to see
> what happened at that time.
>
> As an example I searched for this file:
> 20. Feb 04:02 fsimage.ckpt_0000000000726216952
>
> In the namenode log I found this:
> 2013-02-20 04:02:51,404 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
> 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage:
> java.io.IOException: GetImage failed. java.io.IOException: Input/output
> error
>
> In the secondary namenode log I think this is the relevant part:
> 2013-02-20 04:01:16,554 INFO
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not
> changed. Will not download image.
> 2013-02-20 04:01:16,554 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:01:16,750 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
> 2013-02-20 04:01:16,750 INFO
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to
> load edits from 1 stream(s).
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Reading
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> expecting start txid #726172233
> 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Edits file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952
> using no compression
> 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952
> using no compression
> 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 1211973003 saved in 22 seconds.
> 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 1211973003 saved in 32 seconds.
> 2013-02-20 04:01:50,770 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
> retain 2 images with txid >= 726172232
> 2013-02-20 04:01:50,770 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
> old image
> FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750,
> cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,000 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
> old image
> FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_0000000000726121750,
> cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,379 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725172233
> 2013-02-20 04:01:51,381 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725172233
> 2013-02-20 04:01:51,400 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:02:51,411 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
> doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException:
> Image transfer servlet at
> http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> failed with status code 410
> Response message:
> GetImage failed. java.io.IOException: Input/output error  at
> sun.nio.ch.FileChannelImpl.force0(Native Method)  at
> sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:358)  at
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:303)
>  at
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadImageToStorage(TransferFsImage.java:75)
>  at
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:169)
>  at
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:111)
>  at java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>  at
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:111)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)  at
> javax.servlet.http.HttpServlet.service(HttpServlet.java:820)  at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)  at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>  at
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:947)
>  at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>  at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)  at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>  at org.mortbay.jetty.Server.handle(Server.java:326)  at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)  at
> org.mortbay.jetty.Htt
>     at
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:216)
>     at
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:126)
>     at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:478)
>     at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
>     at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
>     at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
>     at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
>     at java.lang.Thread.run(Thread.java:619)
> 2013-02-20 04:04:52,592 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://s_namenode.domain.local:50070/getimage?getimage=1&txid=726172232&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:05:36,976 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000726172232 size 1212016242 bytes.
> 2013-02-20 04:05:37,595 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Skipping download
> of remote edit log [726172233,726216952] since it already is stored
> locally at
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> 2013-02-20 04:05:37,595 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726216953&endTxId=726262269&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:05:38,339 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> edits_0000000000726216953-0000000000726262269 size 7013503 bytes.
> 2013-02-20 04:05:38,339 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Loading image file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232
> using no compression
> 2013-02-20 04:05:38,339 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Number of files = 8795086
> 2013-02-20 04:06:13,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Number of files under construction = 32
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 1212016242 loaded in 35 seconds.
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Loaded image for txid 726172232 from
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232
> 2013-02-20 04:06:13,679 INFO
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to
> load edits from 2 stream(s).
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Reading
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> expecting start txid #726172233
> 2013-02-20 04:06:14,038 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Edits file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:06:14,038 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Reading
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726216953-0000000000726262269
> expecting start txid #726216953
> 2013-02-20 04:06:14,372 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Edits file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726216953-0000000000726262269
> of size 7013503 edits # 45317 loaded in 0 seconds.
> 2013-02-20 04:06:15,285 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726262269
> using no compression
> 2013-02-20 04:06:15,289 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726262269
> using no compression
> 2013-02-20 04:06:38,530 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 1212107279 saved in 23 seconds.
> 2013-02-20 04:06:45,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 1212107279 saved in 30 seconds.
> 2013-02-20 04:06:45,406 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
> retain 2 images with txid >= 726216952
> 2013-02-20 04:06:45,406 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
> old image
> FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232,
> cpktTxId=0000000000726172232)
> 2013-02-20 04:06:45,646 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
> old image
> FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_0000000000726172232,
> cpktTxId=0000000000726172232)
> 2013-02-20 04:06:46,115 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725216953
> 2013-02-20 04:06:46,118 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725216953
> 2013-02-20 04:06:46,145 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726262269&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:07:31,010 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 726262269 to namenode at s_namenode.domain.local:50070
> 2013-02-20 04:07:31,011 WARN
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done.
> New Image Size: 1212107279
> 2013-02-20 04:07:31,013 INFO
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
> retain 2 images with txid >= 726216952
> 2013-02-20 04:07:31,013 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725216953
> 2013-02-20 04:07:31,013 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
> older than 725216953
>
> At that time we were copying all of our data from this cluster to a second
> cluster, so we were reading a lot from this cluster.
> In our monitoring graphs I cannot see any peaks at the time this happened.
> Only the secondary namenode takes more memory than usual, about 1 GB more.
>
>
> What happened here? Was the creation of the new fsimage successful? If so,
> why were the old fsimage.ckpt files not deleted?
>
> Or did we lose some data?
>
> Regards Elmar
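One more observation on the trace above: the I/O error in the response
bubbles up from sun.nio.ch.FileChannelImpl.force(), i.e. an fsync on the
image file on the NN side, so with an NFS-backed storage directory I would
first suspect the mount rather than HDFS itself. A crude probe (the helper
name and the directory list below are assumptions; substitute your own
dfs.namenode.name.dir and checkpoint directories) is to write and fsync a
small file in each storage directory:

```shell
# probe_dir <dir>: write a 4 KiB file and fsync it, mimicking the
# FileChannelImpl.force() call that failed in the stack trace.
probe_dir() {
  f="$1/.fsync_probe.$$"
  if dd if=/dev/zero of="$f" bs=4096 count=1 conv=fsync 2>/dev/null; then
    rm -f "$f"
    echo "ok: $1"
  else
    rm -f "$f"
    echo "FAILED: $1"   # an I/O error here would match the getimage failure
  fi
}

# Directory list is an assumption -- use your own storage dirs.
for d in /var/lib/hdfs_namenode/meta/dfs/name /var/lib/hdfs_nfs_share/dfs/name; do
  probe_dir "$d"
done
```

If the NFS directory intermittently fails this probe, that would explain both
the 410 from the getimage servlet and the fsimage.ckpt files left behind.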
