From: Elmar Grote <elmar.grote@optivo.de>
To: user@hadoop.apache.org
Subject: Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Date: Fri, 08 Mar 2013 14:43:29 +0100
Message-ID: <20130308134329.1dcf1889@office.optivo.de>

Hi Yifan,

Thank you for the answer.

But as far as I understand it, the SNN downloads the fsimage and edits files from the NN,
builds the new fsimage, and uploads it back to the NN.
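Written out as I picture it (a rough illustrative sketch only, not actual Hadoop code;
the method names are made up, and the step ordering is what I reconstruct from the
log timestamps quoted below):

    # Illustrative Python pseudocode of one SNN checkpoint cycle as I
    # understand it -- helper names here are hypothetical, for the sketch.
    def do_checkpoint(snn, nn):
        edits = snn.download_new_edits(nn)   # getimage?getedit=1&...
        snn.merge_into_image(edits)          # saves fsimage.ckpt_<txid> locally
        snn.purge_old_images_and_edits()     # retention runs BEFORE the upload
        snn.upload_image(nn)                 # getimage?putimage=1&...
        # If the upload fails (as in the stack trace below), the merge has
        # already happened and a local fsimage.ckpt_<txid> is left behind.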

So here the upload didn't work. The next time the checkpoint starts, the old fsimage is still on the NN.
But what about the edits files? Are the old ones still there? Or were they deleted
during the failed upload of the fsimage? If they were deleted, they are missing and
there should be a loss or inconsistency of data.

Or am I wrong?

When will the edits files be deleted? After a successful upload or before?
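One thing I notice in the logs below: the "Purging logs older than 725172233" lines
appear before the putimage upload is even opened, and the purge threshold sits exactly
1,000,000 transactions behind the start of the merged edits segment, which would match
the default dfs.namenode.num.extra.edits.retained of 1000000 (assuming we run with the
default). So if I read this right, the answer to my own question may be that edits are
purged before the upload, but only a full retention window back:

    # Worked out from the log lines quoted below:
    checkpoint_start_txid = 726172233   # startTxId of the merged edits segment
    purge_threshold       = 725172233   # "Purging logs older than 725172233"
    assert checkpoint_start_txid - purge_threshold == 1000000
    # i.e. edits are only trimmed a full retention window back, and this
    # happens independently of whether the following upload succeeds.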

Regards Elmar


From: Yifan Du [mailto:duyifan23@gmail.com]
To: user@hadoop.apache.org
Sent: Fri, 08 Mar 2013 11:08:09 +0100
Subject: Re: fsimage.ckpt are not deleted - Exception in doCheckpoint

I have met this exception too.
The new fsimage produced by the SNN could not be transferred to the NN.
My hdfs version is 2.0.0.
Does anyone know how to fix it?

@Elmar:
The new fsimage has been created successfully. But it could not be
transferred to the NN, so the old fsimage.ckpt was not deleted.
I have tried the new fsimage: I started the cluster with the new fsimage
and the new edits-in-progress. It started successfully and no data was lost.


2013/3/6, Elmar Grote <elmar.grote@optivo.de>:
> Hi,
>
> we are writing our fsimage and edits files on the namenode and secondary
> namenode, and additionally on an NFS share.
>
> In these folders we found a lot of fsimage.ckpt_000000000........
> files; the oldest is from 9 Aug 2012.
> As far as I know, these files should be deleted after the secondary namenode
> creates the new fsimage file.
> I looked in our log files from the namenode and secondary namenode to see
> what happened at that time.
>
> As an example I searched for this file:
> 20. Feb 04:02 fsimage.ckpt_0000000000726216952
>
> In the namenode log I found this:
> 2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
> 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error
>
> In the secondary namenode log, I think this is the relevant part:
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image.
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s).
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 expecting start txid #726172233
> 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds.
> 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds.
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726172232
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750, cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_0000000000726121750, cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,379 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725172233
> 2013-02-20 04:01:51,381 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725172233
> 2013-02-20 04:01:51,400 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:02:51,411 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4 failed with status code 410
> Response message:
> GetImage failed. java.io.IOException: Input/output error
>   at sun.nio.ch.FileChannelImpl.force0(Native Method)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:358)
>   at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:303)
>   at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadImageToStorage(TransferFsImage.java:75)
>   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:169)
>   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:111)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:111)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:947)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at org.mortbay.jetty.Htt
>   at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:216)
>   at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:126)
>   at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:478)
>   at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
>   at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
>   at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
>   at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
>   at java.lang.Thread.run(Thread.java:619)
> 2013-02-20 04:04:52,592 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getimage=1&txid=726172232&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:05:36,976 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000726172232 size 1212016242 bytes.
> 2013-02-20 04:05:37,595 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Skipping download of remote edit log [726172233,726216952] since it already is stored locally at /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
> 2013-02-20 04:05:37,595 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726216953&endTxId=726262269&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:05:38,339 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_0000000000726216953-0000000000726262269 size 7013503 bytes.
> 2013-02-20 04:05:38,339 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232 using no compression
> 2013-02-20 04:05:38,339 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 8795086
> 2013-02-20 04:06:13,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 32
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1212016242 loaded in 35 seconds.
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 726172232 from /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 2 stream(s).
> 2013-02-20 04:06:13,679 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 expecting start txid #726172233
> 2013-02-20 04:06:14,038 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:06:14,038 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726216953-0000000000726262269 expecting start txid #726216953
> 2013-02-20 04:06:14,372 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726216953-0000000000726262269 of size 7013503 edits # 45317 loaded in 0 seconds.
> 2013-02-20 04:06:15,285 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726262269 using no compression
> 2013-02-20 04:06:15,289 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726262269 using no compression
> 2013-02-20 04:06:38,530 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1212107279 saved in 23 seconds.
> 2013-02-20 04:06:45,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1212107279 saved in 30 seconds.
> 2013-02-20 04:06:45,406 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726216952
> 2013-02-20 04:06:45,406 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726172232, cpktTxId=0000000000726172232)
> 2013-02-20 04:06:45,646 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_0000000000726172232, cpktTxId=0000000000726172232)
> 2013-02-20 04:06:46,115 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725216953
> 2013-02-20 04:06:46,118 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725216953
> 2013-02-20 04:06:46,145 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726262269&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:07:31,010 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 726262269 to namenode at s_namenode.domain.local:50070
> 2013-02-20 04:07:31,011 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 1212107279
> 2013-02-20 04:07:31,013 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726216952
> 2013-02-20 04:07:31,013 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725216953
> 2013-02-20 04:07:31,013 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs older than 725216953
>
> At that time we were copying all of our data from this cluster to a second
> cluster, so we were reading a lot from this cluster.
> In our monitoring graphs I cannot see any peaks at the time this happens.
> Only the secondary namenode takes more memory than usual, about 1 GB more.
>
>
> What happened here? Was the creation of the new fsimage successful? If so, why
> were the old fsimage.ckpt files not deleted?
>
> Or did we lose some data?
>
> Regards Elmar
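
PS: To make the "did we lose data" check concrete, here is a small standalone
Python script I put together (illustrative only; it is not part of Hadoop and
simply assumes the standard 2.x file naming visible in the logs above:
fsimage_<txid> and edits_<start>-<end>). It verifies that the newest finalized
fsimage plus the finalized edits segments in a storage directory form a
gap-free chain of transaction ids:

    import os
    import re
    import sys

    IMAGE_RE = re.compile(r"fsimage_(\d+)$")
    EDITS_RE = re.compile(r"edits_(\d+)-(\d+)$")

    def check_chain(current_dir):
        images, segments = [], []
        for name in os.listdir(current_dir):
            m = IMAGE_RE.match(name)
            if m:
                images.append(int(m.group(1)))
            m = EDITS_RE.match(name)
            if m:
                segments.append((int(m.group(1)), int(m.group(2))))
        if not images:
            sys.exit("no finalized fsimage_<txid> found in %s" % current_dir)
        txid = max(images)  # startup would load the newest image ...
        for start, end in sorted(segments):
            if end <= txid:
                continue    # segment fully covered by the image already
            if start > txid + 1:
                sys.exit("gap: chain ends at txid %d but next segment starts at %d"
                         % (txid, start))
            txid = end      # ... and replay each contiguous edits segment
        print("contiguous chain up to txid %d - nothing obviously missing" % txid)

    # Path taken from the logs above; adjust for your own layout.
    check_chain("/var/lib/hdfs_namenode/meta/dfs/namesecondary/current")

A leftover fsimage.ckpt_<txid> does not participate in this chain at all; as far
as I can tell, startup ignores the .ckpt temp files, so they look like clutter
from failed transfers rather than corruption.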