hadoop-hdfs-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Issue of FSImage, need help
Date Tue, 28 Jun 2011 15:03:26 GMT
Hi Denny,

Which version of Hadoop are you using, and when are you killing the
NameNode? Are you using a Unix signal (e.g. kill -9) or killing power to the
whole machine?

Thanks
-Todd

On Tue, Jun 28, 2011 at 2:11 AM, Denny Ye <dennyy99@gmail.com> wrote:

> *Root cause*: a malformed FSImage is written when the user kills the
> HDFS process mid-save. On restart the loader may read an invalid block
> count, perhaps 1 billion or more, and an OutOfMemoryError occurs before
> any EOFException can be thrown.
>
> How can we verify the validity of the FSImage file?
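>
> For example, would something along these lines be reasonable? (Just a
> rough sketch; the class name and the digest bookkeeping are made up,
> not the current FSImage code.) Record an MD5 of the image at save time
> and verify it before loading:
>
>     import java.io.FileInputStream;
>     import java.io.IOException;
>     import java.security.DigestInputStream;
>     import java.security.MessageDigest;
>     import java.security.NoSuchAlgorithmException;
>     import java.util.Arrays;
>
>     public class FsImageDigest {
>       // Compute an MD5 over the whole image file.
>       static byte[] md5Of(String path)
>           throws IOException, NoSuchAlgorithmException {
>         MessageDigest md = MessageDigest.getInstance("MD5");
>         DigestInputStream in =
>             new DigestInputStream(new FileInputStream(path), md);
>         try {
>           byte[] buf = new byte[64 * 1024];
>           while (in.read(buf) != -1) {
>             // reading the stream feeds the digest
>           }
>         } finally {
>           in.close();
>         }
>         return md.digest();
>       }
>
>       // Refuse to load an image whose digest does not match the one
>       // recorded when the image was saved.
>       static void verify(String imagePath, byte[] savedDigest)
>           throws IOException, NoSuchAlgorithmException {
>         if (!Arrays.equals(md5Of(imagePath), savedDigest)) {
>           throw new IOException("FSImage looks corrupt: MD5 mismatch");
>         }
>       }
>     }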
>
> --regards
> Denny Ye
>
> On Tue, Jun 28, 2011 at 4:44 PM, mac fang <mac.hadoop@gmail.com> wrote:
>
> > Hi, Team,
> >
> > What we have found while using Hadoop is that the FSImage often gets
> > corrupted when we start/stop the Hadoop cluster. We think the reason
> > is around the write to the output stream: the NameNode may be killed
> > while it is in saveNamespace, so the FSImage file is never completely
> > written. I currently see a previous.checkpoint folder; the logic of
> > saveNamespace looks like this:
> >
> > 1. mv the current folder to the previous.checkpoint folder.
> > 2. start to write the FSImage into the current folder.
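> >
> > Would something like the following help? (Just a rough sketch with
> > made-up file names, not the real saveNamespace code.) Write the new
> > image to a temporary file, force it to disk, and only then rename it
> > over the old one, so a kill at any point leaves a complete image:
> >
> >     import java.io.File;
> >     import java.io.FileOutputStream;
> >     import java.io.IOException;
> >
> >     public class AtomicImageWrite {
> >       // Write to fsimage.tmp (hypothetical name), fsync, then rename
> >       // into place. A kill before the rename leaves the old image
> >       // untouched; a kill after it leaves the new, complete image.
> >       static void saveImage(File dir, byte[] imageBytes)
> >           throws IOException {
> >         File tmp = new File(dir, "fsimage.tmp");
> >         File dst = new File(dir, "fsimage");
> >         FileOutputStream out = new FileOutputStream(tmp);
> >         try {
> >           out.write(imageBytes);
> >           out.getChannel().force(true);  // flush to disk before rename
> >         } finally {
> >           out.close();
> >         }
> >         if (!tmp.renameTo(dst)) {
> >           throw new IOException("rename " + tmp + " to " + dst + " failed");
> >         }
> >       }
> >     }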
> >
> > I think there may be a case where, if the FSImage is corrupted, the
> > NameNode cannot be started, yet we never get an EOFException: instead
> > we hit an OutOfMemoryError, because we read a bogus numBlocks and then
> > instantiate Blocks[] blocks = new Blocks[numBlocks] (we actually face
> > this issue).
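> >
> > To make it concrete, here is a stripped-down sketch of the load path
> > (names are made up, not the real loader). The block count is trusted
> > as read, so a truncated or garbage image turns into a huge allocation
> > instead of an EOFException, unless it is bounds-checked first:
> >
> >     import java.io.DataInputStream;
> >     import java.io.IOException;
> >
> >     public class LoadSketch {
> >       // With a corrupted image, numBlocks can be any int (e.g. ~1
> >       // billion), so the array allocation below dies with
> >       // OutOfMemoryError before any EOFException is ever reached.
> >       static long[] loadBlockIds(DataInputStream in) throws IOException {
> >         int numBlocks = in.readInt();
> >         // A simple bound check would turn the OOM into a clean error:
> >         if (numBlocks < 0 || numBlocks > 100 * 1000 * 1000) {
> >           throw new IOException("Implausible block count: " + numBlocks);
> >         }
> >         long[] blockIds = new long[numBlocks];
> >         for (int i = 0; i < numBlocks; i++) {
> >           blockIds[i] = in.readLong();
> >         }
> >         return blockIds;
> >       }
> >     }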
> >
> > Any suggestions on this?
> >
> > thanks
> > macf
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
