hadoop-hdfs-dev mailing list archives

From mac fang <mac.had...@gmail.com>
Subject Re: Issue of FSImage, need help
Date Mon, 04 Jul 2011 05:07:11 GMT
Guys,

Any clues as to why the corrupted image could happen?

regards
macf
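
One likely mechanism, given the kill-during-saveNamespace scenario described in the quoted thread below, is that the image file is overwritten in place, so a kill mid-write leaves a partial file. A common defense is to write to a temporary file and atomically rename it over the final name, so a crash can only ever lose the temp file. This is a minimal sketch of that pattern, not HDFS's actual saveNamespace code; the file names and `SafeImageWriter` class are illustrative assumptions:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class SafeImageWriter {
    // Write the image bytes to a temporary file first, then atomically
    // rename it over the final name. If the process is killed mid-write,
    // only the temp file is partial; any previous image stays intact.
    public static void writeImage(File dir, byte[] imageBytes) throws IOException {
        File tmp = new File(dir, "fsimage.ckpt");   // hypothetical temp name
        File fin = new File(dir, "fsimage");        // hypothetical final name
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(imageBytes);
            out.getFD().sync();  // force the bytes to disk before the rename
        }
        // ATOMIC_MOVE: the final name either holds the old image or the
        // complete new one, never a partial write.
        Files.move(tmp.toPath(), fin.toPath(), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this pattern, a `kill -9` during the write leaves at worst a stale `fsimage.ckpt` that startup can simply discard.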

On Wed, Jun 29, 2011 at 9:11 AM, mac fang <mac.hadoop@gmail.com> wrote:

> HI, Todd,
>
> we use the 0.21 version. I think we used 'kill -9'. The possible timing
> is during startup or at checkpoint.
>
> regards
> macf
>
>
> On Tue, Jun 28, 2011 at 11:03 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hi Denny,
>>
>> Which version of Hadoop are you using, and when are you killing the
>> NameNode? Are you using a unix signal (eg kill -9) or killing power to the
>> whole machine?
>>
>> Thanks
>> -Todd
>>
>> On Tue, Jun 28, 2011 at 2:11 AM, Denny Ye <dennyy99@gmail.com> wrote:
>>
>> > *Root cause*: wrong FSImage format after the user killed the HDFS
>> > process. It may read an invalid block number, maybe 1 billion or more,
>> > so an OutOfMemoryError happens before any EOFException.
>> >
>> > How can we verify the validity of the FSImage file?
>> >
>> > --regards
>> > Denny Ye
>> >
>> > On Tue, Jun 28, 2011 at 4:44 PM, mac fang <mac.hadoop@gmail.com> wrote:
>> >
>> > > Hi, Team,
>> > >
>> > > What we found when using Hadoop is that the FSImage often corrupts
>> > > when we start/stop the Hadoop cluster. We think the reason is around
>> > > the write to the output stream: the NameNode may be killed while it
>> > > is in saveNamespace, so the FSImage file is never completely written.
>> > > I also saw a previous.checkpoint folder; the logic of saveNamespace
>> > > is like:
>> > >
>> > > 1. mv the current folder to the previous.checkpoint folder.
>> > > 2. start to write the FSImage into the current folder.
>> > >
>> > > I think there might be a case where, if the FSImage is corrupted, the
>> > > NameNode can NOT be started, but we can NOT get any EOFException,
>> > > since we may hit an OutOfMemory exception if we read a wrong numBlocks
>> > > and instantiate Block[] blocks = new Block[numBlocks] (actually, we
>> > > face this issue).
>> > >
>> > > Any suggestion to it?
>> > >
>> > > thanks
>> > > macf
>> > >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
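
The failure mode described in the quoted thread, where a garbage block count causes an OutOfMemoryError before any EOFException can surface, can be mitigated by sanity-checking the count before allocating. The sketch below is not HDFS's actual FSImage loader; the on-disk layout, the `ImageLoader` class, and the `MAX_REASONABLE_BLOCKS` bound are simplified assumptions for illustration (later HDFS versions additionally store a checksum of the image to detect truncation up front):

```java
import java.io.DataInputStream;
import java.io.IOException;

public class ImageLoader {
    // Upper bound on plausible block counts; anything above it almost
    // certainly means the image is truncated or corrupt. The value is an
    // illustrative assumption, not an HDFS constant.
    private static final int MAX_REASONABLE_BLOCKS = 1 << 24;

    static long[] readBlockIds(DataInputStream in) throws IOException {
        int numBlocks = in.readInt();
        // Validate before allocating: a garbage count (e.g. ~1 billion)
        // would otherwise trigger OutOfMemoryError on the array allocation
        // before the stream ever reaches its end.
        if (numBlocks < 0 || numBlocks > MAX_REASONABLE_BLOCKS) {
            throw new IOException("Corrupt image: implausible block count "
                    + numBlocks);
        }
        long[] blocks = new long[numBlocks];
        for (int i = 0; i < numBlocks; i++) {
            // A truncated file now fails here with a clean EOFException.
            blocks[i] = in.readLong();
        }
        return blocks;
    }
}
```

The key point is that the loader fails fast with a diagnosable IOException instead of exhausting the heap, which also answers the question of detecting an invalid image before acting on it.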
