Hi All,
Thank you so much for your valuable solutions!
*The problem is resolved, but at the cost of significant time and data loss*
(since we were running on an experimental basis, we only had to reload a few
GB of data). I used the -importCheckpoint option.
I would like to describe the likely reason the editlog corruption happened
(correct me if I am wrong).
Below were the relevant settings in hdfs-site.xml:
- hadoop.tmp.dir : */opt/data*/tmp
- dfs.name.dir : */opt/data*/name
- dfs.data.dir : */opt/data*/data
- mapred.local.dir : ${hadoop.tmp.dir}/mapred/local
*/opt/data* is a mounted storage volume of 50GB. The Namenode,
SecondaryNamenode (${hadoop.tmp.dir}/dfs/namesecondary) and Datanode
directories were all configured within */opt/data* itself.
Once I moved the 3.6GB compressed (bz2) file, I guess the disk usage of
*/opt/data* could have reached 100% (I checked with df -h after the incident).
Then I ran Hive with a simple *"Select"* query; its job.jar files also need to
be created within the same directory, which already had no space left. So this
is how the editlog corruption could have occurred.
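As a rough illustration of the situation described above, here is a minimal
Python sketch (not part of our actual setup) that reports which of the
configured directories sit on the same filesystem and how full that filesystem
is. The helper names group_by_device, used_percent and report, and the 90%
threshold, are hypothetical choices made only for this example; the paths are
the ones listed above.

```python
import os
import shutil
from collections import defaultdict


def group_by_device(paths):
    """Group existing directories by the device (filesystem) they live on."""
    groups = defaultdict(list)
    for path in paths:
        if os.path.isdir(path):  # silently skip directories that do not exist
            groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def used_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def report(paths, warn_at=90.0):
    """Return warnings for filesystems that are shared or nearly full."""
    warnings = []
    for members in group_by_device(paths).values():
        if len(members) > 1:
            warnings.append("shared filesystem: " + ", ".join(members))
        pct = used_percent(members[0])
        if pct >= warn_at:
            warnings.append("%.0f%% used: %s" % (pct, members[0]))
    return warnings


if __name__ == "__main__":
    # Paths taken from the configuration described above.
    dirs = ["/opt/data/tmp", "/opt/data/name", "/opt/data/data"]
    for line in report(dirs):
        print(line)
```

Run against the three directories above, this would have flagged that the
namenode, datanode and temporary directories all compete for the same 50GB.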
This has been a really good lesson for me! I have now changed those
configurations.
Thanks again,
Sakthivel
On Fri, Jul 15, 2011 at 4:47 PM, Brahma Reddy <brahmareddyb@huawei.com> wrote:
> Hi,
>
> 1) This can be achieved either by copying the relevant storage
> directory to a new name node,
>
> 2) or, if the secondary is taking over as the new name node daemon,
> by using the -importCheckpoint option when starting the name node
> daemon. The -importCheckpoint option will load the name node metadata
> from the latest checkpoint in the directory defined by the
> *fs.checkpoint.dir* property, but only if there is no metadata in the
> dfs.name.dir, so there is no risk of overwriting precious data.
>
> Regards
>
> Brahma Reddy
>
>
> ------------------------------
>
> *From:* Sakthivel Murugasamy [mailto:sakthiinfotec@gmail.com]
> *Sent:* Friday, July 15, 2011 2:40 PM
>
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Namenode not get started. Reason: FSNamesystem
> initialization failed.
>
> Dear Team,
>
> I loaded 3.6GB of compressed (bz2) data directly into Hive. After that I
> ran a simple "*select query*" and the namenode crashed.
> Thereafter I was not able to start the namenode.
>
> Environment:
>
> - CentOS release 5.5 (Final), Hadoop Version: 0.20.2
> - Cluster size: 18 nodes
> - NameNode & SecondaryNamenode are on the same machine
>
> It seems the editlogs/fsimage got corrupted, and I haven't taken any
> separate backup. Below is the exception:
>
> 2011-07-14 23:37:43,378 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
> initialization failed.
> java.io.FileNotFoundException: File does not exist:
> /opt/data/tmp/mapred/system/job_201107041958_0120/j^@^@^@^@^@^@
>
> *Please find the detailed exception in the attached namenode log file.*
>
> Earlier, I also posted this in JIRA,
> https://issues.apache.org/jira/browse/HADOOP-7458 ; Jakob Homan directed
> me to post to the hdfs-user list.
>
> Is there any backup in the SecondaryNamenode? Could you please help me
> recover the Namenode from this issue?
>
>
> Thanks,
> Sakthivel
>