hadoop-hdfs-user mailing list archives

From Hank Cohen <hank.co...@altior.com>
Subject RE: Changing where HDFS stores its data
Date Thu, 28 Jun 2012 14:25:59 GMT
I figured it out.
It turns out that one must not modify the configuration files while the cluster is running.
If you do, the edits file can become corrupted.  Fortunately the corruption is in the first
word of the file, which is a magic number, so it is easily detected.
So the solution is to make sure the cluster is stopped before modifying the configuration.
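The working sequence can be sketched as a short shell routine. This assumes the standard Hadoop 1.x-era control scripts (stop-dfs.sh / start-dfs.sh) are on PATH; the guards are only there so the sketch degrades gracefully where they are not installed:

```shell
# Sketch of the safe edit cycle: stop everything, edit, then restart.
CONF="${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}"

command -v stop-dfs.sh >/dev/null 2>&1 && stop-dfs.sh    # 1. stop the cluster

# 2. edit the configuration only while HDFS is fully down, e.g.:
#    vi "$CONF/core-site.xml"

command -v start-dfs.sh >/dev/null 2>&1 && start-dfs.sh  # 3. restart to load it
msg="config edit cycle complete"
echo "$msg"
```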

Is this a bug?  I had always assumed that configurations are read at initialization time and
not consulted again.
That behavior lets changes take effect when the service restarts, which is how all sorts of
Unix/Linux services work.
Thanks for your help,
Hank Cohen

From: Harsh J [mailto:harsh@cloudera.com]
Sent: Thursday, June 28, 2012 5:03 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data


I'm able to run my HDFS with two different sets of configs independently. Can you share your
whole NN log? One name/data directory should not conflict with another, but in any case it
is always good to set dfs.name.dir and dfs.data.dir to absolute paths instead of relying
on hadoop.tmp.dir's implicitness. What I do is keep two different config dirs and pass the
right one when I need to switch from the defaults.
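Harsh's two-config-dir approach might look like the following sketch. The directory names and storage paths are made up for illustration; HADOOP_CONF_DIR is the standard environment variable the launcher scripts consult:

```shell
# Keep one fully independent config dir per storage layout.
work=$(mktemp -d) && cd "$work"
mkdir -p conf.disk1 conf.disk2

# Each dir carries its own hdfs-site.xml with ABSOLUTE storage paths,
# so nothing falls back to hadoop.tmp.dir implicitly.
cat > conf.disk1/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.name.dir</name><value>/disk1/dfs/name</value></property>
  <property><name>dfs.data.dir</name><value>/disk1/dfs/data</value></property>
</configuration>
EOF
sed 's/disk1/disk2/g' conf.disk1/hdfs-site.xml > conf.disk2/hdfs-site.xml

# Select a cluster "identity" per run, then start the daemons as usual:
export HADOOP_CONF_DIR="$PWD/conf.disk1"   # or conf.disk2
```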
On Thu, Jun 28, 2012 at 1:15 PM, Giulio D'Ippolito <giulio.dippolito@gmail.com> wrote:
You could manually edit the VERSION file so that the DataNode and NameNode namespaceIDs match.
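For illustration only: VERSION is a plain properties file, so the edit described above can be done with sed. The file contents and IDs below are simulated, and you should always keep a backup of the real file first, since a wrong namespaceID makes the storage directory unusable:

```shell
work=$(mktemp -d) && cd "$work"

# Simulate a DataNode VERSION file (fields abbreviated for the sketch):
mkdir -p dfs/data/current
printf 'namespaceID=123456789\nstorageType=DATA_NODE\nlayoutVersion=-32\n' \
  > dfs/data/current/VERSION

NEW_ID=987654321   # the namespaceID the NameNode reports/expects
cp dfs/data/current/VERSION dfs/data/current/VERSION.bak   # backup first
sed "s/^namespaceID=.*/namespaceID=$NEW_ID/" dfs/data/current/VERSION.bak \
  > dfs/data/current/VERSION
```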

2012/6/27 Hank Cohen <hank.cohen@altior.com>:
[nit] First of all, I think the datanode storage location property should simply be dfs.data.dir,
not dfs.datanode.data.dir (per src/hdfs/hdfs-default.html).

Both the namenode storage directory and the datanode storage directory are defined relative
to hadoop.tmp.dir, so simply changing that directory changes both subdirectories.
But this doesn't let me switch back and forth without errors.

I get an error when I try to change hadoop.tmp.dir to a directory that already contains a
hadoop file system.
The error is:
2012-06-27 10:40:44,144 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
Unexpected version of the file system log file: -333643776. Current version = -32.
[Does anyone want to see the java stack trace?]

When I look at the VERSION files (hadoop.tmp.dir/dfs/name/current/VERSION)
the only difference I see is that namespaceID differs.  I think namespaceID probably
should be different, since it is a different file system.

Thanks for any guidance,
Hank Cohen

From: Konstantin Shvachko [mailto:shv.hadoop@gmail.com]
Sent: Monday, June 18, 2012 5:12 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data

In hdfs-site.xml you should specify dfs.name.dir for the NameNode storage directories
and/or dfs.data.dir for the DataNode storage.

Changing the temporary directory location changes the default for the storage directories,
which should also work. You might want to check the message the NameNode logs when it fails.
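A minimal hdfs-site.xml fragment along those lines might look like this, assuming the 1.x-era property names dfs.name.dir and dfs.data.dir mentioned earlier in the thread; the storage paths are placeholders:

```xml
<configuration>
  <property>
    <name>dfs.name.dir</name>          <!-- NameNode image and edit log -->
    <value>/disk1/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>          <!-- DataNode block storage -->
    <value>/disk1/dfs/data</value>
  </property>
</configuration>
```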

On Mon, Jun 18, 2012 at 3:47 PM, Hank Cohen <hank.cohen@altior.com> wrote:
I am trying to do some testing with different storage configurations for HDFS but I am having
difficulty changing the storage destination without having to re-initialize the whole file
system each time I change things.

What I want to do: Set up and run some test cases with two different local file system configurations.
 Think of it as having different local disks with different performance characteristics.

What I have done so far is to change the hadoop.tmp.dir property in core-site.xml.  Let's
call this dir1.
I can set this up and format the file system without any problems, run my tests, shut down,
and change core-site.xml again to dir2.
Again I can format dir2 and run my tests OK, but when I try to switch back to dir1 I can't
get the namenode to start.  I find that I have to remove all of the directories and subdirectories
from dir1, then reformat and start over with nothing in the file system.

Is there an easy way to do this without having to reinitialize the whole HDFS each time?

Hank Cohen

+1 732-440-1280 x320 Office
+1 510-995-8264 Direct

444 Route 35 South
Building B
Eatontown, NJ 07724 USA



Harsh J
