hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: CheckPoint Node
Date Thu, 22 Nov 2012 18:42:00 GMT
Jean-Marc (Sorry if I've been spelling your name wrong),

0.94 does support Hadoop-2 already, and works pretty well with it, if
that is your only concern. You only need to use the right download (or
if you compile, use the -Dhadoop.profile=23 maven option).
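
For reference, a build against the Hadoop 2 profile looks roughly like this (a sketch; the exact profile id and dependency versions depend on your 0.94 checkout's pom.xml):

```shell
# Sketch: build HBase 0.94 against the Hadoop 2/0.23 dependency line.
# Run from the HBase source checkout; -Dhadoop.profile=23 selects the
# alternate Hadoop profile mentioned above.
mvn clean install -DskipTests -Dhadoop.profile=23
```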

You will need to restart the NameNode for a change to the
dfs.name.dir property to take effect. A reasonably fast disk
is needed for quicker edit log writes (a few bytes in each round),
but a large or SSD-class disk is not a requisite. An external disk
would work fine too (instead of an NFS mount), as long as it is reliable.

You do not need to copy data manually - just ensure that your NameNode
process user owns the directory and it will auto-populate the empty
directory on startup.
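
As a sketch, preparing the new directory before the restart might look like this (the path and the NameNode user here are assumptions; adjust them to your layout):

```shell
# Hypothetical new dfs.name.dir entry; override via environment.
NN_DIR="${NN_DIR:-/tmp/dfs-nn-demo}"
# In production this is the user the NameNode process runs as (e.g. hdfs);
# default to the current user so the sketch runs anywhere.
NN_USER="${NN_USER:-$(id -un)}"

# Create the empty directory, hand it to the NameNode user, and lock
# down permissions; the NN populates it itself on the next startup.
mkdir -p "$NN_DIR"
chown "$NN_USER" "$NN_DIR"
chmod 700 "$NN_DIR"
echo "prepared $NN_DIR for owner $NN_USER"
```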

Operationally speaking, in case one of the two disks fails, the NN Web UI
(and metrics as well) will indicate this (see the bottom of the NN UI page
for an example of what I am talking about), but the NN will continue to run
with the lone remaining disk. It's not a good idea to let it run for too
long without fixing/replacing the disk, for you will be losing out on redundancy.

On Thu, Nov 22, 2012 at 11:59 PM, Jean-Marc Spaggiari
<jean-marc@spaggiari.org> wrote:
> Hi Harsh,
> Again, thanks a lot for all those details.
> I read the previous link and I totally understand the HA NameNode. I
> already have a zookeeper quorum (3 servers) that I will be able to
> re-use. However, I'm running HBase 0.94.2 which is not yet compatible
> (I think) with Hadoop 2.0.x. So I will have to go with a non-HA
> NameNode until I can migrate to a stable 0.96 HBase version.
> Can I "simply" add one directory to dfs.name.dir and restart
> my namenode? Is it going to feed all the required information into this
> directory? Or do I need to copy the data of the existing one into the
> new one before I restart it? Also, does it need a fast transfer rate?
> Or will an external hard drive (quick to be moved to another server if
> required) be enough?
> 2012/11/22, Harsh J <harsh@cloudera.com>:
>> Please follow the tips provided at
>> http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3F and
>> http://wiki.apache.org/hadoop/FAQ#If_the_NameNode_loses_its_only_copy_of_the_fsimage_file.2C_can_the_file_system_be_recovered_from_the_DataNodes.3F
>> In short, if you use a non-HA NameNode setup:
>> - Yes the NN is a very vital persistence point in running HDFS and its
>> data should be redundantly stored for safety.
>> - You should, in production, configure your NameNode's image and edits
>> disk (dfs.name.dir in 1.x+, or dfs.namenode.name.dir in 0.23+/2.x+) to
>> be a dedicated one with adequate free space for gradual growth, and
>> should configure multiple disks (with one off-machine NFS point highly
>> recommended for easy recovery) for adequate redundancy.
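
Concretely, such a redundant layout might be expressed like this in hdfs-site.xml (a sketch; the local and NFS mount points are made-up example paths):

```xml
<!-- hdfs-site.xml: dfs.name.dir (1.x name; dfs.namenode.name.dir in
     0.23+/2.x). The NameNode writes its image and edits to every listed
     directory, so one surviving copy is enough to recover. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```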
>> If you instead use an HA NameNode setup (I'd highly recommend doing
>> this since it is now available), the presence of more than one NameNode
>> and the journal log mount or quorum setup automatically act as
>> safeguards for the FS metadata.
>> On Thu, Nov 22, 2012 at 11:03 PM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>>> Hi Harsh,
>>> Thanks for pointing me to this link. I will take a close look at it.
>>> So with 1.x and 0.23.x, what's the impact on the data if the namenode
>>> server hard-drive dies? Is there any critical data stored locally? Or do I
>>> simply need to build a new namenode, start it and restart all my
>>> datanodes to find my data back?
>>> I can deal with my application not being available, but losing data
>>> can be a bigger issue.
>>> Thanks,
>>> JM
>>> 2012/11/22, Harsh J <harsh@cloudera.com>:
>>>> Hey Jean,
>>>> The 1.x, 0.23.x release lines both don't have NameNode HA features.
>>>> The current 2.x releases carry HA-NN abilities, and this is documented
>>>> at
>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html.
>>>> On Thu, Nov 22, 2012 at 10:18 PM, Jean-Marc Spaggiari
>>>> <jean-marc@spaggiari.org> wrote:
>>>>> Replying to myself ;)
>>>>> By digging a bit more I figured out that the 1.0 line is older than the
>>>>> 0.23.4 line and that backupnodes are in 0.23.4. Secondarynamenodes in
>>>>> 1.0 are now deprecated.
>>>>> I'm still a bit mixed up on the way to achieve HA for the namenode
>>>>> (1.0 or 0.23.4) but I will continue to dig around the internet.
>>>>> JM
>>>>> 2012/11/22, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>>>>> Hi,
>>>>>> I'm reading a bit about hadoop and I'm trying to increase the HA of my
>>>>>> current cluster.
>>>>>> Today I have 8 datanodes and one namenode.
>>>>>> By reading here: http://www.aosabook.org/en/hdfs.html I can see that
>>>>>> Checkpoint node might be a good idea.
>>>>>> So I'm trying to start a checkpoint node. I looked at the hadoop
>>>>>> online doc. There is a link to describe the command usage ("For
>>>>>> command usage, see namenode.") but this link is not working. Also, if I
>>>>>> try hadoop-daemon.sh start namenode -checkpoint as described in the
>>>>>> documentation, it's not starting.
>>>>>> So I'm wondering, is there anywhere I can find up-to-date
>>>>>> documentation about the checkpoint node? I will most probably try
>>>>>> BackupNode.
>>>>>> I'm using hadoop 1.0.3. The options I have to start on this version
>>>>>> are namenode, secondarynamenode, datanode, dfsadmin, mradmin, fsck and
>>>>>> fs. Should I start some secondarynamenodes instead of a backupnode or
>>>>>> checkpointnode?
>>>>>> Thanks,
>>>>>> JM
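
On the 1.0.3 line the checkpointing role is played by the secondarynamenode; a typical way to run one is sketched below (assuming a standard tarball layout, run from the Hadoop install directory; the hostname is a made-up example):

```shell
# Hadoop 1.x: list the checkpoint host in conf/masters, then start the
# secondarynamenode daemon on that machine (start-dfs.sh also reads this
# file and starts it for you).
echo "checkpoint-host.example.com" > conf/masters   # hypothetical hostname
bin/hadoop-daemon.sh start secondarynamenode
```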
>>>> --
>>>> Harsh J
>> --
>> Harsh J

Harsh J
