hadoop-common-user mailing list archives

From Praveen Yarlagadda <praveen.yarlaga...@gmail.com>
Subject Re: Best practices - Large Hadoop Cluster
Date Wed, 11 Aug 2010 00:21:53 GMT
Raj,

If you use cluster ssh, you can do it quickly: you log onto several
hosts at once and execute commands on all of them simultaneously. I have
used it to manage 64 nodes.
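
Under the hood, a cluster-ssh style tool is just running one command on many
hosts in parallel, so something roughly equivalent can be scripted with plain
ssh. A sketch only: the hostnames below are placeholders, and passwordless ssh
to each host is assumed to be set up already.

```shell
#!/bin/sh
# Rough sketch of what a cluster-ssh tool does: run the same command
# on many hosts in parallel and collect the output per host.
# HOSTS is a placeholder list; substitute your own nodes.
HOSTS="node01 node02 node03"
for h in $HOSTS; do
    # BatchMode avoids hanging on a password prompt if a key is missing;
    # ConnectTimeout bounds the wait on an unreachable host.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" 'uptime' \
        > "/tmp/$h.out" 2>&1 &
done
wait    # block until every host has answered (or failed)
```

With 512 nodes you would want to throttle the number of concurrent
connections, but the idea is the same.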

Thanks,
Praveen



On Tue, Aug 10, 2010 at 4:55 PM, Michael Segel <michael_segel@hotmail.com> wrote:

>
> Raj...
>
> Ok, one of the things we have at one of my clients is that the hadoop
> user's account is actually centralized. (Users' accounts are mounted as
> they log in to the machine.)
> So you have a single hadoop account for all of the machines.
>
> So when you set up the keys, they are in the ~hadoop account.
>
> So you have a bit of work with 512 nodes, and yeah, it's painful the
> first time.
>
> Like I said, I don't have a cloud of 512 nodes, and when I am building a
> cloud of 20+ machines, setting up ssh is just part of the process.
>
> If you set up hadoop as a system service, does that mean that when you
> boot the machine, your node comes up on its own like other services?
> I personally don't think that's a good idea...
>
> I haven't evaluated Puppet; I'm pulled yet again into other things....
>
> So I don't have an answer.
>
> My point was that you can go through and add the user/password keys as
> part of the build process, and while painful, it's not that painful.
> (Trust me, there are worse things that can get dropped on your desk. ;-)
>
> -Mike
>
>
> > Date: Tue, 10 Aug 2010 13:06:51 -0700
> > From: rajvish@yahoo.com
> > Subject: Re: Best practices - Large Hadoop Cluster
> > To: common-user@hadoop.apache.org
> >
> > Mike
> > With 512 nodes, even a minute per node (ssh-ing to each node, typing an
> > 8-character password, checking that everything looks OK) is about 8.5
> > hours. After that, if something does not work, that is a different
> > level of pain altogether.
> >
> > Using scp to exchange keys simply does not scale.
> >
> > My question was simple: how do other people in the group who run large
> > clusters manage this? Brian put it better: what is the best,
> > duplicatable way of running hadoop when the cluster is large. I agree,
> > this is not a hadoop question per se, but hadoop is really what I care
> > about now.
> >
> > Thanks to others for the useful suggestions. I will examine them and
> > post a summary if anyone is interested.
> >
> > Raj
> >
> >
> >
> >
> >
> > ________________________________
> > From: Michael Segel <michael_segel@hotmail.com>
> > To: common-user@hadoop.apache.org
> > Sent: Tue, August 10, 2010 11:36:14 AM
> > Subject: RE: Best practices - Large Hadoop Cluster
> >
> >
> > I'm a little confused by Raj's problem.
> >
> > If you follow the instructions outlined in the Hadoop books and
> > everywhere else about setting up ssh keys, you shouldn't have a problem.
> > I'd just ssh as the hadoop user to each of the nodes before trying to
> > start hadoop for the first time.
> >
> > At 512 nodes, I think you may run into other issues... (I don't know, I
> > don't have 512 machines to play with :-(  ) And Puppet has been
> > recommended a couple of times.
> >
> > Just my $0.02
> >
> > -Mike
> >
> >
> > > Date: Tue, 10 Aug 2010 23:43:12 +0530
> > > From: gokulm@huawei.com
> > > Subject: RE: Best practices - Large Hadoop Cluster
> > > To: common-user@hadoop.apache.org
> > >
> > >
> > > Hi Raj,
> > >
> > >     As per my understanding, the problem is the ssh password each
> > > time you start/stop the cluster. You need passwordless
> > > startup/shutdown, right?
> > >
> > >     Here is my way of overcoming the ssh problem
> > >
> > >     Write a shell script as follows:
> > >
> > >     1. Generate an ssh key on the namenode machine (where you will
> > > start/stop the cluster)
> > >
> > >     2. Read each entry from the conf/slaves file and do the following:
> > >
> > >         2.1 Add the key you generated in step 1 to the ssh
> > > authorized_keys file of that datanode, with something like:
> > >
> > >             cat $HOME/.ssh/public_key_file | ssh username@host 'cat >> $HOME/.ssh/authorized_keys'
> > >
> > >
> > >     3. Repeat step 2 for conf/masters also
> > >
> > >     Note: The password must be supplied for each username@host the
> > > first time, since the ssh command in step 2.1 requires it.
> > >
> > >     Now you can start/stop your hadoop cluster without the ssh
> > > password overhead
> > >
> > >
> > >  Thanks,
> > >   Gokul
> > >
> > >
> > >
> > >
> > > ****************************************************************************
> > >
> > > -----Original Message-----
> > > From: Raj V [mailto:rajvish@yahoo.com]
> > > Sent: Tuesday, August 10, 2010 7:16 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: Best practices - Large Hadoop Cluster
> > >
> > > I need to start setting up a large hadoop cluster of 512 nodes. My
> > > biggest problem is the SSH keys. Is there a simpler way of generating
> > > and exchanging ssh keys among the nodes? Any best practices? If there
> > > is none, I could volunteer to do it.
> > >
> > > Raj
>
>
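
Gokul's key-exchange steps above can be rolled into one short script. This is
a sketch only: it assumes the stock conf/slaves and conf/masters files in the
current directory, a hadoop account on every node, and the default RSA key
path; adjust to taste.

```shell
#!/bin/sh
# Sketch of the key-distribution steps described above: generate one
# key on the namenode and append it to authorized_keys on every slave
# and master listed in the conf files.
KEY="$HOME/.ssh/id_rsa"
mkdir -p "$HOME/.ssh"

# Step 1: generate a key on the namenode, once (no passphrase).
[ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY"

# Steps 2 and 3: push the public key to every node. ssh prompts for the
# password once per host; after that, start-all.sh/stop-all.sh run
# without any password.
for host in $(cat conf/slaves conf/masters 2>/dev/null); do
    ssh "hadoop@$host" 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' \
        < "$KEY.pub"
done
```

Afterwards, running ssh hadoop@somenode date from the namenode is a quick
check that the key took.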
