hadoop-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Gathering connection information
Date Sat, 14 Jun 2014 12:14:13 GMT
Thanks, that’s interesting information.  Use of an Edge Node sounds like a useful convention.
 We are software vendors, and we want to connect to any Hadoop cluster regardless of configuration.
 How does the Edge Node support connections to HDFS from the client?  Doesn’t the HDFS FileSystem
require direct connections to each DataNode?  Does such an Edge Node proxy all of those connections
automatically, or does our software need to be made aware of this convention somehow?


From: Rishi Yadav [mailto:rishi@infoobjects.com]
Sent: Saturday, June 07, 2014 8:20 AM
To: user@hadoop.apache.org
Subject: Re: Gathering connection information

Typically users ssh into an edge node that is co-located with the cluster. This also minimizes latency
between client and cluster.

On Sat, Jun 7, 2014 at 7:12 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
In my experience you build a node called an Edge Node, which has all the libraries and the XML
configuration settings needed to connect to the cluster; it just doesn't have any of the Hadoop daemons running.
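
As a rough illustration of what those XML settings carry, here is a minimal sketch that pulls the NameNode and ResourceManager addresses out of Hadoop-style configuration documents. The property names fs.defaultFS and yarn.resourcemanager.address are standard Hadoop 2 keys; the hostnames, ports, and the helper name read_hadoop_properties are made up for the example, and a real cluster may set additional or different properties (HA setups, for instance).

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for each <property> element in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


# Hypothetical file contents; hostnames and ports are placeholders, not defaults.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""

YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm.example.com:8032</value>
  </property>
</configuration>"""

# Merge the two documents into one lookup table.
settings = {}
settings.update(read_hadoop_properties(CORE_SITE))
settings.update(read_hadoop_properties(YARN_SITE))

print(settings["fs.defaultFS"])
print(settings["yarn.resourcemanager.address"])
```

On an actual edge node the same function could be fed the contents of core-site.xml and yarn-site.xml from the Hadoop configuration directory, whose location varies by distribution.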

On Wed, Jun 4, 2014 at 2:46 PM, John Lilley <john.lilley@redpoint.net> wrote:
We’ve found that many of the Hadoop samples assume that they are being run from a cluster
node, and that the connection information can be gleaned directly from a configuration object.
 However, we always run our client from a remote computer, and our users must manually specify
the NN/RM addresses and ports.  We’ve found this varies maddeningly between distros and
especially on hosted virtual implementations.  Getting the wrong port results in various inscrutable
errors with red-herring messages about security.  Is there a prescribed way to get the correct
connection information more easily, like from a web API (where at least we’d only need one
address and port)?
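
One way a remote client might soften the "wrong port" problem described above is to test plain TCP reachability of the user-supplied address before handing it to the Hadoop client, so that an unreachable endpoint produces a direct message. This is only a sketch: check_endpoint is a hypothetical helper, and a successful connection shows that something is listening on the port, not that it is the correct NameNode or ResourceManager RPC port.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False, f"{host}:{port} is not reachable: {exc}"


# Example call (hostname and port are placeholders):
# ok, reason = check_endpoint("namenode.example.com", 8020)
# if not ok:
#     print(reason)
```

Running this check for both the NN and RM addresses before the first Hadoop call would at least separate network and port mistakes from genuine security errors.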

