hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: running hadoop remotely from inside a java program
Date Wed, 09 Jul 2008 09:27:16 GMT
Deyaa Adranale wrote:
> 
> thanks for your help
> 
> please i need more explanations on these:
> 
> * it is not too far away, network-wise
> what do u mean network-wise?? what are the requirements of the 
> connection between the client and server? because i think that my 
> cluster is protected with a firewall

I don't know if that will work or not. Given what Hadoop security is like 
(minimal), a firewall between the cluster and the rest of the world is 
important.
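
One way to find out is to test from the client machine whether a TCP connection to the cluster can be opened at all. The following is only a minimal sketch: the class name, the host `namenode.example.com` and the port `9000` are placeholders, not values from this thread, so substitute whatever your cluster's configuration actually uses. A successful connection to the NameNode or JobTracker port is necessary but not sufficient, because an HDFS client also talks directly to the datanodes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ClusterReachability {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unknown host, or dropped by a firewall.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "namenode.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 3000));
    }
}
```

If this prints `false` for the ports your cluster listens on, the firewall is probably blocking the client, and the SSH route discussed further down is the more realistic choice.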

> 
> * the client hadoop configuration is in sync with the servers
> how to do this?
> i have been till now only running jobs on hadoop, but i have never 
> configured it.
> this will not mean that the client machine will be a node in the 
> cluster, right?

The client machine's XML files need to be synchronised with those on the 
server, otherwise

> 
> and what if my client does not have a hadoop installation and I don't 
> want to force him to install one just to use my tool? can't I simply 
> submit jobs to the cluster remotely from my java code using SSH for 
> example?

That's one option:
* scp the JARs to a machine in the cluster
* ssh in to that machine
* use the command line tools there to run the job
* use distcp to get the results back to the local filesystem, and scp 
them back to the user

It's workable, and would avoid having to have the Hadoop client-side 
JARs installed on every machine. Getting the data in and out 
efficiently could be the tricky part.
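
The four steps above could be driven from Java roughly as follows. This is a sketch under stated assumptions, not a tested tool: it assumes `scp` and `ssh` are on the client's PATH with key-based login already set up, that the `hadoop` script is on the PATH of the remote machine, and that the job takes an input and an output path as arguments. The class name, method names, hosts and paths are all made up for illustration. For simplicity it copies results out of HDFS with `hadoop fs -get` rather than distcp.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RemoteJobSketch {

    /** Step 1: copy the job JAR to a machine in the cluster. */
    public static List<String> uploadJar(String localJar, String userAtHost, String remoteDir) {
        return Arrays.asList("scp", localJar, userAtHost + ":" + remoteDir + "/");
    }

    /** Steps 2 and 3: ssh in and run the job with the command-line tools there. */
    public static List<String> runJob(String userAtHost, String remoteJar,
                                      String mainClass, String input, String output) {
        // No shell quoting is done here; paths containing spaces would need it.
        String remoteCommand = "hadoop jar " + remoteJar + " " + mainClass
                + " " + input + " " + output;
        return Arrays.asList("ssh", userAtHost, remoteCommand);
    }

    /** Step 4a: copy the results from HDFS to the remote machine's local disk. */
    public static List<String> exportResults(String userAtHost, String hdfsOutput,
                                             String remoteLocalDir) {
        return Arrays.asList("ssh", userAtHost,
                "hadoop fs -get " + hdfsOutput + " " + remoteLocalDir);
    }

    /** Step 4b: copy the results back to the user's machine. */
    public static List<String> downloadResults(String userAtHost, String remoteLocalDir,
                                               String localDir) {
        return Arrays.asList("scp", "-r", userAtHost + ":" + remoteLocalDir, localDir);
    }

    /** Runs one command, forwarding its output, and returns its exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical values; this only prints the commands it would run.
        String gateway = "user@gateway.example.com";
        List<List<String>> steps = Arrays.asList(
                uploadJar("myjob.jar", gateway, "/tmp/jobs"),
                runJob(gateway, "/tmp/jobs/myjob.jar", "com.example.MyJob", "in", "out"),
                exportResults(gateway, "out", "/tmp/jobs/out"),
                downloadResults(gateway, "/tmp/jobs/out", "results"));
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

To actually execute the steps, pass each list to `run` in order and stop at the first non-zero exit code. Large inputs would still have to be copied to the cluster and into HDFS the same way, which is where the cost is.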

-steve

