hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop on windows with bat and ant scripts
Date Mon, 13 Jun 2011 12:50:25 GMT
On 06/10/2011 03:23 PM, Bible, Landy wrote:
> Hi Raja,
> I'm currently running HDFS on Windows 7 desktops.  I had to create a hadoop.bat that
> provided the same functionality of the shell scripts, and some Java Service Wrapper configs
> to run the DataNodes and NameNode as Windows services.  Once I get my system more functional
> I plan to do a write-up about how I did it, but it wasn't too difficult.  I'd also like to
> see Hadoop become less platform dependent.

Why? Do you plan to bring up a real Windows Server datacenter to test it?

>Java is supposed to be Write Once - Run Anywhere, but a lot of java projects seem to forget

Java can be x-platform, but you have to consider the problems of testing 
on hundreds of machines, the fact that even Runtime.exec() behaves 
differently on different systems, the fact that the networking setup and 
behaviour of Windows is very different from Unix, etc.
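
To make the process-launching point concrete, here is a minimal sketch of the kind of branching that cross-platform code ends up carrying when it has to run an external command. The class name ShellCommand and its methods are hypothetical and are not part of Hadoop; the sketch assumes cmd.exe is on the path on Windows and /bin/sh exists everywhere else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: wraps a command line in the
// platform's own shell so that one call site can run on Windows and Unix.
class ShellCommand {

    // True when the given os.name value looks like a Windows variant.
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Builds the argument vector for the shell that matches osName.
    static List<String> wrap(String osName, String commandLine) {
        List<String> argv = new ArrayList<>();
        if (isWindows(osName)) {
            argv.add("cmd.exe"); // assumption: cmd.exe is on the PATH
            argv.add("/c");
        } else {
            argv.add("/bin/sh"); // assumption: a POSIX shell at /bin/sh
            argv.add("-c");
        }
        argv.add(commandLine);
        return argv;
    }

    public static void main(String[] args) throws Exception {
        List<String> argv = wrap(System.getProperty("os.name"), "echo hello");
        // Run the command with the child's output attached to this console.
        Process p = new ProcessBuilder(argv).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Even this only papers over the shell name. Quoting rules, path separators, and exit-code conventions still differ between the two branches, and those are the differences that only show up once the code runs on a lot of machines.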

Whether you like it or not, all the big Hadoop clusters run on Linux, 
not just for the licensing costs, but because it is what Hadoop is 
tested on at those scales, so it becomes self-reinforcing. Same for the 
JVM: Sun's standard JVM, not JRockit or anything else. Again, in a large 
datacenter you will find all the corner cases where that "runs anywhere" 
claim changes to "crashes one task tracker every hour".

OS X and Windows support is very much there for development, though even 
there I'd recommend switching to a Linux laptop to reduce the surprises 
when you go to the real cluster. Allen W will note that Solaris works 
too, but even then differences between Linux and SunOS caused problems.

By having a de-facto agreement to focus on Linux as the back end, it 
lets the developers
* have a single platform to dev and test on
* worry about RPM and deb installers, not Windows install/uninstall quirks
* share ready-to-use Linux VM images (as Cloudera do) for people to play with
* use the large cluster management tooling that exists for managing big 
Linux clusters (Kickstart, etc).

I think it's important for the client-side code to work on Windows, 
for job submission to be x-platform, but getting server-side code to 
work well on Windows is a lot harder than people expect. The OS wasn't 
really written for it, the Java Service Wrappers have their own issues 
(both the Apache one, which is derived from Tomcat, and the other one), 
and it's not something I'd recommend to go near unless you really have 
no choice in the matter. I speak from experience.


> So far, I've been unable to make MapReduce work correctly.  The services run, but things
> don't work, however I suspect that this is due to DNS not working correctly in my environment.

Yes, that's part of the "anywhere" you have to fix. Edit the host tables 
so that DNS and reverse DNS appear to work. That's 
c:\windows\system32\drivers\etc\hosts, unless on a win64 box it moves.
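
One way to check whether the host table edits took effect is to ask the JVM what it resolves, since that is what the Hadoop daemons will see. This is a rough sketch using only java.net.InetAddress. The class name DnsCheck and its helper methods are made up for illustration, and the comparison rule (ignore case and a trailing dot) is an assumption about what counts as a match.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical diagnostic, not part of Hadoop: compares the local host
// name with the name that a reverse lookup of its address returns.
class DnsCheck {

    // Lower-cases a host name and drops one trailing dot, if present.
    static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    // True when both names refer to the same host after normalizing.
    static boolean sameHost(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Forward side: the local host name and the address it maps to.
        InetAddress local = InetAddress.getLocalHost();
        String name = local.getHostName();
        String address = local.getHostAddress();

        // Reverse side: look the address up again. If no name is found,
        // getCanonicalHostName() returns the address as text.
        String reverse = InetAddress.getByName(address).getCanonicalHostName();

        System.out.println("host name:      " + name);
        System.out.println("address:        " + address);
        System.out.println("reverse lookup: " + reverse);
        System.out.println(sameHost(name, reverse)
                ? "forward and reverse names match"
                : "MISMATCH: fix the hosts file or DNS");
    }
}
```

Run it on every node. A mismatch can also just mean that one side returned a short name and the other a fully qualified one, so read the printed values before trusting the last line.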
