hadoop-mapreduce-user mailing list archives

From "Vjeran Marcinko" <vjeran.marci...@email.t-com.hr>
Subject Best Hadoop dev environment [WAS: RE: Few noob MR questions]
Date Sun, 14 Apr 2013 05:18:35 GMT
Hi again,


You actually touched on what I'm trying to do here: set up the best Hadoop
development environment.


Moreover, don't ask me why, but my development machine runs Windows, so I
don't have Hadoop installed on it. Instead I use a Linux virtual machine with
Hadoop running in it. I would mostly like to develop my job code in my
favourite IDE, deploy my jobs straight from there, and watch them run on this
"remote" virtual Hadoop platform. Build scripts can help a lot: each time I
change some job code, I could use them to package it and transfer it to the
Hadoop machine, where I can deploy it via the "hadoop jar." command. I will
certainly do that *in production*, but *in the development environment* I
would like to avoid it. Also, when I say "Run" in the IDE, it uses
"java -classpath .", not even "java -jar .", so the job class is not found in
any packaged form (at least by default; any proper IDE can add additional
build steps to the run).
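
One possible shape for such an extra build step is sketched below: it packs
the classes the IDE has already compiled into a jar, using only the JDK's
java.util.jar classes. This is a minimal sketch, not an established Hadoop
workflow. The class name JobJarPackager and the default paths target/classes
and target/my-job.jar are hypothetical and would need to match the IDE's
actual output directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: packs an IDE's compiled-classes directory into a jar.
final class JobJarPackager {

    /**
     * Writes every regular file under classesDir into jarFile, using the path
     * relative to classesDir as the entry name. Returns the number of entries.
     */
    static int packageClasses(Path classesDir, Path jarFile) throws IOException {
        // Collect the files first, so the jar itself is never picked up
        // if it happens to be written inside classesDir.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out)) {
            for (Path file : files) {
                // Jar entry names always use '/' separators, also on Windows.
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions; pass the real paths as arguments.
        Path classesDir = Paths.get(args.length > 0 ? args[0] : "target/classes");
        Path jarFile = Paths.get(args.length > 1 ? args[1] : "target/my-job.jar");
        int count = packageClasses(classesDir, jarFile);
        System.out.println("Packed " + count + " files into " + jarFile);
    }
}
```

The driver started from the IDE would then still have to tell Hadoop to ship
this jar to the cluster. In the old API, JobConf.setJar(String) is one way to
do that, but the exact call depends on the Hadoop version in use and should
be checked against its Javadoc.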


So are there any more hints for me on how to set up this environment?


Hadoop can really be intimidating for a newbie: there are so many versions
out there, so many examples using different APIs, and so many ways to deploy
a job, for example, that I don't know where to start. And my Windows OS brings
even more problems in the beginning, when I don't know much.





From: Bjorn Jonsson [mailto:bjornjon@gmail.com] 
Sent: Sunday, April 14, 2013 5:27 AM
To: user@hadoop.apache.org
Subject: Re: Few noob MR questions


Correct, you can use java -jar to submit a job, with the "driver" code in a
plain static main method. I do it all the time. You can of course also run a
Job straight from your Java code in the IDE. You can check out the .runJar()
method in the Hadoop API Javadoc to see, essentially, what the hadoop command
does, I think.





On Sat, Apr 13, 2013 at 3:59 PM, Jens Scheidtmann
<jens.scheidtmann@gmail.com> wrote:

Dear Vjeran,

Your own jobs should implement the Tool interface and be launched through
ToolRunner. This gives you additional standard options on the command line.

Also have a look at class ProgramDriver as used here:

which further simplifies executing your MR jobs.


Best regards,


