hadoop-common-user mailing list archives

From Praveen Yarlagadda <praveen.yarlaga...@gmail.com>
Subject Re: basic hadoop job help
Date Thu, 18 Feb 2010 23:40:41 GMT
I recommend the following book by Tom White:

Hadoop: The Definitive Guide

It will give you more details about Hadoop.

Regards,
Praveen

On Thu, Feb 18, 2010 at 11:34 AM, Brian Wolf <brw314@gmail.com> wrote:

> Since I'm more or less in the same boat, this is the best I've seen, and
> the 2009 book is also very good:
>
> http://developer.yahoo.com/hadoop/
>
> Brian
>
> On Thu, Feb 18, 2010 at 12:26 PM, Amogh Vasekar <amogh@yahoo-inc.com>
> wrote:
>
> > Hi,
> > The Hadoop meet last year had some very interesting business solutions
> > discussed:
> > http://www.cloudera.com/company/press-center/hadoop-world-nyc/
> > Most of the companies there have shared their methodology on their
> > blogs / on SlideShare. One I have handy is:
> >
> > http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
> > It shows how Y! Search Assist is implemented.
> >
> >
> > Amogh
> >
> >
> > On 2/19/10 12:48 AM, "C Berg" <icey502@yahoo.com> wrote:
> >
> > Hi Eric,
> >
> > Thanks for the advice, that is very much appreciated.  With your help I
> > was able to get past the mechanical part to something a bit more
> > substantive: wrapping my head around doing an actual business calculation
> > in a mapreduce way.  Any recommendations on some tutorials that cover
> > real-world examples other than word counting and the like?
> >
> > Thanks again,
> >
> > Cory
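As a toy illustration of the kind of business calculation Cory is asking about, here is the map/reduce shape of a revenue-per-store total in plain Java, with no Hadoop dependencies. The class name, the `storeId,product,amount` line format, and the sample data are all made up for illustration; the point is only the map → group-by-key → reduce structure that a Hadoop job would follow.

```java
import java.util.*;

public class RevenuePerStore {
    // "map" step: from one CSV line "storeId,product,amount",
    // emit the key/value pair (storeId, amount)
    static Map.Entry<String, Double> map(String line) {
        String[] f = line.split(",");
        return new AbstractMap.SimpleEntry<>(f[0], Double.parseDouble(f[2]));
    }

    // "shuffle" + "reduce" steps: group amounts by store key,
    // then sum each group into a per-store total
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        Map<String, Double> totals = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) sum += v;
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "store1,widget,10.0",
            "store2,gadget,5.5",
            "store1,widget,15.0");
        System.out.println(run(lines)); // {store1=25.0, store2=5.5}
    }
}
```

In a real Hadoop job the grouping is done by the framework between the Mapper and Reducer; only the `map` logic and the summing loop would live in user code.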
> >
> > --- On Thu, 2/18/10, Eric Arenas <earenas@rocketmail.com> wrote:
> >
> > > From: Eric Arenas <earenas@rocketmail.com>
> > > Subject: Re: basic hadoop job help
> > > To: common-user@hadoop.apache.org
> > > Date: Thursday, February 18, 2010, 10:52 AM
> > > Hi Cory,
> > >
> > > regarding the part that you are not sure about:
> > >
> > >
> > > String inputdir = args[0];
> > > String outputdir = args[1];
> > > int numberReducers = Integer.parseInt(args[2]);
> > > // It is better to at least pass the number of reducers as a
> > > // parameter, or read it from the XML job config file if you want.
> > >
> > > // Setting the number of reducers to 1, as you had in your code,
> > > // *might* make it slower to process and generate the output.
> > > // If you are trying to sell the idea of Hadoop as a new ETL tool,
> > > // you want it to be as fast as possible.
> > >
> > > ...................
> > >
> > > job2.setNumReduceTasks(numberReducers);
> > > FileInputFormat.setInputPaths(job2, inputdir);
> > > FileOutputFormat.setOutputPath(job2, new Path(outputdir));
> > >
> > > return job2.waitForCompletion(true) ? 0 : 1;
> > >
> > >   } // end of run method
> > >
> > >
> > > Unless you copy/paste the rest of your code, I do not see why you need
> > > to call "setWorkingDirectory" in your M/R job.
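For reference, a sketch of the full run() method with these suggestions applied. This targets the new org.apache.hadoop.mapreduce API from Hadoop 0.20; the RetailTest/RetailMapper/RetailReducer class names are taken from the original post, and the sketch is untested against a real cluster.

```java
// Sketch only: requires the Hadoop 0.20 jars on the classpath, plus the
// RetailTest/RetailMapper/RetailReducer classes from the original post.
public int run(String[] args) throws Exception {
    String inputdir = args[0];
    String outputdir = args[1];
    int numberReducers = Integer.parseInt(args[2]); // reducers as a parameter

    Job job2 = new Job(getConf()); // getConf(): reuse the Tool-provided config
    job2.setJobName("RetailTest");
    job2.setJarByClass(RetailTest.class);
    job2.setMapperClass(RetailMapper.class);
    job2.setReducerClass(RetailReducer.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);
    job2.setNumReduceTasks(numberReducers);
    // no setWorkingDirectory: input and output paths are set explicitly
    FileInputFormat.setInputPaths(job2, new Path(inputdir));
    FileOutputFormat.setOutputPath(job2, new Path(outputdir));
    // waitForCompletion blocks until the job finishes, so the exit code
    // reflects job success, unlike submit() followed by return 0
    return job2.waitForCompletion(true) ? 0 : 1;
}
```

Using getConf() from the Tool interface, rather than constructing a fresh Configuration, is also what the "Use GenericOptionsParser" warning in the posted log is hinting at.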
> > >
> > > Give this a try and let me know,
> > >
> > > regards,
> > > Eric Arenas
> > >
> > >
> > >
> > > ----- Original Message ----
> > > From: Cory Berg <icey502@yahoo.com>
> > > To: common-user@hadoop.apache.org
> > > Sent: Thu, February 18, 2010 9:07:54 AM
> > > Subject: basic hadoop job help
> > >
> > > Hey all,
> > >
> > > I'm trying to get Hadoop up and running as a proof of
> > > concept to make an argument for moving away from a big
> > > RDBMS.  I'm having some challenges just getting a
> > > really simple demo mapreduce to run.  The examples I
> > > have seen on the web tend to make use of classes that are
> > > now deprecated in the latest hadoop (0.20.1).  It is
> > > not clear what the equivalent newer classes are in some
> > > cases.
> > >
> > > Anyway, I am stuck at this exception - here it is start to
> > > finish:
> > > ---------------
> > > $ ./bin/hadoop jar ./testdata/RetailTest.jar RetailTest testdata outputdata
> > > 10/02/18 09:24:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> > > 10/02/18 09:24:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > > 10/02/18 09:24:55 INFO input.FileInputFormat: Total input paths to process : 5
> > > 10/02/18 09:24:56 INFO input.FileInputFormat: Total input paths to process : 5
> > > Exception in thread "Thread-13" java.lang.IllegalStateException: Shutdown in progress
> > >         at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
> > >         at java.lang.Runtime.addShutdownHook(Runtime.java:192)
> > >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
> > >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
> > >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> > >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
> > >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> > >         at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:61)
> > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:245)
> > > ------------
> > >
> > > Now here is the code that actually starts things up (not
> > > including the actual mapreduce code).  I initially
> > > suspected this code because I was guessing at the correct
> > > non-deprecated classes to use:
> > >
> > >   public int run(String[] args) throws Exception {
> > >     Configuration conf = new Configuration();
> > >     Job job2 = new Job(conf);
> > >     job2.setJobName("RetailTest");
> > >     job2.setJarByClass(RetailTest.class);
> > >     job2.setMapperClass(RetailMapper.class);
> > >     job2.setReducerClass(RetailReducer.class);
> > >     job2.setOutputKeyClass(Text.class);
> > >     job2.setOutputValueClass(Text.class);
> > >     job2.setNumReduceTasks(1);
> > >     // this was a guess on my part as I could not find the "recommended way"
> > >     job2.setWorkingDirectory(new Path(args[0]));
> > >     FileInputFormat.setInputPaths(job2, new Path(args[0]));
> > >     FileOutputFormat.setOutputPath(job2, new Path(args[1]));
> > >     job2.submit();
> > >     return 0;
> > >   }
> > >
> > >   /**
> > >    * @param args
> > >    */
> > >   public static void main(String[] args) throws Exception {
> > >     int res = ToolRunner.run(new RetailTest(), args);
> > >     System.exit(res);
> > >   }
> > >
> > > Can someone sanity check me here?  Much appreciated.
> > >
> > > Regards,
> > >
> > > Cory
> > >
> >
> >
> >
> >
> >
>



-- 
Regards,
Praveen
