hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Custom input help/debug help
Date Thu, 09 Jul 2009 22:09:49 GMT
Hi Matthew,

You can set the heap size for child task JVMs by calling
conf.set("mapred.child.java.opts", "-Xmx1024m") to get a gig of heap space.
That should fix the OOM issue in IsolationRunner. You can also change the
heap size used in Eclipse: go to Debug Configurations and create a new
configuration (or modify your existing one). Under the Arguments tab there's
a box for "VM arguments"; pass the same flag as above (-Xmx1024m) to get a GB
of heap, then "Debug As" that new configuration.
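To make that concrete, here's a minimal sketch of setting both properties before
submitting a job. It assumes Hadoop 0.20 on the classpath; the class name and the
commented-out submission call are placeholders, not part of any real job:

```java
// Sketch: raise the child-task heap and run the job in-process for debugging.
// (Assumes Hadoop 0.20 jars are on the classpath; names here are illustrative.)
import org.apache.hadoop.mapred.JobConf;

public class HeapConfigExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(HeapConfigExample.class);
        // Each map/reduce task runs in a child JVM launched with these opts:
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // For single-process debugging inside Eclipse, also run locally:
        conf.set("mapred.job.tracker", "local");
        // ... set input/output paths and mapper/reducer classes, then submit,
        // e.g. JobClient.runJob(conf);
    }
}
```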

As for API differences: writing a new InputFormat / RecordReader isn't
terribly different from writing an old-style one. Methods have moved, but
the gist is still the same. My biggest tip for you would be to instantiate a
log4j Log object in each class (see e.g., line 60 of
org/apache/hadoop/mapred/FileInputFormat). Then call LOG.info(...) at
various places in your IF and RR to track what's happening. Are your classes
being instantiated? Are you opening the right files? This will give you a
start on debugging.
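As a starting point, here's a hedged sketch of a new-API
(org.apache.hadoop.mapreduce) RecordReader that reads a whole file as a single
Text record and logs its progress the way I described. The class name
XmlWholeFileReader and the one-record-per-file behavior are illustrative
choices for your XStream use case, not anything shipped with Hadoop:

```java
// Sketch: a RecordReader that emits one (offset, whole-file-contents) record
// per file, with LOG.info calls to confirm it is instantiated and reading.
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class XmlWholeFileReader extends RecordReader<LongWritable, Text> {
    private static final Log LOG = LogFactory.getLog(XmlWholeFileReader.class);

    private Path path;
    private Configuration conf;
    private long length;
    private boolean processed = false;
    private final LongWritable key = new LongWritable(0);
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException {
        FileSplit fileSplit = (FileSplit) split;
        path = fileSplit.getPath();
        length = fileSplit.getLength();
        conf = context.getConfiguration();
        LOG.info("Initializing reader for " + path); // confirms instantiation
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // one record per file; we're done
        }
        byte[] contents = new byte[(int) length];
        FSDataInputStream in = path.getFileSystem(conf).open(path);
        try {
            IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        value.set(contents, 0, contents.length);
        LOG.info("Read " + length + " bytes from " + path);
        processed = true;
        return true;
    }

    @Override
    public LongWritable getCurrentKey() { return key; }

    @Override
    public Text getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { }
}
```

A matching InputFormat would return this reader from createRecordReader() and
override isSplitable() to return false so each XML file stays whole.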

- Aaron

On Wed, Jul 8, 2009 at 3:22 PM, Matthew B. <mbanick1@vt.edu> wrote:

> Hello,
> I'm starting to use Hadoop for something I'm working on.  I'm on a windows
> machine (xp) and I cannot consider changing to any other OS.  I'm using
> eclipse with the hadoop plug in to develop, and I have cygwin fully
> installed and working; I am using hadoop-0.20.0.  I tried to develop a
> class
> that would take a file or multiple files in my input directory, and read
> them in using XStream.  The input converts the class read from the file
> back
> to an xml string which is stored in a Text object for processing.  The
> reason I am doing this is because my data is stored as XML in a .txt file
> after I read them in.  I cannot use the default reader classes as when
> XStream writes the object, the fields are separated by newlines, and since
> the default reader reads a line of text and submits to the mapper, this
> won't
> work for me.
> After coding my own reader class, my problem is now that the map-reduce
> process seems to do nothing.  It runs and says it was complete, but it did
> not process anything.   I've tried debugging as a single process in eclipse
> by doing config.set("mapred.job.tracker", "local"); in my main method.
> However, when I run the program that way (either normally or in debug mode)
> from eclipse, I always get an out of memory exception.  The configuration
> has pre-set my child max heap size to 200 MB.  I've also tried to make a class
> with a main method that runs IsolationRunner on a job.xml file.  However, I
> get the same problem (out of memory).  Can someone give me a 'dumbed down'
> way of using the eclipse debugger with my map-reduce code?
> Finally, if anyone could point me in the right direction on any resources
> that explain how to code a custom input class, that'd be great.  I was
> referencing Yahoo's tutorial, however they use deprecated methods, and I
> have been using the new methods.
> Thanks.
> PS.  I am very very new at this, so please excuse me if my post was unclear
> or missing key information.
> --
> View this message in context:
> http://www.nabble.com/Custom-input-help-debug-help-tp24400447p24400447.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
