nutch-user mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: Spill failed
Date Wed, 10 Feb 2010 12:51:53 GMT
I meant HADOOP_HEAPSIZE: one instance for the datanode daemon + one for the
tasktracker daemon.
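To make the arithmetic concrete, here is a small sketch (not from the original thread) that plugs Santiago's reported settings into the rule of thumb quoted below; the OS footprint and the per-node task count are assumed values for illustration only:

```python
# Rough per-slave memory-budget check, following the rule of thumb
#   (HADOOP_HEAPSIZE * 2) + OS RAM + (mapred.child.java.opts * num tasks) < TOTAL RAM
# Heap sizes come from the configuration reported in this thread; the OS
# footprint and concurrent task count per node are assumptions.

hadoop_heapsize_mb = 1300   # HADOOP_HEAPSIZE: datanode + tasktracker, one each
child_heap_mb = 950         # mapred.child.java.opts = -Xmx950m
os_ram_mb = 200             # assumed OS footprint
tasks_per_node = 2          # assumed concurrent map/reduce tasks per slave
total_ram_mb = 1700         # EC2 small instance (~1.7GB)

needed_mb = (hadoop_heapsize_mb * 2) + os_ram_mb + (child_heap_mb * tasks_per_node)
print(f"needed: {needed_mb}M, available: {total_ram_mb}M")
print("over budget" if needed_mb >= total_ram_mb else "within budget")
```

Even before counting the child tasks, the two daemons alone (2 x 1300M) already exceed the 1.7GB of a small instance, which is consistent with the fork failure (`error=12, Cannot allocate memory`) seen in the logs.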

On 10 February 2010 12:30, Santiago Pérez <elaragon@gmail.com> wrote:

>
> Thanks a lot! I will try to find the book you recommend.
>
> Why (NUTCH_HEAPSIZE * 2)
> do you mean (NUTCH_HEAPSIZE + HADOOP_HEAPSIZE)?
>
>
>
> Julien Nioche-4 wrote:
> >
> >> Ok, thanks I will set the value to a lower number. BTW, which is the
> >> relationship between HADOOP_HEAPSIZE and NUTCH_HEAPSIZE?
> >
> > HADOOP_HEAPSIZE is used by the Hadoop daemons (jobtracker, tasktracker,
> > etc.) whereas NUTCH_HEAPSIZE is used by the Nutch commands
> > (inject|fetch|...).
> > However, when using a distributed configuration, the main element
> > affecting the memory on the slaves is the standard Hadoop param
> > *mapred.child.java.opts*. Since you are using Nutch in distributed mode,
> > NUTCH_HEAPSIZE is used only for the driver classes but won't affect the
> > slaves (IIRC) and should not have much of an impact on the memory
> > consumption.
> >
> >>
> >> Should I set a value for NUTCH_HEAPSIZE for
> >> HADOOP_HEAPSIZE+NUTCH_HEAPSIZE<TOTAL RAM? or NUTCH_HEAPSIZE depends on
> >> HADOOP_HEAPSIZE?
> >
> > Again, the problem will occur on the slaves, not on the master. Slaves
> > have a datanode + a tasktracker, each using up to NUTCH_HEAPSIZE, so
> > roughly the equation should be:
> > (NUTCH_HEAPSIZE * 2) + (RAM OS) + (mapred.child.java.opts * num tasks)
> > < TOTAL RAM
> >
> > If you haven't done so already, I'd recommend that you read Tom White's
> > book on Hadoop. What you've described is actually not an issue related
> > to Nutch itself.
> >
> > HTH
> >
> > Julien
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
> >>
> >> I was looking for this info but I did not find anything clear.
> >>
> >> Thanks :)
> >>
> >> PS. I will post in Nutch-user for next doubts of this level
> >>
> >>
> >>
> >> Julien Nioche-4 wrote:
> >>>
> >>> the explanation can be found in the stack trace you sent:
> >>> "java.io.IOException: error=12, Cannot allocate memory"
> >>>
> >>> Small instances on EC2 do not give you enough memory. From the
> >>> configuration below, the slaves will use up to 1300M for the datanode
> >>> and the tasktracker; if you add to that the memory used by the OS and,
> >>> of course, the tasks themselves, it is not surprising that you used up
> >>> the 1.7G you had. Things get worse if you parse at the same time as you
> >>> fetch, as this tends to take some RAM.
> >>>
> >>> From my experience, EC2 large instances are more appropriate for a
> >>> Nutch cluster.
> >>>
> >>> PS: nutch-user would be a more appropriate list for this type of
> >>> messages
> >>>
> >>> J.
> >>>
> >>> --
> >>> DigitalPebble Ltd
> >>> http://www.digitalpebble.com
> >>>
> >>>
> >>>
> >>> On 10 February 2010 08:41, Santiago Pérez <elaragon@gmail.com> wrote:
> >>>>
> >>>> Hej
> >>>>
> >>>> I am running Nutch in a cluster with 1 master and 6 slaves on Amazon
> >>>> (the same instance type for all of them, with 1.7GB of RAM each).
> >>>>
> >>>> My configuration is the following:
> >>>>
> >>>> HADOOP_HEAPSIZE=1300
> >>>> HADOOP_NAMENODE_OPTS=-Xmx400m
> >>>> HADOOP_SECONDARYNAMENODE_OPTS=-Xmx400m
> >>>> HADOOP_JOBTRACKER_OPTS=-Xmx400m
> >>>> dfs.replication=3
> >>>> mapred.map.tasks=6
> >>>> mapred.reduce.tasks=6
> >>>> mapred.child.java.opts=-Xmx950m
> >>>>
> >>>> But on the second-depth fetch, I got the following errors on some
> >>>> instances (while the other ones seem to have fetched correctly):
> >>>>
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - java.io.IOException: Spill failed
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:822)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:907)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:670)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - Caused by: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1183)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
> >>>> 2010-02-10 03:18:31,185 FATAL fetcher.Fetcher - ... 9 more
> >>>> .
> >>>> .
> >>>> .
> >>>> .
> >>>> .
> >>>> 2010-02-10 03:18:31,463 WARN  mapred.TaskTracker - Error running child
> >>>> java.io.IOException: Spill failed
> >>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1085)
> >>>>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
> >>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >>>> Caused by: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
> >>>>        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
> >>>>        at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
> >>>>        at org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>>        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
> >>>>        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
> >>>>        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >>>>        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> >>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1183)
> >>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
> >>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
> >>>> Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
> >>>>        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> >>>>        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> >>>>        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
> >>>>        ... 9 more
> >>>>
> >>>> Any idea??
> >>>>
> >>>> Thanks in advance :)
> >>>> --
> >>>> View this message in context:
> >>>> http://old.nabble.com/Spill-failed-tp27527090p27527090.html
> >>>> Sent from the Nutch - Dev mailing list archive at Nabble.com.
> >>>>
> >>>>
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> > http://old.nabble.com/Spill-failed-tp27527090p27529222.html
> >> Sent from the Nutch - Dev mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Re%3A-Spill-failed-tp27530238p27530459.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>


-- 
DigitalPebble Ltd
http://www.digitalpebble.com
