hadoop-user mailing list archives

From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject mortbay, huge files and the ulimit
Date Wed, 29 Aug 2012 13:53:32 GMT
Hi there,

I am currently running a job in which I self-join a 63 GB CSV file on 20 physically distinct nodes with 15 GB of memory each.

While the mapping works just fine and is cheap, the reducer does the 
main work: it holds a hashmap of elements to join with and finds join 
tuples for every incoming key-value pair.
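
To make this concrete, here is a rough sketch of the reducer pattern 
(not my real code; the class name, the per-key buffering and the 
emitted tuple format are simplified placeholders):

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sketch of the self-join reducer (old "mapred" API, matching the
// Hadoop 1.x stack trace below). The hashmap lives across reduce()
// calls, which is what makes memory an issue.
public class SelfJoinReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  // rows seen so far that may still find join partners
  private final Map<String, List<String>> buffered =
      new HashMap<String, List<String>>();

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    List<String> partners = buffered.get(key.toString());
    if (partners == null) {
      partners = new ArrayList<String>();
      buffered.put(key.toString(), partners);
    }
    while (values.hasNext()) {
      String row = values.next().toString();
      // emit a join tuple for every buffered row with the same join key
      for (String p : partners) {
        out.collect(key, new Text(p + "\t" + row));
      }
      partners.add(row);
      reporter.progress(); // keep the task from timing out on large groups
    }
  }
}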

The job works perfectly on small files of about 2 GB, but starts to 
get "unstable" as the file size grows. This becomes evident with a 
look into the TaskTracker's logs, which say:

ERROR org.mortbay.log: /mapOutput
java.lang.IllegalStateException: Committed
     at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
     at org.mortbay.jetty.Response.sendError(Response.java:240)
     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3945)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:326)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


And while this is no problem at the beginning of the reduce phase, 
where it happens only rarely and on a few nodes, it becomes critical 
as the progress rises. The reason for this (as far as I know from 
reading articles) is memory or file-handle problems. I addressed the 
memory problem by continuously purging outdated elements from the map 
every 5 million processed key-value pairs. And I set 
mapred.child.ulimit to 100000000 (ulimit in the shell tells me it is 
400000000).
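
The purging I mean looks roughly like this (a sketch only; 
PurgeSketch, afterEachPair and isOutdated are placeholder names, and 
the staleness test depends on the join semantics):

import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch of the periodic purge: every 5 million processed pairs, walk
// the join hashmap and drop entries that can no longer find partners.
class PurgeSketch {
  private static final long PURGE_EVERY = 5000000L; // 5 million pairs
  private long processed = 0;

  void afterEachPair(Map<String, List<String>> buffered) {
    if (++processed % PURGE_EVERY != 0) {
      return;
    }
    Iterator<Map.Entry<String, List<String>>> it =
        buffered.entrySet().iterator();
    while (it.hasNext()) {
      if (isOutdated(it.next().getValue())) {
        it.remove(); // free the memory held by stale join candidates
      }
    }
  }

  private boolean isOutdated(List<String> rows) {
    return false; // placeholder: the real test depends on the join condition
  }
}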

Anyway, I am still running into those mortbay errors, and I am 
starting to wonder whether Hadoop can manage the job with this 
algorithm at all. By pure naive math it should: I explicitly assigned 
10 GB of memory to each JVM on each node and set 
mapred.child.java.opts to "-Xmx10240m -XX:+UseCompressedOops 
-XX:-UseGCOverheadLimit" (it's a 64-bit environment, and large data 
structures cause the GC to throw exceptions). That naively makes 18 
slave machines with 10 GB each, for an overall memory of 180 GB, 
about three times as much as needed... I would think. So if the 
Partitioner distributes the pairs roughly equally to all nodes, I 
should not run into any errors, should I?
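
For completeness, this is roughly how the job is configured (a sketch 
using the old JobConf API; JobSetupSketch is a placeholder class name):

import org.apache.hadoop.mapred.JobConf;

// Sketch: setting the two options mentioned above on the job.
public class JobSetupSketch {
  public static JobConf configure() {
    JobConf conf = new JobConf(JobSetupSketch.class);
    conf.set("mapred.child.java.opts",
        "-Xmx10240m -XX:+UseCompressedOops -XX:-UseGCOverheadLimit");
    // mapred.child.ulimit is interpreted in kilobytes by Hadoop
    conf.set("mapred.child.ulimit", "100000000");
    return conf;
  }
}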

Can anybody help me with this issue?

Best regards,
Elmar

