hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1159) Reducers hang when map output file has a checksum error
Date Fri, 30 Mar 2007 06:21:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485427 ]

Devaraj Das commented on HADOOP-1159:

We saw the NPE coming from the call to mapOutputIn.read() in the MapOutputServlet.doGet method
in TaskTracker.java. Hairong said that HADOOP-1123 should fix the NPE in the read method, but
I am not sure, since the problem cannot be reproduced consistently. If the NPE really has been
fixed in the read method, I think we don't have to touch the doGet method, since ideally all
read/write exceptions should reach us as IOExceptions, which we already handle.
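
To illustrate the reasoning above, here is a minimal Java sketch. It is not the actual
TaskTracker code: the class MapOutputCopySketch, the serve method, and the FailingStream and
CorruptDataException helpers are all hypothetical names invented for this example. It assumes
that a read-side failure (such as a checksum error) surfaces as an IOException, in which case
the handler already present in the copy loop deals with it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MapOutputCopySketch {

    /** Hypothetical stand-in for a checksum failure reported as an IOException. */
    static class CorruptDataException extends IOException {
        CorruptDataException(String message) {
            super(message);
        }
    }

    /** Hypothetical stream: serves a few good bytes, then fails like a corrupt file. */
    static class FailingStream extends InputStream {
        private int remaining;

        FailingStream(int goodBytes) {
            this.remaining = goodBytes;
        }

        @Override
        public int read() throws IOException {
            if (remaining > 0) {
                remaining--;
                return 'x';
            }
            throw new CorruptDataException("simulated checksum error");
        }
    }

    /**
     * Copies everything from in to out.
     * Returns true on success and false if any IOException interrupted the copy.
     */
    static boolean serve(InputStream in, OutputStream out) {
        byte[] buffer = new byte[4096];
        try {
            int len;
            while ((len = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, len);
            }
            return true;
        } catch (IOException e) {
            // Read and write failures both land here, so one handler covers them.
            System.err.println("read/write failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream clean = new ByteArrayOutputStream();
        boolean cleanOk = serve(new ByteArrayInputStream(new byte[] {1, 2, 3}), clean);
        System.out.println("clean copy ok: " + cleanOk);

        ByteArrayOutputStream corrupt = new ByteArrayOutputStream();
        boolean corruptOk = serve(new FailingStream(3), corrupt);
        System.out.println("corrupt copy ok: " + corruptOk);
    }
}
```

An unchecked exception such as a NullPointerException thrown from read() would not be caught
by the catch (IOException e) block and would propagate to the caller. That is why fixing the
NPE inside the read method itself would leave nothing to change in the handler.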

> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>         Attachments: 1159-merge.patch, 1159.patch, h1159-2.patch, h1159.patch
> Two reduces hung in our sort benchmark. They always fail to get map outputs from node X
> due to a checksum error when the map outputs are read at that node, resulting in a
> NullPointerException on node X. This leads to constant failures on the two fetching reduces.
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out at 106484224
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
> 	at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> 	at java.io.DataInputStream.read(DataInputStream.java:132)
> 	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> 	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> 	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> 	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> 	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> 	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> 	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> 	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> 	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> 	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542:
> java.lang.NullPointerException

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
