hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1159) Reducers hang when map output file has a checksum error
Date Thu, 29 Mar 2007 20:57:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485347
] 

Devaraj Das commented on HADOOP-1159:
-------------------------------------

Well, the NPE is caught and logged. Since the doGet method doesn't explicitly document throwing
NPE (as it does for IOException), it is not rethrown; the task is declared lost (just as in
the IOException case). For debugging purposes the logs can be dug.
Regarding the client part, it doesn't harm if the NPE is caught and logged. It potentially
improves debugging.

> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>
>         Attachments: 1159-merge.patch, 1159.patch, h1159-2.patch, h1159.patch
>
>
> Two reduces hung in our sort benchmark. They always fail to get map outputs from node
X due to checksum error when the map outputs are read at that node resulting in a NullPointerException
on node X. This leads to constant failures on the two fetching reduces.
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out
to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error:
org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out
at 106484224
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
> 	at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> 	at java.io.DataInputStream.read(DataInputStream.java:132)
> 	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> 	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> 	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> 	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> 	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> 	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> 	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> 	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> 	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> 	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542:

> java.lang.NullPointerException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message