Message-ID: <32402043.1175193205429.JavaMail.jira@brutus>
Date: Thu, 29 Mar 2007 11:33:25 -0700 (PDT)
From: "Devaraj Das (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1159) Reducers hang when map output file has a checksum error
In-Reply-To: <25245855.1174926932140.JavaMail.jira@brutus>


     [ https://issues.apache.org/jira/browse/HADOOP-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1159:
--------------------------------

    Attachment: 1159-merge.patch

This patch merges the two patches (1159.patch and h1159-2.patch).

> Reducers hang when map output file has a checksum error
> -------------------------------------------------------
>
>                 Key: HADOOP-1159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1159
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.3
>
>         Attachments: 1159-merge.patch, 1159.patch, h1159-2.patch, h1159.patch
>
>
> Two reduces hung in our sort benchmark. They always fail to get map outputs from node X due to checksum error when the map outputs are read at that node resulting in a NullPointerException on node X. This leads to constant failures on the two fetching reduces.
> 2007-03-26 00:02:57,082 WARN org.apache.hadoop.fs.FileSystem: Moving bad file /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out to /e/c/bad_files/file.out.542279301
> 2007-03-26 00:02:57,083 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /e/c/k/hqa/tb/tmp/mapred/local2/task_0002_m_022488_0/file.out at 106484224
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
>         at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at java.io.DataInputStream.read(DataInputStream.java:132)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1659)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>         at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>         at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>         at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>         at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>         at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>         at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>         at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>         at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2007-03-26 00:02:57,083 WARN /: /mapOutput?map=task_0002_m_022488_0&reduce=1542:
> java.lang.NullPointerException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.