Date: Fri, 24 Apr 2015 11:35:43 +0000 (UTC)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510841#comment-14510841 ]

Hudson commented on HDFS-8070:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2105 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2105/])
HDFS-8070. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode (cmccabe) (cmccabe: rev a8898445dc9b5cdb7230e2e23a57393c9f378ff0)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
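
Judging from the files touched (BlockReaderFactory on the client side, Receiver on the DataNode's data-transfer path), one plausible shape for such a fix is to gate the newer post-HDFS-7915 exchange behind a capability flag sent by the client, so that a 2.7.0 DFSClient keeps getting the pre-HDFS-7915 behavior. A minimal sketch of that compatibility pattern, using hypothetical names rather than the actual HDFS protobuf messages:

{code}
// Sketch of a backwards-compatible capability flag (hypothetical names).
// An optional boolean that defaults to false means a request from an older
// client is read as "does not support receipt verification".
class ShortCircuitFdsRequest {
  private final long blockId;
  private final boolean supportsReceiptVerification;

  ShortCircuitFdsRequest(long blockId, boolean supportsReceiptVerification) {
    this.blockId = blockId;
    this.supportsReceiptVerification = supportsReceiptVerification;
  }

  long blockId() { return blockId; }
  boolean supportsReceiptVerification() { return supportsReceiptVerification; }
}

class FdServer {
  // Serve the request; only wait for the extra confirmation step when the
  // client has explicitly advertised support for it.
  void requestShortCircuitFds(ShortCircuitFdsRequest req) {
    passFileDescriptors(req.blockId());
    if (req.supportsReceiptVerification()) {
      awaitReceiptConfirmation(); // newer clients confirm they received the fds
    }
    // Older clients never send a confirmation, so the server must not block
    // waiting for one when the flag is absent.
  }

  private void passFileDescriptors(long blockId) { /* send fds over the domain socket */ }
  private void awaitReceiptConfirmation() { /* read a single confirmation byte */ }
}
{code}
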
> Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-8070
>                 URL: https://issues.apache.org/jira/browse/HDFS-8070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.0
>            Reporter: Gopal V
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>             Fix For: 2.7.1
>
>         Attachments: HDFS-8070.001.patch
>
>
> The HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split generation.
> I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShm wire protocol has trouble when a 2.8.0 DN talks to a 2.7.0 client?
> {code}
> 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
> java.nio.channels.ClosedChannelException
>         at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
>         at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
>         at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
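>
> One way to read the ERROR_INVALID above, together with the DataNode log further down, is that the release request names a shmId for which the DataNode no longer holds a registration, for example because the registration was dropped when the fd request failed. A minimal sketch of that kind of registry bookkeeping (hypothetical names, not the actual DataNode classes):
> {code}
> import java.util.HashSet;
> import java.util.Set;
>
> // Hypothetical bookkeeping sketch: once a segment's registration is dropped
> // (for example after a failed fd request), any later slot release that names
> // that shmId can only be answered with ERROR_INVALID.
> class ShmRegistry {
>   private final Set<String> registeredShmIds = new HashSet<>();
>
>   void register(String shmId) { registeredShmIds.add(shmId); }
>
>   void unregister(String shmId) { registeredShmIds.remove(shmId); }
>
>   String releaseSlot(String shmId, int slotIdx) {
>     if (!registeredShmIds.contains(shmId)) {
>       return "ERROR_INVALID: there is no shared memory segment registered with shmId " + shmId;
>     }
>     // ... free slot slotIdx inside the registered segment ...
>     return "OK";
>   }
> }
> {code}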
> Looks like a double free-fd condition?
> {code}
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unregistering SlotId(3bd7fd9aed791e95acfb5034e6617d83:0) because the requestShortCircuitFdsForRead operation failed.
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088--1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1076973408, srvID: ba7b6f19-47e0-4b86-af50-23981649318c, success: false
> 2015-04-02 18:58:47,654 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cn060-10.l42scl.hortonworks.com:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation src: unix:/grid/0/cluster/hdfs/dn_socket dst:
> java.io.EOFException
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:352)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:187)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Investigating more, since the exact exception from the DataNode call is not logged.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)