hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
Date Fri, 24 Apr 2015 11:37:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510877#comment-14510877 ]

Hudson commented on HDFS-8070:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/173/])
HDFS-8070. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode (cmccabe)
(cmccabe: rev a8898445dc9b5cdb7230e2e23a57393c9f378ff0)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
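
The patch touches both the client-side BlockReaderFactory and the server-side data transfer Receiver, which fits the issue title: keeping a pre-HDFS-7915 DFSClient working against a post-HDFS-7915 DataNode. As a general illustration only (hypothetical types and names, not the actual HDFS-8070 change), wire-compatible protocol additions are typically made by declaring the new field optional and having the receiver check for its presence before relying on it:

{code}
// Illustrative sketch only -- hypothetical types and field names, not the actual
// HDFS-8070 patch. The general wire-compatibility pattern: a field added by a
// newer protocol revision is optional, and the receiver checks whether the peer
// actually sent it before relying on it, keeping the old behaviour otherwise.
class ShortCircuitFdsRequest {              // stand-in for a protobuf message
  private final Boolean supportsNewFeature; // null == field not sent (older client)

  ShortCircuitFdsRequest(Boolean supportsNewFeature) {
    this.supportsNewFeature = supportsNewFeature;
  }

  boolean hasSupportsNewFeature() { return supportsNewFeature != null; }
  boolean getSupportsNewFeature() { return supportsNewFeature; }
}

class ReceiverSketch {
  static void handle(ShortCircuitFdsRequest req) {
    // Absence of the field means the client predates the protocol change,
    // so fall back to the pre-upgrade behaviour rather than assuming support.
    boolean supportsNewFeature =
        req.hasSupportsNewFeature() && req.getSupportsNewFeature();
    System.out.println("new feature supported by client: " + supportsNewFeature);
  }

  public static void main(String[] args) {
    handle(new ShortCircuitFdsRequest(null));  // request from a pre-HDFS-7915 client
    handle(new ShortCircuitFdsRequest(true));  // request from a newer client
  }
}
{code}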


> Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-8070
>                 URL: https://issues.apache.org/jira/browse/HDFS-8070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.0
>            Reporter: Gopal V
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>             Fix For: 2.7.1
>
>         Attachments: HDFS-8070.001.patch
>
>
> HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation.
> I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShm wire protocol has trouble when a 2.8.0 DN talks to a 2.7.0 client?
> {code}
> 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket.  Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
> 	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
> java.nio.channels.ClosedChannelException
> 	at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
> 	at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
> 	at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
> 	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket.  Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
> 	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
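> The failures above all come from the client's slot-release path: the ShortCircuitCache asks the DataNode, over the domain socket, to release a slot, gets ERROR_INVALID because the DataNode no longer has that shared memory segment registered, and then closes the whole segment. A rough, self-contained sketch of that interaction (hypothetical types, not the real ShortCircuitCache$SlotReleaser code):
> {code}
> // Hypothetical sketch of the release round trip seen in the log above.
> // Types and names are made up; the real path is ShortCircuitCache$SlotReleaser.
> enum ReleaseStatus { SUCCESS, ERROR_INVALID }
> 
> interface DataNodeEndpoint {
>   // Stand-in for sending ReleaseShortCircuitAccessRequestProto over the domain socket.
>   ReleaseStatus releaseSlot(String shmId, int slotIdx);
> }
> 
> class SlotReleaserSketch {
>   static void release(DataNodeEndpoint dn, String shmId, int slotIdx) {
>     ReleaseStatus status = dn.releaseSlot(shmId, slotIdx);
>     if (status != ReleaseStatus.SUCCESS) {
>       // Mirrors the behaviour in the log: on failure the client treats the whole
>       // shared memory segment as unusable and closes it.
>       System.err.println("failed to release Slot(slotIdx=" + slotIdx
>           + ", shm=" + shmId + "): " + status + "; closing shared memory segment");
>     }
>   }
> }
> {code}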
> Looks like a double free-fd condition?
> {code}
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unregistering SlotId(3bd7fd9aed791e95acfb5034e6617d83:0) because the requestShortCircuitFdsForRead operation failed.
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-<ip>-1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1076973408, srvID: ba7b6f19-47e0-4b86-af50-23981649318c, success: false
> 2015-04-02 18:58:47,654 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cn060-10.l42scl.hortonworks.com:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation  src: unix:/grid/0/cluster/hdfs/dn_socket dst: <local>
> java.io.EOFException
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:352)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:187)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
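> The EOFException in Receiver.opRequestShortCircuitFds would fit the version-skew theory: the DataNode tries to read bytes that an older client never sends, so the read runs off the end of the stream. A trivial standalone illustration of that failure mode (not HDFS code):
> {code}
> import java.io.ByteArrayInputStream;
> import java.io.DataInputStream;
> import java.io.EOFException;
> import java.io.IOException;
> 
> class EofDemo {
>   public static void main(String[] args) throws IOException {
>     // The "client" wrote nothing for this part of the request,
>     // but the "server" still tries to read it.
>     DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[0]));
>     try {
>       in.readByte();
>     } catch (EOFException e) {
>       System.out.println("EOFException: peer never sent the expected bytes");
>     }
>   }
> }
> {code}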
> Investigating further, since the exact exception from the DataNode call is not logged.
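> To make it visible, one option would be to log the exception at the operation boundary on the DataNode before it is translated into a failure response; a hypothetical helper illustrating the idea (not the actual DataXceiver code):
> {code}
> import java.util.concurrent.Callable;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> // Hypothetical sketch: wrap an operation so its root-cause exception is logged
> // on the DataNode before being mapped to a generic failure reply.
> class OpLogging {
>   private static final Logger LOG = LoggerFactory.getLogger(OpLogging.class);
> 
>   static <T> T callAndLogFailure(String opName, Callable<T> body) throws Exception {
>     try {
>       return body.call();
>     } catch (Exception e) {
>       LOG.warn(opName + " failed", e);  // surfaces the exact exception in the DN log
>       throw e;
>     }
>   }
> }
> {code}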



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
