hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12739) Deadlock with OrcInputFormat split threads and Jets3t connections, since, NativeS3FileSystem does not release connections with seek()
Date Mon, 25 Jan 2016 10:24:39 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115004#comment-15115004 ]

Steve Loughran commented on HADOOP-12739:
-----------------------------------------

If there's something which scares us, it's patches for s3n. Something always breaks somewhere
else. So while I don't doubt your discovery of the bug, I worry about the implications of
fixing it. In particular, we know that the latest JetS3t uses an HTTP client library which
close()s connections by reading in the rest of the stream ... not what we want to do when
seeking a few bytes into a many-GB file. I don't know if the patch here makes that any worse,
or just hurts seek more.
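
To make that concern concrete, here is a minimal sketch of the two release paths, assuming
Apache HttpClient 4.x (which recent JetS3t releases build on). This is not the JetS3t code
path, just the behaviour described above, against a hypothetical many-GB object URL:

{code}
// Sketch only: shows the HttpClient 4.x behaviour, not JetS3t internals.
// Closing the *content stream* of a pooled response tries to consume the rest
// of the entity so the connection can be reused; closing the *response*
// discards the connection without reading any further.
import java.io.InputStream;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class DrainVsDiscard {
  public static void main(String[] args) throws Exception {
    // Hypothetical URL standing in for a many-GB S3 object.
    String url = args.length > 0 ? args[0] : "https://example.com/many-gigabyte-object";

    try (CloseableHttpClient client = HttpClients.createDefault()) {
      HttpGet get = new HttpGet(url);
      CloseableHttpResponse response = client.execute(get);
      InputStream body = response.getEntity().getContent();

      byte[] few = new byte[16];
      int n = body.read(few);   // read only a few bytes, as a short seek()+read would
      System.out.println("read " + n + " bytes");

      // Option A -- "polite" close: drains the remaining content so the pooled
      // connection stays reusable. For a multi-GB object that is a huge,
      // silent download.
      // body.close();

      // Option B -- discard: nothing more is read, but the connection is thrown
      // away and the pool has to open a new one for the next request.
      response.close();
    }
  }
}
{code}

Neither option is free: the polite close can silently pull gigabytes over the wire, while the
discard throws away a pooled connection that has to be re-established, which is why changes to
how s3n releases streams on seek() tend to have non-obvious costs either way.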

# The patch submission process for object stores is documented at the link below; as it notes, please reassure us
that you ran the full AWS test suite and that it passed: https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure

# What happens on S3A? It's the better-performing FS, and there's actually a pending patch
there for lazy seeks: the input stream isn't even opened until the read (see the sketch below).
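
The lazy-seek idea is roughly the following; this is only a sketch of the technique, not the
pending S3A patch, and the class and method names are illustrative:

{code}
// Sketch of lazy seek (not the actual S3A patch): seek() only records the
// target offset; the underlying object stream is opened, or reopened, on the
// next read() that actually needs data.
import java.io.IOException;
import java.io.InputStream;

public abstract class LazySeekStream extends InputStream {
  private InputStream wrapped;      // null until data is actually needed
  private long streamPos = -1;      // offset the open stream is positioned at
  private long targetPos = 0;       // offset the caller asked for

  /** Open the backing object stream at the given offset, e.g. via a ranged GET. */
  protected abstract InputStream openAt(long offset) throws IOException;

  public void seek(long pos) {
    targetPos = pos;                // cheap: no connection is touched here
  }

  @Override
  public int read() throws IOException {
    if (wrapped == null || streamPos != targetPos) {
      if (wrapped != null) {
        wrapped.close();            // real code would abort or drain as appropriate
      }
      wrapped = openAt(targetPos);  // a connection is acquired only now
      streamPos = targetPos;
    }
    int b = wrapped.read();
    if (b >= 0) {
      streamPos++;
      targetPos++;
    }
    return b;
  }
}
{code}

With this shape, a seek immediately followed by another seek costs nothing, and no connection
is taken from the pool until data is actually requested.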


> Deadlock with OrcInputFormat split threads and Jets3t connections, since, NativeS3FileSystem does not release connections with seek()
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12739
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12739
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Pavan Srinivas
>            Assignee: Pavan Srinivas
>         Attachments: 11600.txt, HADOOP-12739.patch
>
>
> Recently, we came across a deadlock situation with OrcInputFormat while computing splits:
> - In ORC, split computation needs the file listing and file sizes.
> - Multiple threads are invoked to list the files, and if the data is located in S3, NativeS3FileSystem is used.
> - NativeS3FileSystem in turn uses the JetS3t library to talk to AWS and maintains a connection pool.
> - When the number of threads from OrcInputFormat exceeds JetS3t's maximum number of connections, a deadlock occurs (see the sketch after the quoted report below). Stack trace:
> {code}
> "ORC_GET_SPLITS #5" daemon prio=10 tid=0x00007f8568108800 nid=0x1e29 in Object.wait()
[0x00007f8565696000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000df9ed450> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
> 	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
> 	- locked <0x00000000df9ed450> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
> 	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
> 	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
> 	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> 	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> 	at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:370)
> 	at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:929)
> 	at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2007)
> 	at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:1944)
> 	at org.jets3t.service.S3Service.getObject(S3Service.java:2625)
> 	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:254)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at org.apache.hadoop.fs.s3native.$Proxy12.retrieve(Unknown Source)
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.reopen(NativeS3FileSystem.java:269)
> 	- locked <0x00000000db01eec0> (a org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream)
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:258)
> 	- locked <0x00000000db01eec0> (a org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream)
> 	at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:98)
> 	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
> 	- locked <0x00000000db01ee70> (a org.apache.hadoop.fs.FSDataInputStream)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:329)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:292)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:197)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:857)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:747)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
>    Locked ownable synchronizers:
> 	- <0x00000000dae7bcb8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> A complete *jstack* dump of the process is attached.
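
The deadlock shape described in the quoted report reduces to a familiar pattern: every worker
thread holds one pooled connection (its open stream) and then blocks waiting for a second one
(the reopen triggered by seek()) before releasing the first. The sketch below reproduces that
shape with a plain java.util.concurrent.Semaphore standing in for the JetS3t/HttpClient
connection pool; the pool size and thread count are hypothetical, and running it hangs by design:

{code}
// Sketch of the pool-exhaustion deadlock (generic Java, not the Hive/ORC code).
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class PoolExhaustionDemo {
  public static void main(String[] args) {
    final int poolSize = 4;                          // hypothetical JetS3t max connections
    final int workers  = poolSize;                   // as many split threads as connections
    final Semaphore pool = new Semaphore(poolSize);  // stands in for the HTTP connection pool
    final CyclicBarrier allHoldOne = new CyclicBarrier(workers);

    ExecutorService exec = Executors.newFixedThreadPool(workers);
    for (int i = 0; i < workers; i++) {
      exec.submit(() -> {
        try {
          pool.acquire();      // open the file: first connection held
          allHoldOne.await();  // every worker now holds one connection; the pool is empty
          pool.acquire();      // seek() -> reopen(): waits for a connection nobody will release
          pool.release();
          pool.release();
        } catch (Exception e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    exec.shutdown();           // never terminates: this is the wedge shown in the jstack above
  }
}
{code}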



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
