Date: Mon, 29 Jul 2013 17:27:50 +0000 (UTC)
From: "Scott Carey (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Updated] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.

     [ https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-1144:
------------------------------

    Attachment: AVRO-1144.patch

Trivial patch to address this issue.

> Deadlock with FSInput and Hadoop NativeS3FileSystem.
> ----------------------------------------------------
>
>                 Key: AVRO-1144
>                 URL: https://issues.apache.org/jira/browse/AVRO-1144
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.0
>        Environment: Hadoop 1.0.3
>           Reporter: Shawn Smith
>        Attachments: AVRO-1144.patch
>
>
> Deadlock can occur when using org.apache.avro.mapred.FsInput to read files from S3 using the Hadoop NativeS3FileSystem and multiple threads.
> There are a lot of components involved, but the basic cause is pretty simple: Apache Commons HttpClient can deadlock waiting for a free HTTP connection when the number of threads downloading from S3 is greater than or equal to the maximum number of HTTP connections allowed per host.
> I've filed this bug against Avro because the bug is easiest to fix in Avro: swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls in the FsInput constructor, from
> {noformat}
>   /** Construct given a path and a configuration. */
>   public FsInput(Path path, Configuration conf) throws IOException {
>     this.stream = path.getFileSystem(conf).open(path);
>     this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
>   }
> {noformat}
> to
> {noformat}
>   /** Construct given a path and a configuration. */
>   public FsInput(Path path, Configuration conf) throws IOException {
>     this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
>     this.stream = path.getFileSystem(conf).open(path);
>   }
> {noformat}
> Here's what triggers the deadlock:
> * FsInput calls FileSystem.open(), which calls Jets3t to connect to S3 and open an HTTP connection for downloading content. This acquires an HTTP connection but does not release it.
> * FsInput calls FileSystem.getFileStatus(), which calls Jets3t to connect to S3 and perform a HEAD request to get object metadata. This attempts to acquire a second HTTP connection.
> * Jets3t uses Apache Commons HttpClient, which limits the number of simultaneous HTTP connections to a given host.
> Let's say this maximum is 4 (the default). If 4 threads all call the FsInput constructor concurrently, the 4 FileSystem.open() calls can acquire all 4 available connections, and the FileSystem.getFileStatus() calls block forever, waiting for a thread to release an HTTP connection back to the connection pool.
> A simple way to reproduce this problem is to create "jets3t.properties" in your classpath with "httpclient.max-connections=1". Then try to open a file using FsInput and the Native S3 file system (new Path("s3n://<bucket>/<path>")). It will hang indefinitely inside the FsInput constructor. (A minimal sketch of this reproduction follows the stack trace below.)
> Swapping the order of the open() and getFileStatus() calls ensures that a given thread using FsInput has at most one outstanding connection to S3 at a time. As a result, one thread should always be able to make progress, avoiding deadlock.
> Here's a sample stack trace of a deadlocked thread:
> {noformat}
> "pool-10-thread-3" prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() [116a02000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <785892cc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>     - locked <785892cc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>     at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357)
>     at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652)
>     at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556)
>     at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492)
>     at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793)
>     at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1225)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
>     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at org.apache.hadoop.fs.s3native.$Proxy25.retrieveMetadata(Unknown Source)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:326)
>     at org.apache.avro.mapred.FsInput.<init>(FsInput.java:38)
>     at org.apache.crunch.io.avro.AvroFileReaderFactory.read(AvroFileReaderFactory.java:70)
>     at org.apache.crunch.io.CompositePathIterable$2.<init>(CompositePathIterable.java:80)
>     at org.apache.crunch.io.CompositePathIterable.iterator(CompositePathIterable.java:78)
>     at com.example.load.BulkLoader$1.run(BulkLoadCommand.java:109)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> {noformat}
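> A minimal sketch of the reproduction recipe above, assuming Avro 1.7.0 and Hadoop 1.0.3 on the classpath, S3 credentials configured for the s3n:// scheme, and a "jets3t.properties" on the classpath containing "httpclient.max-connections=1". The class name and the bucket/key are hypothetical placeholders, not part of the original report:
> {noformat}
> import org.apache.avro.mapred.FsInput;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
>
> public class FsInputDeadlockRepro {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     // Hypothetical placeholder: point this at any readable S3 object.
>     Path path = new Path("s3n://your-bucket/some-file.avro");
>
>     // With only one pooled HTTP connection, open() acquires and holds it,
>     // then getFileStatus() inside the same constructor blocks forever
>     // waiting for a second connection, so the constructor never returns.
>     FsInput in = new FsInput(path, conf);
>     in.close(); // never reached while the deadlock is present
>   }
> }
> {noformat}
> With the patched constructor (getFileStatus() before open()), the same program should complete: the HEAD request releases its connection before open() acquires one, so the thread never holds two pooled connections at once.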
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira