Date: Fri, 3 Mar 2017 11:54:45 +0000 (UTC)
From: "Steve Loughran (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-14142) S3A - Adding unexpected prefix

[ https://issues.apache.org/jira/browse/HADOOP-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894203#comment-15894203 ]

Steve Loughran commented on HADOOP-14142:
-----------------------------------------

Log.
Vishnu: we like to have stacks and logs in a comment so they don't get included with every email; use the \{noformat} or \{code} header and footer to keep the text unformatted. Thanks.

{code}
application/x-www-form-urlencoded; charset=utf-8
Thu, 02 Mar 2017 22:40:25 GMT
/myBkt8/"
17/03/02 14:40:25 DEBUG request: Sending Request: GET https://webscaledemo.netapp.com:8082 /myBkt8/ Parameters: (max-keys: 1, prefix: user/vardhan/, delimiter: /, ) Headers: (Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=, User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60, Date: Thu, 02 Mar 2017 22:40:25 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection request: [route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 0 of 15; total allocated: 0 of 15]
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection leased: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:25 DEBUG DefaultClientConnectionOperator: Connecting to webscaledemo.netapp.com:8082
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:26 DEBUG RequestAddCookies: CookieSpec selected: default
17/03/02 14:40:26 DEBUG RequestAuthCache: Auth cache not set in the context
17/03/02 14:40:26 DEBUG RequestProxyAuthentication: Proxy auth state: UNCHALLENGED
17/03/02 14:40:26 DEBUG SdkHttpClient: Attempt 1 to execute request
17/03/02 14:40:26 DEBUG DefaultClientConnection: Sending request: GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG wire: >> "GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Host: webscaledemo.netapp.com:8082[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Date: Thu, 02 Mar 2017 22:40:25 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Connection: Keep-Alive[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "[\r][\n]"
17/03/02 14:40:26 DEBUG headers: >> GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG headers: >> Host: webscaledemo.netapp.com:8082
17/03/02 14:40:26 DEBUG headers: >> Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=
17/03/02 14:40:26 DEBUG headers: >> User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60
17/03/02 14:40:26 DEBUG headers: >> Date: Thu, 02 Mar 2017 22:40:25 GMT
17/03/02 14:40:26 DEBUG headers: >> Content-Type: application/x-www-form-urlencoded; charset=utf-8
17/03/02 14:40:26 DEBUG headers: >> Connection: Keep-Alive
17/03/02 14:40:26 DEBUG wire: << "HTTP/1.1 200 OK[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Date: Thu, 02 Mar 2017 22:40:26 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Connection: KEEP-ALIVE[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Server: StorageGRID/10.3.0.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "x-amz-request-id: 563477649[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Content-Length: 266[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Content-Type: application/xml[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "[\r][\n]"
17/03/02 14:40:26 DEBUG DefaultClientConnection: Receiving response: HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << Date: Thu, 02 Mar 2017 22:40:26 GMT
17/03/02 14:40:26 DEBUG headers: << Connection: KEEP-ALIVE
17/03/02 14:40:26 DEBUG headers: << Server: StorageGRID/10.3.0.1
17/03/02 14:40:26 DEBUG headers: << x-amz-request-id: 563477649
17/03/02 14:40:26 DEBUG headers: << Content-Length: 266
17/03/02 14:40:26 DEBUG headers: << Content-Type: application/xml
17/03/02 14:40:26 DEBUG SdkHttpClient: Connection can be kept alive indefinitely
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Sanitizing XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG wire: << "[\n]"
17/03/02 14:40:26 DEBUG wire: << "myBkt8user/vardhan/1/false"
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection [id: 10][route: {s}->https://webscaledemo.netapp.com:8082] can be kept alive indefinitely
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection released: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 1; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Parsing XML response document with handler: class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Examining listing for bucket: myBkt8
17/03/02 14:40:26 DEBUG request: Received successful response: 200, AWS Request ID: 563477649
17/03/02 14:40:26 DEBUG S3AFileSystem: Not Found: s3a://myBkt8/user/vardhan
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://myBkt8
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  ... 53 elided
{code}

> S3A - Adding unexpected prefix
> ------------------------------
>
> Key: HADOOP-14142
> URL: https://issues.apache.org/jira/browse/HADOOP-14142
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Vishnu Vardhan
> Priority: Critical
>
> Hi:
> S3A seems to add an unexpected prefix to my S3 path.
> Specifically, in the debug log, the following line is unexpected:
>
> GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
> It is not clear where the "prefix" is coming from and why.
> I executed the following commands:
> sc.setLogLevel("DEBUG")
> sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
> sc.hadoopConfiguration.set("fs.s3a.endpoint","webscaledemo.netapp.com:8082")
> sc.hadoopConfiguration.set("fs.s3a.access.key","")
> sc.hadoopConfiguration.set("fs.s3a.secret.key","")
> sc.hadoopConfiguration.set("fs.s3a.path.style.access","false")
> val s3Rdd = sc.textFile("s3a://myBkt98")
> s3Rdd.count()
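
One possible reading of the log, sketched in Scala since the commands above are spark-shell input. The last DEBUG line, "Not Found: s3a://myBkt8/user/vardhan", suggests that a URI with no path component (s3a://myBkt8, no trailing slash) is treated as a relative path and resolved against a per-user working directory of the form /user/<username>, which would produce the listing prefix user/vardhan/. The code below is not S3A's implementation: toKey and listPrefix are hypothetical helpers, the /user/<username> working directory is an assumption inferred from that log line, and only java.net.URI is used.

```scala
import java.net.URI

object PrefixSketch {
  // Hypothetical helper: turn a filesystem URI into an object key.
  // Assumption: a path that is empty (or otherwise not absolute) is
  // resolved against the working directory /user/<user>.
  def toKey(uri: URI, user: String): String = {
    val path = Option(uri.getPath).getOrElse("")
    val absolute = if (path.startsWith("/")) path else s"/user/$user/$path"
    // Object keys carry no leading or trailing slash.
    absolute.stripPrefix("/").stripSuffix("/")
  }

  // Hypothetical helper: a directory listing asks for keys under "<key>/";
  // the bucket root is listed with no prefix at all.
  def listPrefix(key: String): String = if (key.isEmpty) "" else key + "/"

  def main(args: Array[String]): Unit = {
    val user = "vardhan"
    // No path component: resolved against the working directory.
    println("[" + listPrefix(toKey(new URI("s3a://myBkt8"), user)) + "]")  // [user/vardhan/]
    // Trailing slash: the path is "/", i.e. the bucket root.
    println("[" + listPrefix(toKey(new URI("s3a://myBkt8/"), user)) + "]") // []
  }
}
```

If that assumption holds, writing the input path with a trailing slash (s3a://myBkt8/) would address the bucket root, and no prefix parameter derived from the user name would be sent.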