hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
Date Fri, 11 Nov 2016 18:30:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657741#comment-15657741 ]

Steve Loughran commented on HADOOP-13811:
-----------------------------------------

This showed up during a test run against S3 Ireland in the SPARK-7481 s3a integration tests.
{code}
2016-08-26 21:27:11,382 INFO  scheduler.JobScheduler (Logging.scala:logInfo(54)) - Finished job streaming job 1472243229000 ms.0 from job set of time 1472243229000 ms
2016-08-26 21:27:11,382 INFO  scheduler.JobScheduler (Logging.scala:logInfo(54)) - Total delay: 2.382 s for time 1472243229000 ms (execution: 0.000 s)
2016-08-26 21:27:11,923 WARN  dstream.FileInputDStream (Logging.scala:logWarning(87)) - Error finding new files under s3a://hwdev-steve-ireland-new/test/testname/streaming/sub*
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on test/testname/streaming/: com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:105)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1462)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1227)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1203)
	at org.apache.hadoop.fs.s3a.S3AGlobber.listStatus(S3AGlobber.java:69)
	at org.apache.hadoop.fs.s3a.S3AGlobber.doGlob(S3AGlobber.java:210)
	at org.apache.hadoop.fs.s3a.S3AGlobber.glob(S3AGlobber.java:125)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:1853)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:1841)
...
	at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
	at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
	at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
	at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
	at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
	at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Caused by: com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:222)
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:299)
	at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:77)
	at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:74)
	at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
	at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
	at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:1072)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:746)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3738)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:653)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:881)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1435)
	... 71 more
Caused by: com.amazonaws.AbortedException: 
	at com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:51)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.read1(BufferedReader.java:210)
	at java.io.BufferedReader.read(BufferedReader.java:286)
	at java.io.Reader.read(Reader.java:140)
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:195)
	... 85 more
2016-08-26 21:27:11,928 INFO  dstream.FileInputDStream (Logging.scala:logInfo(54)) - New files at time 1472243230000 ms:

2016-08-26 21:27:11,930 INFO  scheduler.JobScheduler (Logging.scala:logInfo(54)) - Added jobs for time 1472243230000 ms
-------------------------------------------
2016-08-26 21:27:11,930 INFO  scheduler.JobGenerator (Logging.scala:logInfo(54)) - Stopped JobGenerator
Time: 1472243230000 ms
-------------------------------------------
{code}
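
The trace above nests three exceptions: the {{AWSClientIOException}} raised by s3a, the {{AmazonClientException}} from the SDK's XML parser, and an innermost {{AbortedException}}. As a reading aid, here is a minimal sketch of walking such a cause chain down to its innermost exception. It uses only standard JDK exception types as stand-ins; the class name {{RootCause}} and the helper {{rootCause}} are hypothetical and are not part of Hadoop or the AWS SDK.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class RootCause {

    /** Follows getCause() links until an exception with no cause is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins only: these JDK types are NOT the classes in the trace.
        // They just reproduce the same three-level nesting.
        Throwable aborted = new InterruptedIOException("stand-in for AbortedException");
        Throwable sdk = new RuntimeException("Failed to sanitize XML document", aborted);
        Throwable wrapped = new IOException("getFileStatus on test/testname/streaming/", sdk);

        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Applied to the failure logged here, a walk like this would stop at the {{AbortedException}} rather than at the "Failed to sanitize XML document" message that appears in the top-level exception text.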

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13811
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Occasionally, getFileStatus() fails with a stack trace starting with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


