hadoop-common-issues mailing list archives

From "Luke Miner (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
Date Mon, 21 Nov 2016 20:07:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684576#comment-15684576 ]

Luke Miner edited comment on HADOOP-13811 at 11/21/16 8:07 PM:
---------------------------------------------------------------

Okay, I built off the PR [https://github.com/apache/spark/pull/12004] with the following command:

{code}
dev/make-distribution.sh -Pyarn,hadoop-2.7,hive,cloud -Dhadoop.version=2.9.0-SNAPSHOT -Pmesos
{code}

When I try to build my application against this build, I'm now missing a bunch of dependencies:

{code}
org.apache.hadoop
org.apache.spark.sql.types
org.json4s.jackson
com.fasterxml
org.apache.spark.sql.catalyst.analysis
{code}

I did have to add the newly built spark-core and spark-sql jars to my local Maven repository as follows:
{code}
mvn install:install-file -Dfile=./spark-core_2.11-2.1.0-SNAPSHOT.jar -DgroupId=org.apache.spark \
  -DartifactId=spark-core_2.11 -Dversion=2.1.0-SNAPSHOT -Dpackaging=jar
{code}
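
The same install step applies to the spark-sql jar. A minimal sketch covering both jars, assuming they sit in the current directory and follow the naming shown above. `JAR_DIR` and `DRY_RUN` are hypothetical names introduced for this example, and by default the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch: install the locally built Spark jars into the local Maven repository.
# JAR_DIR and DRY_RUN are hypothetical knobs for this example, not Maven or Spark options.
set -eu

JAR_DIR="${JAR_DIR:-.}"        # directory holding the built jars (assumed)
VERSION="2.1.0-SNAPSHOT"       # Spark version used in the command above
SCALA_BINARY="2.11"            # Scala binary suffix of the artifact names

install_jar() {
  artifact="${1}_${SCALA_BINARY}"
  # Build the same mvn command line as above, one argument per line.
  set -- mvn install:install-file \
    "-Dfile=${JAR_DIR}/${artifact}-${VERSION}.jar" \
    -DgroupId=org.apache.spark \
    "-DartifactId=${artifact}" \
    "-Dversion=${VERSION}" \
    -Dpackaging=jar
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # dry run (default): print the command only
  else
    "$@"           # DRY_RUN=0: actually run Maven
  fi
}

for module in spark-core spark-sql; do
  install_jar "$module"
done
```

Run it with `DRY_RUN=0` to perform the installs. Note that `install:install-file` without a `-DpomFile` argument generates a minimal POM that declares no dependencies, which may be related to the missing packages listed above.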

Here's my build.sbt. I'm using sbt-assembly:

{code}
name := "json2pq"

version := "1.2.1"

scalaVersion := "2.11.8"

resolvers += Resolver.mavenLocal

// spark
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0-SNAPSHOT" % "provided" excludeAll (
  ExclusionRule("org.slf4j", "slf4j-api")
)
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0-SNAPSHOT" % "provided"

// other libraries
libraryDependencies += "org.json4s" %% "json4s-native" % "3.5.0"
libraryDependencies += "org.rogach" %% "scallop" % "2.0.5"
libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "2.14.0"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.5.0"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"

// test
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "2.2.6" % "test"
libraryDependencies += "junit" % "junit" % "4.11" % "test"
libraryDependencies += "com.novocode" % "junit-interface" % "0.11" % "test"

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
net.virtualvoid.sbt.graph.Plugin.graphSettings
{code}



> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13811
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Occasionally, getFileStatus() fails with a stack trace starting with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

