spark-dev mailing list archives

From "Tayyebi, Ameen" <tayye...@amazon.com>
Subject Re: A new external catalog
Date Wed, 14 Feb 2018 19:56:45 GMT
Newbie question:

I want to add system/integration tests for the new functionality. There is a set of existing
tests around Spark Catalog that I can leverage. Great. However, the provider I’m writing is backed
by a web service that is part of an AWS account. I can write the tests using a mocked
client that mimics the behavior of the web service, but I’ll get the most value if
I actually run the tests against a real AWS Glue account.

How do you guys deal with external dependencies for system tests? Is there an AWS account
that is used for this purpose by any chance?
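
One common way to handle this kind of dependency is to run the tests against a mock by default and switch to the live service only when credentials are present. The following is a minimal sketch of that selection logic, not anything Spark provides: the class name `CatalogTestMode` and the opt-in variable `GLUE_TEST_REGION` are hypothetical, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS credential environment variables.

```java
import java.util.Map;

/**
 * Sketch: decide whether catalog tests should use a mocked client or a
 * real account. GLUE_TEST_REGION is a hypothetical opt-in variable.
 */
final class CatalogTestMode {
    enum Mode { MOCK, LIVE }

    /** Returns LIVE only when credentials and the opt-in variable are all set. */
    static Mode select(Map<String, String> env) {
        boolean hasKeys = nonEmpty(env.get("AWS_ACCESS_KEY_ID"))
            && nonEmpty(env.get("AWS_SECRET_ACCESS_KEY"));
        // Require an explicit opt-in so that stray credentials on a
        // developer machine do not trigger calls to a real account.
        boolean optedIn = nonEmpty(env.get("GLUE_TEST_REGION"));
        return (hasKeys && optedIn) ? Mode.LIVE : Mode.MOCK;
    }

    private static boolean nonEmpty(String s) {
        return s != null && !s.trim().isEmpty();
    }
}
```

A test suite could call `CatalogTestMode.select(System.getenv())` once during setup and construct either the mocked client or the real one accordingly, so a CI run without credentials still exercises the mocked path.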

Thanks,
-Ameen

From: Steve Loughran <stevel@hortonworks.com>
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" <tayyebia@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog




On 13 Feb 2018, at 21:20, Tayyebi, Ameen <tayyebia@amazon.com> wrote:

Yes, I’m thinking about upgrading to these:
<aws.kinesis.client.version>1.9.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.272</aws.java.sdk.version>

From:

<aws.kinesis.client.version>1.7.3</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.76</aws.java.sdk.version>

1.11.272 is the earliest version that has Glue.
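
A build or startup check for this minimum has to compare the version components numerically, because a plain string comparison would order 1.11.76 after 1.11.272. A minimal sketch, assuming versions are purely numeric and dot-separated; the class `SdkVersionCheck` is hypothetical, and the 1.11.272 threshold is taken from the statement above:

```java
/** Sketch: numeric comparison of dotted versions such as "1.11.76". */
final class SdkVersionCheck {
    /** Returns a negative, zero, or positive value as a is older, equal, or newer than b. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing components are treated as 0, so "1.11" equals "1.11.0".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True when the SDK version is at least the minimum quoted in this thread. */
    static boolean hasGlue(String sdkVersion) {
        return compare(sdkVersion, "1.11.272") >= 0;
    }
}
```

With this, `SdkVersionCheck.hasGlue("1.11.76")` is false and `SdkVersionCheck.hasGlue("1.11.272")` is true.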

How about I let the build system run the tests and if things start breaking I fall back to
shading Glue’s specific SDK?


FWIW, some of the other trouble spots are not functional; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go with that
into Hadoop 3.1 if we're happy. That's not so much for new features as for the "stack traces
throughout the log" problem, which seems to be a recurrent issue with the JARs, and one which often
slips by CI build runs. If it wasn't for that, we'd have stuck with 1.11.199 because it didn't
have any issues that we hadn't already got under control (https://github.com/aws/aws-sdk-java/issues/1211).

Like I said: upgrades bring fear


From: Steve Loughran <stevel@hortonworks.com>
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen" <tayyebia@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog





On 13 Feb 2018, at 19:50, Tayyebi, Ameen <tayyebia@amazon.com> wrote:


The biggest challenge is that I had to upgrade the AWS SDK to a newer version so that it includes
the Glue client, since Glue is a new service. So far, I haven’t seen any jar hell issues,
but that’s the main drawback I can see. I’ve made sure the version is in sync with the
Kinesis client used by the spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc; the latest version says
up front:

"Whatever problem you have, changing the AWS SDK version will not fix things, only change
the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading AWS SDKs is, sadly, often viewed with almost the same fear as guava, especially
if it's the unshaded version which forces in a version of jackson.

Which SDK version are you proposing? 1.11.x ?

