spark-dev mailing list archives

From "Tayyebi, Ameen" <tayye...@amazon.com>
Subject Re: A new external catalog
Date Wed, 14 Feb 2018 13:51:33 GMT
Thanks a lot, Steve. I’ll go through the JIRAs you linked in detail. I took a quick look
and am sufficiently scared for now. I had run into that warning from the S3 stream before.
Sigh.

From: Steve Loughran <stevel@hortonworks.com>
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" <tayyebia@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog

On 13 Feb 2018, at 21:20, Tayyebi, Ameen <tayyebia@amazon.com> wrote:

Yes, I’m thinking about upgrading to these:
<aws.kinesis.client.version>1.9.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.272</aws.java.sdk.version>

from the current versions:

<aws.kinesis.client.version>1.7.3</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.76</aws.java.sdk.version>

1.11.272 is the earliest version that has Glue.

How about I let the build system run the tests and, if things start breaking, fall back to
shading Glue’s specific SDK?
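One possible shape for that fallback, sketched under assumptions: it uses the Maven `maven-shade-plugin` with a relocation rule, and both the target package prefix `org.apache.spark.glue.shaded` and the choice to bundle every `com.amazonaws` artifact the module depends on are hypothetical, not something settled in this thread.

```xml
<!-- Hypothetical sketch: relocate the AWS SDK classes bundled into the module
     that carries the Glue catalog, so they cannot clash with the SDK version
     used elsewhere (e.g. by the Kinesis connector). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.1.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Include all AWS artifacts the module pulls in (Glue client,
                 SDK core, jmespath) so every relocated reference has a
                 matching relocated class inside the JAR. -->
            <include>com.amazonaws:*</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>com.amazonaws</pattern>
            <!-- Hypothetical prefix; any project-private package would do. -->
            <shadedPattern>org.apache.spark.glue.shaded.com.amazonaws</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocating `com.amazonaws` does not touch the SDK's own third-party dependencies such as Jackson; those would still have to be reconciled with the versions Spark already ships.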


FWIW, some of the other trouble spots are not functional; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

My Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go with that
into Hadoop 3.1 if we're happy. That's not so much for new features as for "stack traces
throughout the log", which seems to be a recurrent issue with these JARs, and one which often
slips by CI build runs. If it weren't for that, we'd have stuck with 1.11.199, because it didn't
have any issues that we hadn't already got under control (https://github.com/aws/aws-sdk-java/issues/1211).

Like I said: upgrades bring fear


From: Steve Loughran <stevel@hortonworks.com>
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen" <tayyebia@amazon.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: A new external catalog

On 13 Feb 2018, at 19:50, Tayyebi, Ameen <tayyebia@amazon.com> wrote:

The biggest challenge is that I had to upgrade the AWS SDK to a newer version so that it includes
the Glue client, since Glue is a new service. So far, I haven’t seen any JAR hell issues,
but that’s the main drawback I can see. I’ve made sure the version is in sync with the
Kinesis client used by the spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc, whose latest version says
up front:

"Whatever problem you have, changing the AWS SDK version will not fix things, only change
the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading AWS SDKs is, sadly, often viewed with almost the same fear as upgrading Guava, especially
if it's the unshaded version, which forces in a version of Jackson.
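For comparison, the shaded route mentioned above for Hadoop can be expressed as a dependency on `com.amazonaws:aws-java-sdk-bundle`, which, per its published description, packages all service clients with their third-party libraries (Jackson among them) relocated to different namespaces. A minimal sketch, assuming a Maven build that reuses the `aws.java.sdk.version` property quoted earlier in this thread:

```xml
<!-- Shaded bundle: the SDK's third-party libraries are relocated inside the
     JAR, so they do not force their versions onto the rest of the classpath. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-bundle</artifactId>
  <version>${aws.java.sdk.version}</version>
</dependency>
```

The trade-off is size: the bundle contains every service client, so it is far larger than depending on an individual module such as `aws-java-sdk-glue`.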

Which SDK version are you proposing? 1.11.x?

