hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry
Date Thu, 06 Apr 2017 11:01:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958744#comment-15958744 ]

Steve Loughran commented on HADOOP-14138:

bq. why should s3a entries exist in core-default.xml?

Because that's where you set default values which are then overridden in core-site.xml. We
don't have any notion of per-FS resources other than {{hdfs-default.xml}} and {{hdfs-site.xml}}.
By putting defaults 

bq. core-default is supposed to contain defaults for most config values, and serves as documentation.

Exactly. And because it is loaded before core-site.xml, there is a straightforward,
easy-to-understand override mechanism.
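
A minimal sketch of that override order, using plain maps as stand-ins for the two XML files. {{LayeredConfig}} is a hypothetical toy class, not Hadoop's {{Configuration}}, and the endpoint values are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of layered configuration; not Hadoop's Configuration class. */
class LayeredConfig {
    private final Map<String, String> values = new LinkedHashMap<>();

    /** Applies a resource; keys it defines replace any earlier values. */
    LayeredConfig load(Map<String, String> resource) {
        values.putAll(resource);
        return this;
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        // Stand-in for core-default.xml (endpoint value is illustrative only).
        Map<String, String> coreDefault = Map.of(
            "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem",
            "fs.s3a.endpoint", "s3.default.example.org");
        // Stand-in for core-site.xml: the deployment overrides a single key.
        Map<String, String> coreSite = Map.of(
            "fs.s3a.endpoint", "s3.site.example.org");

        // Defaults are loaded first, site settings second.
        LayeredConfig conf = new LayeredConfig().load(coreDefault).load(coreSite);
        System.out.println(conf.get("fs.s3a.endpoint")); // s3.site.example.org
        System.out.println(conf.get("fs.s3a.impl"));     // org.apache.hadoop.fs.s3a.S3AFileSystem
    }
}
```

The site file only needs to name the keys it changes; everything else falls through to the defaults.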

bq. If someone wants to use s3a, I'd expect them to explicitly set it up in their Configuration,

Well, no. Because that removes the ability for you to set options in core-site or elsewhere,
including but not limited to {{fs.s3a.endpoint}}, all the [fs.s3a. security settings|http://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#S3A_Authentication_methods],
along with many others. 

bq. or rely on the ServiceLoader approach - which this jira reverses.

The service loader mechanism for S3A was pulled because:

# it was a performance hit, especially once we shifted to the fully shaded AWS JAR, that is,
the one which stops breaking downstream apps due to forced jackson upgrades.
# the service loader itself was a bit of trouble. HADOOP-12636 is the key one: in Hadoop 2.7.2,
if you had hadoop-aws.jar on the CP but not amazon-s3-sdk, clients would fail on startup with
a class not found exception during FS static init, *even if s3a wasn't used*. HADOOP-13323
moved the caught-but-logged entry from logging at warn to debug, because even that stack
was causing confusion.
# Finally, the service loader doesn't register {{FileContext}} bindings, so if you used
that API to talk to filesystems, those core-default entries were mandatory.

Because the fs.s3a.impl declaration was already in core-default, the consequence of this introspection
was, at best, startup delays and, at worst, [startup failures|http://stackoverflow.com/questions/30426245/apache-spark-classloader-cannot-find-classdef-in-the-jar].
So we pulled it. Now any classloader delays are postponed until the first s3a, wasb, adl, or
swift FS instance is created, which happens if and only if the caller uses the class.
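
The difference between eager discovery and name-based lazy loading can be sketched as follows. {{LazyRegistry}} is a hypothetical toy class, not Hadoop code, and {{com.example.NotOnClasspath}} is a deliberately missing class standing in for an implementation whose dependencies are absent:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy registry: scheme -> implementation class name, resolved lazily. */
class LazyRegistry {
    private final Map<String, String> classNames = new HashMap<>();

    /** Records only the class name; nothing is loaded at this point. */
    void register(String scheme, String className) {
        classNames.put(scheme, className);
    }

    /** Loads and instantiates the class on first real use of the scheme. */
    Object create(String scheme) throws ReflectiveOperationException {
        String name = classNames.get(scheme);
        if (name == null) {
            throw new IllegalArgumentException("No implementation for scheme: " + scheme);
        }
        return Class.forName(name).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        LazyRegistry registry = new LazyRegistry();
        // Registration succeeds even though the second class does not exist.
        registry.register("list", "java.util.ArrayList");
        registry.register("broken", "com.example.NotOnClasspath");

        // Callers that only use "list" never touch the broken entry.
        System.out.println(registry.create("list").getClass().getName()); // java.util.ArrayList

        try {
            registry.create("broken");
        } catch (ClassNotFoundException e) {
            // The failure surfaces only when the broken scheme is actually requested.
            System.out.println("failed on use: " + e.getMessage());
        }
    }
}
```

With this shape, a missing dependency costs nothing at startup and only affects callers who ask for that scheme.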

You have to consider the current service loader a first pass; HADOOP-14132 discusses how to
do it better: scan a zero-dependency class file which declares schemas. It could list a per-fs
XML resource, but the problem which arises there is the ordering of resources: the FS scan
always takes place after the core-default/core-site load, and as {{Configuration.addDefaultResource()}}
doesn't let you declare an ordering of defaults, any per-fs resource load would stamp over
core-default. We'd need to change {{addDefaultResource()}} to permit a list of before-resources
and after-resources to be defined.
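
A rough sketch of the ordering problem and of what a "before" declaration might look like. {{OrderedDefaults}} is a toy model, {{addDefaultResourceBefore()}} is a hypothetical method that does not exist in Hadoop, and {{s3a-default.xml}} and the endpoint values are invented for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of ordered default resources; not Hadoop's Configuration. */
class OrderedDefaults {
    private final List<String> order = new ArrayList<>();
    private final Map<String, Map<String, String>> contents = new LinkedHashMap<>();

    /** Appends a resource: it is applied last, so it wins every conflict. */
    void addDefaultResource(String name, Map<String, String> values) {
        order.add(name);
        contents.put(name, values);
    }

    /** Hypothetical: inserts a resource ahead of an already registered one. */
    void addDefaultResourceBefore(String name, Map<String, String> values, String before) {
        int index = order.indexOf(before);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown resource: " + before);
        }
        order.add(index, name);
        contents.put(name, values);
    }

    /** Resolves a key by applying resources in load order; the last one wins. */
    String get(String key) {
        String result = null;
        for (String name : order) {
            String value = contents.get(name).get(key);
            if (value != null) {
                result = value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> site = Map.of("fs.s3a.endpoint", "s3.site.example.org");
        Map<String, String> perFs = Map.of("fs.s3a.endpoint", "s3.default.example.org");

        // Late append: the per-fs defaults stamp over the site setting.
        OrderedDefaults appended = new OrderedDefaults();
        appended.addDefaultResource("core-default.xml", Map.of());
        appended.addDefaultResource("core-site.xml", site);
        appended.addDefaultResource("s3a-default.xml", perFs);
        System.out.println(appended.get("fs.s3a.endpoint")); // s3.default.example.org

        // Declared ordering: the per-fs defaults slot in ahead of core-site.
        OrderedDefaults ordered = new OrderedDefaults();
        ordered.addDefaultResource("core-default.xml", Map.of());
        ordered.addDefaultResource("core-site.xml", site);
        ordered.addDefaultResourceBefore("s3a-default.xml", perFs, "core-site.xml");
        System.out.println(ordered.get("fs.s3a.endpoint")); // s3.site.example.org
    }
}
```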

Yes, the consequence of this change is that the {{fs.s3a.impl}} class isn't automatically registered,
but if core-default isn't loading, then your code is inevitably going to break in some other
way, with security, I'd suspect, being a key point.

> Remove S3A ref from META-INF service discovery, rely on existing core-default entry
> -----------------------------------------------------------------------------------
>                 Key: HADOOP-14138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14138
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>         Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
> As discussed in HADOOP-14132, the shaded AWS library is killing performance when starting
all hadoop operations, due to classloading on FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in core-default.xml,
*we don't need service discovery here*
> Proposed:
> # cut the entry from {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file exclusively
for s3a entries
> I want this one in first as it's a major performance regression, and one we could actually
backport to 2.7.x, just to improve load time slightly there too

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
