beam-commits mailing list archives

From "Tim Robertson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-2457) Error: "Unable to find registrar for hdfs" - need to prevent/improve error message
Date Fri, 29 Sep 2017 13:44:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185845#comment-16185845 ]

Tim Robertson edited comment on BEAM-2457 at 9/29/17 1:43 PM:
--------------------------------------------------------------

I got to the bottom of this for my case.  The TL;DR is to make sure you have this when shading
up the über jar:
{code}
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
{code}
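
Without that transformer, the shade plugin keeps only one of the competing service files (the first it encounters), so - for illustration - the merged jar may end up with just the entry contributed by the core SDK jar:
{code}
org.apache.beam.sdk.io.LocalFileSystemRegistrar
{code}
and the HDFS entry (contributed, I believe, by {{beam-sdks-java-io-hadoop-file-system}}) is silently dropped - hence the "Unable to find registrar for hdfs" error.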

A service loader is used to register the {{FileSystemRegistrar}}.  You can see which are registered
using:
{code}
// Imports assumed for Beam 2.x; Sets/Lists are Guava, and the
// ReflectHelpers package location may vary between versions.
import java.util.ServiceLoader;
import java.util.Set;
import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
import org.apache.beam.sdk.io.FileSystemRegistrar;
import org.apache.beam.sdk.util.common.ReflectHelpers;

Set<FileSystemRegistrar> registrars =
    Sets.newTreeSet(ReflectHelpers.ObjectsClassComparator.INSTANCE);
registrars.addAll(Lists.newArrayList(
    ServiceLoader.load(FileSystemRegistrar.class, ReflectHelpers.findClassLoader())));

for (FileSystemRegistrar reg : registrars) {
  System.out.println(reg.getClass());
}
{code}
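
When the HDFS filesystem is correctly registered this prints something like (matching the merged service file shown further down):
{code}
class org.apache.beam.sdk.io.LocalFileSystemRegistrar
class org.apache.beam.sdk.io.hdfs.HadoopFileSystemRegistrar
{code}
If only the local registrar appears, the HDFS entry was lost when the jar was built.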

Assuming you have built an über jar to submit, what is loaded is defined by the classes listed
in the {{/META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar}} file (you can expand
your jar and take a look).  When there are several of these files on the build path the first will
win, and the HDFS one may not be used.  Merging them at build time with the Maven shade plugin can be done like so:
{code}
      <!-- Shade the project into an über jar to send to Spark -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <configuration>
          <createDependencyReducedPom>false</createDependencyReducedPom>
          <filters>
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <shadedClassifierName>shaded</shadedClassifierName>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
{code} 

With this done, the resulting service file in the über jar will read:
{code}
org.apache.beam.sdk.io.LocalFileSystemRegistrar
org.apache.beam.sdk.io.hdfs.HadoopFileSystemRegistrar
{code}
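
For completeness, here is a small standalone check of my own (not part of Beam) that lists every copy of the service file the classloader can see - before shading you will typically see one copy per contributing jar, after a correct shade exactly one containing all entries:
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

public class ServiceFileCheck {
  public static void main(String[] args) throws IOException {
    Enumeration<URL> files = Thread.currentThread().getContextClassLoader()
        .getResources("META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar");
    while (files.hasMoreElements()) {
      URL url = files.nextElement();
      System.out.println(url);  // which jar this copy lives in
      try (BufferedReader r = new BufferedReader(
          new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = r.readLine()) != null) {
          System.out.println("  " + line);  // the registrar entries it declares
        }
      }
    }
  }
}
{code}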

I hope this helps someone else.  In a Cloudera CDH (and presumably Hortonworks too) environment, when
running on a gateway machine all the rest of the Hadoop config should get picked up automatically
and nothing else should be needed.
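
If you are not on a gateway machine (so {{core-site.xml}}/{{hdfs-site.xml}} are not on the classpath), the Hadoop configuration can also be handed to Beam explicitly through {{HadoopFileSystemOptions}} - a sketch, with a hypothetical namenode address:
{code}
import java.util.Collections;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

HadoopFileSystemOptions options =
    PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode.example.org:8020");  // hypothetical address
options.setHdfsConfiguration(Collections.singletonList(conf));
// pass 'options' when creating the Pipeline
{code}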




> Error: "Unable to find registrar for hdfs" - need to prevent/improve error message
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-2457
>                 URL: https://issues.apache.org/jira/browse/BEAM-2457
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.0.0
>            Reporter: Stephen Sisk
>            Assignee: Flavio Fiszman
>
> I've noticed a number of user reports where jobs are failing with the error message "Unable to find registrar for hdfs":
> * https://stackoverflow.com/questions/44497662/apache-beamunable-to-find-registrar-for-hdfs/44508533?noredirect=1#comment76026835_44508533
> * https://lists.apache.org/thread.html/144c384e54a141646fcbe854226bb3668da091c5dc7fa2d471626e9b@%3Cuser.beam.apache.org%3E
> * https://lists.apache.org/thread.html/e4d5ac744367f9d036a1f776bba31b9c4fe377d8f11a4b530be9f829@%3Cuser.beam.apache.org%3E

> This isn't too many reports, but it is the only time I can recall so many users reporting the same error message in such a short amount of time.
> We believe the problem is one of two things: 
> 1) bad uber jar creation
> 2) incorrect HDFS configuration
> However, it's highly possible this could have some other root cause. 
> It seems like it'd be useful to:
> 1) Follow up with the above reports to see if they've resolved the issue, and if so what fixed it. There may be another root cause out there.
> 2) Improve the error message to include more information about how to resolve it
> 3) See if we can improve detection of the error cases to give more specific information (specifically, if HDFS is misconfigured, can we detect that somehow and tell the user exactly that?)
> 4) Update documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
