beam-commits mailing list archives

From "Tim Robertson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2457) Error: "Unable to find registrar for hdfs" - need to prevent/improve error message
Date Thu, 28 Sep 2017 20:03:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184756#comment-16184756 ]

Tim Robertson commented on BEAM-2457:
-------------------------------------

Using Beam 2.1.0, I have a case of this where a custom input format reads from HDFS fine using:
{code}
    Configuration hadoopConf = new Configuration();
    hadoopConf.setClass("mapreduce.job.inputformat.class", DwCAInputFormat.class, InputFormat.class);
    hadoopConf.setStrings("mapreduce.input.fileinputformat.inputdir", "hdfs://nameservice1/tmp/dwca.zip");
    hadoopConf.setClass("key.class", Text.class, Object.class);
    hadoopConf.setClass("value.class", ExtendedRecord.class, Object.class);

    PCollection<KV<Text,ExtendedRecord>> rawRecords =
      p.apply("read", HadoopInputFormatIO.<Text, ExtendedRecord>read().withConfiguration(hadoopConf));
    // etc (logs show it runs fine)
{code} 

But adding the following Avro write:
{code}
  verbatimRecords.apply(AvroIO.write(UntypedOccurrence.class).to("hdfs://tmp/delme"));
{code}

fails with:
{code}
Exception in thread "main" java.lang.IllegalStateException: Unable to find registrar for hdfs
	at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
	at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:517)
	at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:204)
	at org.apache.beam.sdk.io.AvroIO$Write.to(AvroIO.java:304)
{code}

Does the fact that the input works but the output doesn't help shed light on this confusion?
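The trace shows the failure happening at pipeline construction, when `AvroIO.Write.to` resolves the output path through Beam's `FileSystems`, which chooses a file system by URI scheme. A likely reason the read still works is that `HadoopInputFormatIO` hands the input path to Hadoop's own `InputFormat` via the `Configuration`, so Beam's `FileSystems` is never consulted for it. Beam discovers `FileSystemRegistrar` implementations with `java.util.ServiceLoader`, and the HDFS registrar lives in the `beam-sdks-java-io-hadoop-file-system` module; if that module (or its `META-INF/services` entry inside an uber jar) is missing, nothing registers `hdfs`. The sketch below is a simplified, hypothetical model of that lookup, not Beam's code (`SchemeLookup`, `REGISTRARS`, `parseScheme` and `lookup` are illustrative names):

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of scheme-based file system lookup. This is
// NOT Beam's implementation: Beam discovers registrars on the classpath,
// while this sketch hard-codes them.
class SchemeLookup {
  // Registered scheme -> file system name. Only "file" is present, mimicking a
  // classpath where no HDFS registrar was packaged.
  static final Map<String, String> REGISTRARS = new TreeMap<>();

  static {
    REGISTRARS.put("file", "LocalFileSystem");
  }

  // Returns the URI scheme, treating a spec without one as a local path.
  static String parseScheme(String spec) {
    String scheme = URI.create(spec).getScheme();
    return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
  }

  // Resolves a path spec to a file system, failing with the same message as
  // the report when the scheme has no registrar.
  static String lookup(String spec) {
    String scheme = parseScheme(spec);
    String fileSystem = REGISTRARS.get(scheme);
    if (fileSystem == null) {
      throw new IllegalStateException("Unable to find registrar for " + scheme);
    }
    return fileSystem;
  }

  public static void main(String[] args) {
    System.out.println(lookup("/tmp/delme")); // LocalFileSystem
    try {
      lookup("hdfs://tmp/delme");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Unable to find registrar for hdfs
    }
    // In hdfs://tmp/delme, "tmp" is the authority (host), not part of the path.
    URI uri = URI.create("hdfs://tmp/delme");
    System.out.println(uri.getHost() + " " + uri.getPath()); // tmp /delme
  }
}
```

Separately from the registrar error, the last lines show that `hdfs://tmp/delme` names `tmp` as the authority and `/delme` as the path; `hdfs:///tmp/delme` or `hdfs://nameservice1/tmp/delme` may be what was intended.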

> Error: "Unable to find registrar for hdfs" - need to prevent/improve error message
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-2457
>                 URL: https://issues.apache.org/jira/browse/BEAM-2457
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.0.0
>            Reporter: Stephen Sisk
>            Assignee: Flavio Fiszman
>
> I've noticed a number of user reports where jobs are failing with the error message "Unable to find registrar for hdfs":
> * https://stackoverflow.com/questions/44497662/apache-beamunable-to-find-registrar-for-hdfs/44508533?noredirect=1#comment76026835_44508533
> * https://lists.apache.org/thread.html/144c384e54a141646fcbe854226bb3668da091c5dc7fa2d471626e9b@%3Cuser.beam.apache.org%3E
> * https://lists.apache.org/thread.html/e4d5ac744367f9d036a1f776bba31b9c4fe377d8f11a4b530be9f829@%3Cuser.beam.apache.org%3E

> This isn't too many reports, but it is the only time I can recall so many users reporting the same error message in such a short amount of time.
> We believe the problem is one of two things: 
> 1) bad uber jar creation
> 2) incorrect HDFS configuration
> However, it's quite possible this has some other root cause.
> It seems like it'd be useful to:
> 1) Follow up with the above reports to see if they've resolved the issue, and if so, what fixed it. There may be another root cause out there.
> 2) Improve the error message to include more information about how to resolve it.
> 3) See if we can improve detection of the error cases to give more specific information (specifically, if HDFS is misconfigured, can we detect that somehow and tell the user exactly that?)
> 4) Update documentation.
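For point 2, here is a minimal sketch of a more informative message, assuming the lookup can list the schemes it did find. `RegistrarErrorMessage` and `buildMessage` are hypothetical names and the hint text is only illustrative, not Beam's actual message; the hints target the two suspected causes (uber jar packaging and HDFS configuration):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch of a more actionable "no registrar" error message.
// The wording and hints are illustrative, not Beam's actual message.
class RegistrarErrorMessage {

  static String buildMessage(String scheme, Collection<String> registeredSchemes) {
    StringBuilder msg = new StringBuilder()
        .append("Unable to find registrar for ").append(scheme)
        // Listing what *is* registered (sorted) makes a missing module obvious.
        .append(". Registered schemes: ").append(new TreeSet<>(registeredSchemes))
        .append('.');
    if (scheme.equals("hdfs")) {
      // Hints aimed at the two suspected root causes: packaging and configuration.
      msg.append(" Check that beam-sdks-java-io-hadoop-file-system is on the classpath,")
          .append(" that any uber jar merges META-INF/services files,")
          .append(" and that HadoopFileSystemOptions supplies the HDFS configuration.");
    }
    return msg.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildMessage("hdfs", Arrays.asList("gs", "file")));
  }
}
```

Listing the registered schemes lets a user tell "module missing" (no `hdfs` at all) apart from other failures without inspecting the classpath by hand.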



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
