hudi-commits mailing list archives

From "Vinoth Chandar (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-539) RO Path filter does not pick up hadoop configs from the spark context
Date Sun, 22 Mar 2020 17:16:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-539:
--------------------------------
    Description: 
Hi,
 I'm trying to use Hudi to write to one of the Azure storage container file systems, ADLS
Gen 2 (abfs://). abfs:// is one of the whitelisted file schemes. The issue I'm facing is that
{{HoodieROTablePathFilter}} looks up the file system for a path using a blank Hadoop configuration.
This manifests as {{java.io.IOException: No FileSystem for scheme: abfss}} because the blank
configuration has none of the settings from the environment.

The problematic line is

[https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96]

 
 Stacktrace:
{code:java}
 java.io.IOException: No FileSystem for scheme: abfss
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
 at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
 at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
{code}
 

  was:
Hi,
 I'm trying to use Hudi to write to one of the Azure storage container file systems, ADLS
Gen 2 (abfs://). abfs:// is one of the whitelisted file schemes. The issue I'm facing is that
{{HoodieROTablePathFilter}} looks up the file system for a path using a blank Hadoop configuration.
This manifests as {{java.io.IOException: No FileSystem for scheme: abfss}} because the blank
configuration has none of the settings from the environment.

The problematic line is

[https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96]

 

Stacktrace
java.io.IOException: No FileSystem for scheme: abfss
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)


> RO Path filter does not pick up hadoop configs from the spark context
> ---------------------------------------------------------------------
>
>                 Key: HUDI-539
>                 URL: https://issues.apache.org/jira/browse/HUDI-539
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Common Core
>    Affects Versions: 0.5.1
>         Environment: Spark version : 2.4.4
> Hadoop version : 2.7.3
> Databricks Runtime: 6.1
>            Reporter: Sam Somuah
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>  
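
For readers following the report, the failure mode can be sketched without any Hadoop dependency. The class below is a hypothetical stand-in, not Hudi or Hadoop code: it models a configuration as a plain map and assumes the implementation is found through a key of the form fs.<scheme>.impl. The real lookup in Hadoop has additional fallbacks that this model leaves out, and the implementation class name is used only as an illustrative value.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of looking up a file system by URI scheme.
// A plain Map stands in for a Hadoop configuration object.
final class BlankConfSketch {

    // Assumed key pattern, modeled on the fs.<scheme>.impl property.
    static String implKey(String scheme) {
        return "fs." + scheme + ".impl";
    }

    // Returns the configured implementation class name for the scheme,
    // or throws with the same message seen in the reported stack trace.
    static String resolveFileSystemClass(String scheme, Map<String, String> conf)
            throws IOException {
        String impl = conf.get(implKey(scheme));
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    // Convenience wrapper that turns the outcome into a printable string.
    static String describe(String scheme, Map<String, String> conf) {
        try {
            return resolveFileSystemClass(scheme, conf);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the configuration carried by the Spark context
        // (the class name is an illustrative value, not verified here).
        Map<String, String> fromSparkContext = new HashMap<>();
        fromSparkContext.put(implKey("abfss"),
                "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");

        // Stand-in for a freshly created, blank configuration.
        Map<String, String> blank = new HashMap<>();

        System.out.println(describe("abfss", fromSparkContext));
        System.out.println(describe("abfss", blank));
    }
}
```

In this model the populated map resolves the scheme while the blank one fails with the reported message, which is consistent with the issue title: the path filter would need the Hadoop configuration from the Spark context rather than an empty one.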



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
