flink-user mailing list archives

From Joshua Griffith <JGriff...@CampusLabs.com>
Subject Re: Using Azure Blob Storage with Flink
Date Tue, 29 Aug 2017 22:41:31 GMT
Yes, hadoop-azure and azure-storage are both on the classpath. hadoop-azure is declared as
a dependency in my build.sbt file and I’m using assembly to copy all of the dependencies
into a single jar which is submitted to Flink. I suspect the wasb format needs to be explicitly
registered with Hadoop. I think that’s accomplished by inserting the following into core-site.xml
(I’m not that familiar with Hadoop):


<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>

However, I’m wondering if it’s possible to achieve the same result from within the job
since it’s difficult to modify files on the task manager in our configuration.
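
For reference, here is a rough sketch of a fuller core-site.xml for wasb, based on the
hadoop-azure documentation. The fs.wasb.impl entry is an assumption on my part: the property
above registers wasb for Hadoop's AbstractFileSystem API, while the "No file system found with
scheme wasb" lookup appears to go through the FileSystem API, which reads fs.<scheme>.impl.
{ACCOUNTNAME} and {ACCESS_KEY} are placeholders, not real values.

```xml
<configuration>
  <!-- Assumption: maps the wasb scheme to the hadoop-azure FileSystem implementation. -->
  <property>
    <name>fs.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <!-- Registers wasb for the AbstractFileSystem API (same entry as above). -->
  <property>
    <name>fs.AbstractFileSystem.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.Wasb</value>
  </property>
  <!-- Storage account key; both the account name and the key are placeholders. -->
  <property>
    <name>fs.azure.account.key.{ACCOUNTNAME}.blob.core.windows.net</name>
    <value>{ACCESS_KEY}</value>
  </property>
</configuration>
```

Flink would still have to be pointed at the directory containing this file, for example through
HADOOP_CONF_DIR or the fs.hdfs.hadoopconf key in flink-conf.yaml, so this does not by itself
avoid touching the task manager.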

On Aug 29, 2017, at 5:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:

Was the hadoop-azure jar on the classpath?

Please also see the following from https://hadoop.apache.org/docs/current/hadoop-azure/index.html:

The built jar file, named hadoop-azure.jar, also declares transitive dependencies on the additional
artifacts it requires, notably the Azure Storage SDK for Java.

On Tue, Aug 29, 2017 at 3:24 PM, Joshua Griffith <JGriffith@campuslabs.com> wrote:
I’m attempting to write to Azure Blob Storage using Flink's FileOutputFormat. I’ve included
hadoop-azure <https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Configuring_Credentials>
within the jar I submit to Flink and configured the paths to be prefixed with
wasb://{CONTAINERNAME}@{ACCOUNTNAME}.blob.core.windows.net/.

When the file output format initializes, I get the following error:
ERROR ROOT - Run 4bfb099a-8d07-11e7-8d3a-fb4d07562cc0
failed with error: 'org.apache.flink.client.program.ProgramInvocationException: The program
execution failed: Cannot initialize task 'DataSink (/out/data)': No file system found with
scheme wasb, referenced in file URI 'wasb://blob@{ACCOUNTNAME}.blob.core.windows.net/out/data'.

Can I register the format programmatically from within the job (without putting credentials
into a core-site.xml file on the task manager)? Can I still use Flink’s FileOutputFormat
or should I be using a Hadoop OutputFormat?

Thanks,

Joshua

