flink-user-zh mailing list archives

From "Jun Zhang" <825875...@qq.com>
Subject How to write stream data to other Hadoop Cluster by StreamingFileSink
Date Sat, 05 Oct 2019 05:44:59 GMT

I have two Hadoop clusters (hdfs://mycluster1 and hdfs://mycluster2), both configured
with HA.
My job reads streaming data from Kafka and writes it to HDFS with StreamingFileSink.
The job is deployed on mycluster1 (Flink on YARN), and I want it to write the data to
mycluster2. How should I configure this? If I put hdfs://mycluster2/tmp/abc directly in
the path of the StreamingFileSink, the job reports that mycluster2 could not be found.
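
To illustrate why the name lookup matters, here is a minimal sketch using only java.net.URI (the class name NameserviceUriDemo and its helper methods are made up for this illustration and are not part of Flink or Hadoop). It shows that the authority in hdfs://mycluster2/tmp/abc is just the logical name mycluster2 with no port, so the HDFS client presumably has to resolve it from whatever hdfs-site.xml it has loaded:

```java
import java.net.URI;

// Hypothetical demo class, not part of Flink or Hadoop.
public class NameserviceUriDemo {

    /** Returns the authority (host part) of the path, i.e. the name the client must resolve. */
    static String authorityOf(String path) {
        return URI.create(path).getHost();
    }

    /** Returns the port of the path, or -1 when the path does not specify one. */
    static int portOf(String path) {
        return URI.create(path).getPort();
    }

    public static void main(String[] args) {
        String sinkPath = "hdfs://mycluster2/tmp/abc";
        // The authority is the HA nameservice name, not a NameNode host:port pair.
        System.out.println("authority = " + authorityOf(sinkPath)); // mycluster2
        System.out.println("port      = " + portOf(sinkPath));      // -1 (none given)
        System.out.println("path      = " + URI.create(sinkPath).getPath()); // /tmp/abc
    }
}
```

If the loaded configuration only describes mycluster1, nothing tells the client which NameNodes stand behind the name mycluster2, which would be consistent with the error I get.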

I looked at the source code of org.apache.flink.runtime.fs.hdfs.HadoopFsFactory#create. When
Flink loads core-site.xml and hdfs-site.xml, it loads them first from hadoopConfig, then from
flinkConfig, and finally from the classpath. As far as I can see, flinkConfig is not empty, so
the configuration is loaded through flinkConfig and ultimately from HADOOP_HOME. The
core-site.xml and hdfs-site.xml found there belong to mycluster1 and contain no information
about mycluster2, which is why mycluster2 cannot be found.
