airflow-users mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject How do I spark-submit a Spark job to a master [EMR] through an SSH tunnel?
Date Fri, 17 Jul 2020 05:41:13 GMT
For starters: I am familiar with all the parts involved. I have created an
SSH connection, a tunnel from that connection, and a connection to the Spark
master that doesn't use SSH (and so can't connect). I also see the myriad
ways to interact with Spark in Airflow, both in contrib and the main package.

*What I can't find a single discussion about is: how do I submit a Spark
job to a Spark master through an SSH tunnel?*

SSH tunnels are created in DAGs via the hook and not as connections (which
seems like a bad design decision), so I can't find a way to actually make a
connection to the Spark master that uses a tunnel. There is no parameter in
the spark-submit operators that might use an SSH tunnel, so I am stuck.
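Roughly, the workaround I have in mind is to skip the Spark connection
entirely: open the tunnel inside the task with the third-party sshtunnel
package, then point spark-submit at the locally forwarded port. This is only
a sketch, not an Airflow API; the host, key path, and master port 7077 (a
standalone master default) are placeholder assumptions:

```python
# Sketch of submitting through an SSH tunnel from inside an Airflow task.
# All hosts, ports, and paths below are hypothetical placeholders.
import subprocess


def build_spark_submit_cmd(local_port, app_path, deploy_mode="client"):
    """Build a spark-submit command targeting a locally forwarded master port."""
    return [
        "spark-submit",
        "--master", f"spark://localhost:{local_port}",
        "--deploy-mode", deploy_mode,
        app_path,
    ]


def submit_through_tunnel(emr_host, app_path):
    # Third-party dependency: pip install sshtunnel
    from sshtunnel import SSHTunnelForwarder

    with SSHTunnelForwarder(
        (emr_host, 22),
        ssh_username="hadoop",                    # typical EMR login; assumption
        ssh_pkey="/path/to/key.pem",              # placeholder key path
        remote_bind_address=("localhost", 7077),  # assumed standalone master port
    ) as tunnel:
        # tunnel.local_bind_port is the ephemeral local port forwarding to 7077
        subprocess.run(
            build_spark_submit_cmd(tunnel.local_bind_port, app_path),
            check=True,
        )
```

This would run as a plain PythonOperator rather than any of the spark-submit
operators, which is exactly the duplication I was hoping to avoid.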

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
