spark-user mailing list archives

From Jacob Eisinger <>
Subject Securing Spark's Network
Date Fri, 25 Apr 2014 15:23:01 GMT

We tried running Spark 0.9.1 in standalone mode inside Docker containers distributed over multiple
hosts. This is complicated because Spark opens ephemeral (dynamic) ports for the workers
and the CLI. To ensure our Docker setup doesn't break Spark in unexpected ways and keeps
the cluster secure, I'd like to understand more about Spark's network architecture.
I'd appreciate it if you could point us to any documentation!
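As background for the port question, here is a minimal Python sketch of what an "ephemeral" listener looks like. The helper name `bind_ephemeral` is hypothetical and is not taken from Spark's code. Binding to port 0 asks the kernel for any free port, so the number is only known after `bind()` and can differ on every run. That unpredictability is what makes such ports hard to whitelist in a firewall.

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket on a kernel-chosen (ephemeral) port.

    Hypothetical helper for illustration only; not part of Spark.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "any free port", picked by the kernel
    sock.listen()
    return sock


if __name__ == "__main__":
    s = bind_ephemeral()
    # The actual port is only known after bind() and may change on every run.
    print("listening on port", s.getsockname()[1])
    s.close()
```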

A couple of specific questions:

1. What are these ports being used for?

From reading the code and experimenting, it looks like they carry asynchronous communication
for shuffling results around. Is there anything else?

2. How do you secure the network?

Network administrators tend to secure and monitor the network at the port level. If these
ports are dynamic and opened at random, firewalls cannot easily be configured and security
alarms are raised. Is there an easy way to limit the port range? (We did investigate setting
the kernel parameter ip_local_reserved_ports, but this is broken [1] on some versions of Linux's cgroups.)
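To make the "limit the range" request concrete, the following minimal Python sketch shows the general idea of a service confining its listeners to a firewall-approved port range instead of letting the kernel pick any ephemeral port. The helper `bind_in_range` and the range `50000-50099` are hypothetical illustrations; this is not a Spark API and says nothing about whether Spark 0.9.1 supports such a setting.

```python
import socket
from typing import Iterable


def bind_in_range(ports: Iterable[int], host: str = "127.0.0.1") -> socket.socket:
    """Return a listening socket on the first free port from an allowed range.

    Hypothetical helper for illustration only; not part of Spark.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use (or not permitted): release and try the next one.
            sock.close()
            continue
        sock.listen()
        return sock
    raise OSError("no free port in the allowed range")


if __name__ == "__main__":
    # Assumed firewall-approved range; every listener stays inside it.
    s = bind_in_range(range(50000, 50100))
    print("listening on port", s.getsockname()[1])
    s.close()
```

With this approach a firewall rule only needs to open one known range, at the cost of failing loudly once that range is exhausted.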



Jacob D. Eisinger
IBM Emerging Technologies - (512) 286-6075