spark-dev mailing list archives

From Josh Rosen <>
Subject Licensing for PySpark's CloudPickle Module
Date Mon, 29 Jul 2013 04:46:17 GMT
PySpark's CloudPickle library was originally developed by PiCloud and distributed under a non-BSD license.  I
contacted them last year and they agreed to let us bundle the CloudPickle
module under a BSD license.  Now that Spark is moving to an Apache license,
how does this impact this module?  What license will apply to future
changes to this module?  Do we need to obtain additional licensing from the
PiCloud folks?  I've attached my original correspondence with PiCloud, in
case that helps.

I ask because I'm interested in making some fixes to the cloudpickle code
and I'd like to collaborate with the PiCloud folks, if possible, since
they're more familiar with that code and may be interested in some of the
bugs that I've found.

Josh Rosen

---------- Forwarded message ----------
From: Josh Rosen <>
Date: Wed, Aug 15, 2012 at 11:47 PM
Subject: Re: Request to release the CloudPickler module as its own Python
To: Aaron Staley <>
Cc:, Matei Zaharia <>

Hi Aaron,

I'm just interested in,, and their small
dependencies.  We'll develop our own module transfer / dependency
deployment system or build on existing systems in Spark or Mesos, so I
don't need to use other code from PiCloud.

The CloudPickle module has been very useful and I appreciate your help with
the licensing.  I'll bundle and its dependencies with
PySpark and add the proper attribution in the docstring.

Thanks for your help,
Josh Rosen

On Aug 11, 2012, at 12:23 AM, Aaron Staley wrote:

Hi Josh,

How much of the functionality do you need to utilize?

If we are just talking and (and their small
dependencies; 2 functions the cloudpickler uses from util and the
xmlhandlers library used by pickledebug), we are fine with you moving that
into your own code and re-releasing it under the BSD license. Just modify
the license in the source code; all we ask is that you
attribute the original work to PiCloud, Inc. and provide a link to our
website in the top level comments of the modules.

If you are looking for all of the functionality relating to getting code
running on X machine to Y machine (module transfer, some of the import
hacks in adapter, etc.), that's a whole different matter. It's difficult to
pull it out of PiCloud itself as a separate package, due to it being spread
across so many modules.  Are just the picklers enough?

Aaron Staley
PiCloud, Inc.

On Thu, Aug 9, 2012 at 10:58 PM, Josh Rosen <> wrote:

> Hello,
> My name is Josh Rosen.  I'm a grad student at UC Berkeley and I'm working
> on implementing a Python API for the Spark cluster computing system.
> Like PiCloud, my application needs to serialize Python functions in order
> to execute them across multiple machines.
> I'm currently using PiCloud's CloudPickle serializer code in my prototype.
> Serializing arbitrary Python functions is non-trivial, but PiCloud's
> serializer is very robust and easy to use; I haven't written a function
> that it can't serialize.
> I'm interested in extending the CloudPickler module to work with PyPy.
> I am concerned that the inclusion of a modified
> CloudPickler with Spark would cause Spark to become a “work based on the
> Library” and require Spark to become LGPL-licensed, in place of its current
> BSD license.
> Would you be interested in releasing the CloudPickler module and its
> dependencies as a BSD-licensed Python package (an LGPL-license would work
> too)?
> CloudPickler has much more functionality than other Python pickling /
> serialization libraries, and I hope to be able to use it in Spark.
> I would be very grateful if you are able to accommodate this request.
> Sincerely,
> Josh Rosen

