systemml-dev mailing list archives

From dusenberr...@gmail.com
Subject Re: Regarding incubator systemml/breast_cancer project
Date Fri, 07 Apr 2017 00:24:35 GMT
Hi Aishwarya,

Thanks for sharing more info on the issue!

To make the code easier to use, I've updated the preprocessing code by pulling out most of the
logic into a `breastcancer/preprocessing.py` module, leaving just the execution in the `Preprocessing.ipynb`
notebook.  There is also a `preprocess.py` script with the same contents as the notebook for
use with `spark-submit`.  Choosing between the notebook and the script is just a matter of convenience,
as they both import from the same `breastcancer/preprocessing.py` module.

As part of the updates, I've added an explicit SparkSession parameter (`spark`) to the `preprocess(...)`
function, and updated the body to use this SparkSession object rather than the older SparkContext
`sc` object.  Previously, the `preprocess(...)` function accessed an `sc` object pulled in
from the enclosing scope.  That worked while all of the code lived in the notebook, but not
once the code was extracted into a module and imported, because Python resolves a function's
global names in the module where the function is defined, not in the caller's namespace (hence
the `NameError`).  The explicit parameter now allows the code to be imported.
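As a minimal, self-contained illustration of the scoping issue (not the project's actual code), the sketch below builds a module object from source to mimic importing an extracted `preprocessing.py`. `FakeSession`, `preprocess_old`, `preprocess_new`, and the `data` argument are hypothetical stand-ins, so no pyspark is needed to run it.

```python
import types


class FakeSession:
    """Hypothetical stand-in for the `sc`/`spark` objects a pyspark kernel injects."""

    def __init__(self, name):
        self.name = name


# Source of a hypothetical extracted module.
MODULE_SOURCE = '''
def preprocess_old(data):
    # `sc` is looked up in THIS module's globals, not in the caller's namespace.
    return (sc.name, data)

def preprocess_new(spark, data):
    # The session is passed in explicitly, so no enclosing scope is required.
    return (spark.name, data)
'''


def load_module(name, source):
    """Create a module object from source, roughly like importing a separate file."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


preprocessing = load_module("preprocessing", MODULE_SOURCE)

# In the "notebook", the kernel has already defined `sc` and `spark` ...
sc = FakeSession("notebook-sc")
spark = FakeSession("notebook-spark")

# ... but the imported old-style function still cannot see `sc`.
try:
    preprocessing.preprocess_old([1, 2])
    old_result = "ok"
except NameError as err:
    old_result = f"NameError: {err}"

# The new-style function works because the session is handed to it.
new_result = preprocessing.preprocess_new(spark, [1, 2])

print(old_result)  # NameError: name 'sc' is not defined
print(new_result)  # ('notebook-spark', [1, 2])
```

A side benefit of the explicit parameter is that the function can be exercised with a stand-in object, as done here.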

Can you please try again with the latest updates?  We are currently using Spark 2.x with Python
3.  If you use the notebook, the pyspark kernel should have a `spark` object available that
can be supplied to the functions (as is done now in the notebook), and if you use the `preprocess.py`
script with `spark-submit`, the `spark` object will be created explicitly by the script.

For a bit of context for others: Aishwarya initially reached out to ask whether our breast
cancer project could be applied to TIFF images, rather than the SVS images we are currently
using (the answer is "yes", so long as they are "generic tiled TIFF" images, according to the
OpenSlide documentation), and then followed up about Spark issues with the preprocessing
code.  I've moved this conversation to the mailing list so that others in the
community can benefit.
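For anyone wanting to pre-screen TIFF files, here is a rough sketch, assuming classic (non-BigTIFF) files, that checks whether the first image declares the TIFF `TileWidth`/`TileLength` tags (322/323). It is not how OpenSlide itself detects formats (in openslide-python, `OpenSlide.detect_format()` is the authoritative check), and the helper names `first_ifd_is_tiled` and `make_tiff` are hypothetical.

```python
import struct

TILE_WIDTH_TAG = 322   # TIFF "TileWidth" tag
TILE_LENGTH_TAG = 323  # TIFF "TileLength" tag


def first_ifd_is_tiled(data: bytes) -> bool:
    """Return True if the first image (IFD) of a classic TIFF declares tiles.

    Rough pre-check only: BigTIFF, compression, and pyramid levels are not
    examined, so a True result does not guarantee OpenSlide can read the file.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("missing TIFF byte-order marker")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    if ifd_offset + 2 > len(data):
        raise ValueError("first IFD offset points past the end of the data")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    if ifd_offset + 2 + 12 * entry_count > len(data):
        raise ValueError("first IFD is truncated")
    tags = set()
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        (tag,) = struct.unpack(endian + "H", data[start:start + 2])
        tags.add(tag)
    return TILE_WIDTH_TAG in tags and TILE_LENGTH_TAG in tags


def make_tiff(tag_ids, endian="<"):
    """Build a tiny in-memory TIFF whose first IFD lists the given tag IDs."""
    marker = b"II" if endian == "<" else b"MM"
    data = marker + struct.pack(endian + "HI", 42, 8)  # first IFD starts at byte 8
    data += struct.pack(endian + "H", len(tag_ids))
    for tag in sorted(tag_ids):
        # type 3 (SHORT), count 1, value 256 packed into the 4-byte value field
        data += struct.pack(endian + "HHIHH", tag, 3, 1, 256, 0)
    data += struct.pack(endian + "I", 0)  # offset of next IFD: none
    return data


tiled = make_tiff([256, 257, 322, 323])     # width, length, tile width, tile length
stripped = make_tiff([256, 257, 273, 278])  # width, length, strip offsets, rows per strip
print(first_ifd_is_tiled(tiled))     # True
print(first_ifd_is_tiled(stripped))  # False
```

Reading only the tag IDs keeps the check cheap on large slides; anything beyond "is it tiled?" is better left to OpenSlide itself.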


Thanks!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 6, 2017, at 5:09 AM, Aishwarya Chaurasia <aishwarya2612@gmail.com> wrote:
> 
> Hey,
> 
> The object `sc` is already defined in pyspark, and yet this NameError keeps
> occurring. We are using Spark 2.*
> 
> Here is a link to the error that we are getting:
> https://paste.fedoraproject.org/paste/89iQODxzpNZVbSfgwocH8l5M1UNdIGYhyRLivL9gydE=
