systemml-dev mailing list archives

From Aishwarya Chaurasia <aishwarya2...@gmail.com>
Subject Re: Regarding incubator systemml/breast_cancer project
Date Sat, 15 Apr 2017 11:38:19 GMT
Hello sir,
Could you please elaborate on what output we should expect? We tried
executing the preprocess.py file with spark-submit, and it keeps adding
tiles to the RDD. When we then run the visualisation.py file, it shows no
output. Could you please help us as soon as possible by describing the
expected output and the order in which the files should be executed?
Thank you.

On 07-Apr-2017 5:54 AM, <dusenberrymw@gmail.com> wrote:

> Hi Aishwarya,
>
> Thanks for sharing more info on the issue!
>
> To facilitate easier usage, I've updated the preprocessing code by pulling
> out most of the logic into a `breastcancer/preprocessing.py` module,
> leaving just the execution in the `Preprocessing.ipynb` notebook.  There is
> also a `preprocess.py` script with the same contents as the notebook for
> use with `spark-submit`.  The choice of the notebook or the script is just
> a matter of convenience, as they both import from the same
> `breastcancer/preprocessing.py` module.
>
> As part of the updates, I've added an explicit SparkSession parameter
> (`spark`) to the `preprocess(...)` function, and updated the body to use
> this SparkSession object rather than the older SparkContext `sc` object.
> Previously, the `preprocess(...)` function accessed the `sc` object that
> was pulled in from the enclosing scope, which would work while all of the
> code was colocated within the notebook, but not if the code was extracted
> and imported.  The explicit parameter now allows for the code to be
> imported.
>
> Can you please try again with the latest updates?  We are currently using
> Spark 2.x with Python 3.  If you use the notebook, the pyspark kernel
> should have a `spark` object available that can be supplied to the
> functions (as is done now in the notebook), and if you use the
> `preprocess.py` script with `spark-submit`, the `spark` object will be
> created explicitly by the script.
>
> For a bit of context to others, Aishwarya initially reached out to find
> out if our breast cancer project could be applied to TIFF images, rather
> than the SVS images we are currently using (the answer is "yes" so long as
> they are "generic tiled TIFF images", according to the OpenSlide
> documentation), and then followed up with Spark issues related to the
> preprocessing code.  This conversation has been promptly moved to the
> mailing list so that others in the community can benefit.
>
>
> Thanks!
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Apr 6, 2017, at 5:09 AM, Aishwarya Chaurasia <aishwarya2612@gmail.com> wrote:
> >
> > Hey,
> >
> > The object sc is already defined in pyspark and yet this name error keeps
> > occurring. We are using spark 2.*
> >
> > Here is the link to error that we are getting :
> > https://paste.fedoraproject.org/paste/89iQODxzpNZVbSfgwocH8l5M1UNdIGYhyRLivL9gydE=
>
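
The scoping issue described in the quoted message (a function that reads `sc` from its enclosing scope works while everything lives in the notebook, but raises a name error once the function is imported from a module) can be illustrated with a minimal sketch. The function names `preprocess_implicit` and `preprocess_explicit` and the `Fake*` classes are hypothetical stand-ins so the example runs without Spark; this is not the project's actual `preprocess(...)` code.

```python
import types

# Stand-in for an imported module such as breastcancer/preprocessing.py.
# The function bodies are hypothetical, not the project's real code.
module_source = '''
def preprocess_implicit(paths):
    # Relies on a global named `sc`, looked up in THIS module's namespace.
    return sc.parallelize(paths)

def preprocess_explicit(spark, paths):
    # Receives the session as an argument, so no global lookup is needed.
    return spark.sparkContext.parallelize(paths)
'''
preprocessing = types.ModuleType("preprocessing")
exec(module_source, preprocessing.__dict__)


class FakeContext:
    """Minimal stand-in for a SparkContext (no Spark installation needed)."""

    def parallelize(self, items):
        return list(items)


class FakeSession:
    """Minimal stand-in for a SparkSession exposing `sparkContext`."""

    sparkContext = FakeContext()


# `sc` exists in the caller's namespace, as it does in a pyspark shell...
sc = FakeContext()

try:
    preprocessing.preprocess_implicit(["slide_1.svs"])
    implicit_error = None
except NameError as err:
    # ...but the imported function cannot see the caller's `sc`.
    implicit_error = str(err)

# Passing the session explicitly works regardless of where the code lives.
result = preprocessing.preprocess_explicit(FakeSession(), ["slide_1.svs"])

print("implicit:", implicit_error)
print("explicit:", result)
```

With a real installation, the stand-in session would presumably be replaced by the `spark` object that the pyspark kernel provides, or by one the script creates itself before calling the function.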
