systemml-dev mailing list archives

From Krishna Kalyan <krishnakaly...@gmail.com>
Subject Re: PCA data gen script, no output
Date Sun, 17 Sep 2017 00:28:20 GMT
Thanks Matthias,
The script works when provided with the absolute path.

Regards,
Krishna

On Sat, Sep 16, 2017 at 12:33 PM, Matthias Boehm <mboehm7@googlemail.com>
wrote:

> ok great - I also did some more debugging: (1) when the filename is
> specified inside the dml script, the file is indeed written to a directory
> ~ in HDFS, but (2) when passed from command line the ~ is of course
> resolved before it's even passed into SystemML.
>
> So let's do the following: (1) use a small scenario of say 10K x 1K, (2)
> run it with absolute file name, and see what happens. If this does not
> work, I would suspect some permission issues next - maybe the bridge from
> python to the jvm hides some error output. If this is also not the case,
> please provide the -explain output and I'll take a closer look.
>
> Regards,
> Matthias
>
> On Fri, Sep 15, 2017 at 11:52 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> wrote:
>
> > Thank you so much for trying this Matthias. I will try this again with
> > absolute path.
> >
> > Regards,
> > Krishna
> >
> > On Sat, Sep 16, 2017 at 12:09 PM, Matthias Boehm <mboehm7@googlemail.com>
> > wrote:
> >
> > > ok, I just tried it with multiple different memory configurations (for 6GB
> > > driver mem I got the same number of spark instructions as you reported) and
> > > it ran just fine and produced the outputs. So please, give it a try without
> > > the ~ (i.e., use an absolute or relative path).
> > >
> > > Also, even with 2GB mem, this data generation for 1M x 1K ran in about 60s
> > > (including spark context creation) in my environment. Since your log shows
> > > a runtime of 5000s, you might want to reduce the data size a bit.
> > >
> > > Regards,
> > > Matthias
> > >
> > > On Fri, Sep 15, 2017 at 11:08 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the reply,
> > > > I have tested with systemml-standalone.py too. I am still faced with the
> > > > same problem. Currently my spark is configured to work on local fs instead
> > > > of HDFS, hence I did not have a problem with the ~.
> > > >
> > > > Regards,
> > > > Krishna
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, Sep 16, 2017 at 7:24 AM, Matthias Boehm <mboehm7@googlemail.com>
> > > > wrote:
> > > >
> > > > > well, I don't think any HDFS fs implementation resolves '~' - so it has
> > > > > probably created a directory called '~/open-source/scripts/PCA_data' in
> > > > > your user path in HDFS or current directory in local FS.
> > > > >
> > > > > Regards,
> > > > > Matthias
> > > > >
> > > > > On Fri, Sep 15, 2017 at 5:47 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am using the PCA
> > > > > > <https://github.com/apache/systemml/blob/master/scripts/datagen/genRandData4PCA.dml>
> > > > > > data generation scripts to generate data. Unfortunately, they do not
> > > > > > produce any output in the specified target directory.
> > > > > >
> > > > > > Command used:
> > > > > >
> > > > > > systemml/bin/systemml-spark-submit.py -f genRandData4PCA.dml -nvargs R=1000000 C=1000 OUT=~/open-source/scripts/PCA_data
> > > > > >
> > > > > > logs
> > > > > > https://gist.github.com/krishnakalyan3/70796b13735743886e41d3da6b75d7d5
> > > > > >
> > > > > > This job also does not throw any errors during execution.
> > > > > >
> > > > > > Thank you so much,
> > > > > > Krishna
> > > > > >
> > > > >
> > > >
> > >
> >
>
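
The '~' issue discussed in this thread can be illustrated with a small sketch. It is not part of SystemML or of the scripts mentioned above; the helper name `resolve_output_path` is hypothetical, and the example only uses Python's standard `os.path` functions. It contrasts a path in which '~' has been expanded to the home directory with the same string treated literally, which is roughly what happens when a file system API does not resolve '~' and creates a directory named '~' under the current location.

```python
import os


def resolve_output_path(path):
    """Hypothetical helper: expand a leading '~' and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


# Output location taken from the command in the original message.
raw = "~/open-source/scripts/PCA_data"

# With expansion, '~' is replaced by the user's home directory.
resolved = resolve_output_path(raw)

# Without expansion, '~' is just an ordinary directory name, so the string
# is a relative path below the current working directory.
literal = os.path.join(os.getcwd(), raw)

print("expanded:", resolved)
print("literal: ", literal)
```

Passing the expanded, absolute form as the OUT argument avoids depending on whether the shell or the file system layer resolves '~', which matches the outcome reported at the top of the thread.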
