systemml-dev mailing list archives

From Qifan Pu <qifan...@gmail.com>
Subject Re: distributed cholesky on systemml
Date Mon, 23 Apr 2018 00:05:16 GMT
Thanks Jeremy and Matthias. When I run the script, the cholesky or the inv
is executed entirely on the driver, and nothing shows up in the Spark UI.
Is that the expected behavior?

On Sun, Apr 22, 2018 at 3:34 PM, Jeremy Nilmeier <nilmeier@us.ibm.com>
wrote:

> Yes, I also spoke with Sasha about this some time last year.  Thanks for
> following up.
>
> Cheers, J
>
>
> Jerome Nilmeier, PhD
> Data Scientist and Engineer
> IBM Spark Technology Center
> http://www.spark.tc/
>
>
>
> ----- Original message -----
> From: Matthias Boehm <mboehm7@gmail.com>
> To: dev@systemml.apache.org
> Cc: Qifan Pu <qifan.pu@gmail.com>, Jeremy Nilmeier <nilmeier@us.ibm.com>
> Subject: Re: distributed cholesky on systemml
> Date: Sun, Apr 22, 2018 2:41 PM
>
> thanks for the context Jeremy - that helps. I also had an offline
> conversation with Sasha and he pointed me to a script that does exactly
> that (iterative invert_lower_triangular) combined with a parfor over
> independent blocks. We'll merge these scripts soon and I'll reach out
> individually as necessary. Thanks everybody for now.
>
> Regards,
> Matthias
>
> On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilmeier@us.ibm.com>
> wrote:
> > This may be a duplicate...it was bounced from the dev list.
> >
> > I think that scalable triangular inverse will also have similar
> > properties, in that there is a sequential approach if it uses back
> > substitution.
> >
> > For most of these algorithms (LU, Cholesky, QR), they are inherently
> > sequential, and the focus of the work is on minimizing interprocess
> > communication during the operations, which may explain why there was only
> > limited interest in pursuing this further.
> >
> > I had originally recommended that the recursive algorithms be rewritten
> > as iterative algorithms (and in fact provided an example of the LU in
> > iterative form), which would make the counting of operations more
> > transparent, as well as revealing possible parallelization points.
> >
> > Cheers, J
> > Jerome Nilmeier, PhD
> > Data Scientist and Engineer
> > IBM Spark Technology Center
> > http://www.spark.tc/
>
> >
> >
> >
> > ----- Original message -----
> > From: Matthias Boehm <mboehm7@gmail.com>
> > To: dev@systemml.apache.org
> > Cc: Qifan Pu <qifan.pu@gmail.com>
> > Subject: Re: distributed cholesky on systemml
> > Date: Sun, Apr 22, 2018 1:21 AM
> >
> > sure no problem - thanks again for catching this issue that was hidden
> > for a while.
> >
> > Yes, the same depth-first characteristic applies to the Cholesky
> > function as well. In contrast to U_triangular_inv, however, there are
> > data dependencies between the blocks per level (at least in the
> > current algorithm formulation), which means we cannot use the approach
> > I described for U_triangular_inv.
> >
> > L11 = Cholesky(A11, nb)
> > A22 = ... U_triangular_inv(t(L11))
> > L22 = Cholesky(A22, nb)
> >
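The data dependency described above (L22 needs the Schur-complement update of A22, which in turn needs L11) can be sketched outside of DML as well. Below is a minimal pure-Python sketch of a recursive blocked Cholesky; the names (chol_blocked, chol_base, min block size nb) are illustrative and not the staging script's actual identifiers.

```python
import math

def chol_base(A):
    """Unblocked Cholesky of a small SPD matrix (list-of-lists)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, B):
    """Solve L @ X = B for X, with L lower triangular."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):
        for i in range(n):
            s = B[i][j] - sum(L[i][k] * X[k][j] for k in range(i))
            X[i][j] = s / L[i][i]
    return X

def chol_blocked(A, nb):
    """Recursive blocked Cholesky; sequential across the diagonal
    because L22 depends on the update computed from L11."""
    n = len(A)
    if n <= nb:
        return chol_base(A)  # switch to the 'builtin' at the min block size
    k = n // 2
    A11 = [row[:k] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]
    L11 = chol_blocked(A11, nb)                      # step 1
    # L21 = A21 @ inv(L11)^T, via triangular solve L11 @ L21^T = A21^T
    Y = forward_solve(L11, [[A21[i][j] for i in range(n - k)] for j in range(k)])
    L21 = [[Y[j][i] for j in range(k)] for i in range(n - k)]
    # Schur complement: A22 - L21 @ L21^T (this is the dependency on L11)
    S = [[A22[i][j] - sum(L21[i][t] * L21[j][t] for t in range(k))
          for j in range(n - k)] for i in range(n - k)]
    L22 = chol_blocked(S, nb)                        # step 3, after step 2
    # assemble [[L11, 0], [L21, L22]]
    top = [L11[i] + [0.0] * (n - k) for i in range(k)]
    bot = [L21[i] + L22[i] for i in range(n - k)]
    return top + bot
```

The sequential chain L11 → Schur update → L22 is exactly why this recursion cannot be parallelized across diagonal blocks the way U_triangular_inv can.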
> > However, note that there are far fewer calls to Cholesky, because we
> > switch to the builtin cholesky once blocks reach the given minimum block
> > size. For example, in our new test for dimensions 1362 x 1362 and a min
> > size of 200, we call Cholesky 15 times but U_triangular_inv 2539 times.
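The 15 Cholesky calls are consistent with a simple halving recursion down to the minimum block size. A quick sanity check (assuming the script splits into two halves of roughly n/2, which is an assumption about its exact split convention):

```python
def cholesky_calls(n, nb):
    """Count Cholesky invocations for a recursion that halves the
    matrix dimension until blocks fall below the min block size nb."""
    if n <= nb:
        return 1  # a single call to the builtin cholesky
    k = n // 2
    # this call itself, plus one recursive call per diagonal block
    return 1 + cholesky_calls(k, nb) + cholesky_calls(n - k, nb)

print(cholesky_calls(1362, 200))  # → 15
```

The much larger U_triangular_inv count follows because every level of the Cholesky recursion triggers its own full triangular-inverse recursion.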
> >
> > For a sufficiently large min block size this might be ok for Cholesky,
> > because each level also does a number of matrix multiplies that will
> > exploit the available parallelism of your cluster. In that regard, you
> > might want to experiment with different block sizes and driver memory
> > budgets. If I get a chance, I will also run a number of experiments
> > and see if we can rewrite these scripts.
> >
> > Regards,
> > Matthias
> >
> > On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan.pu@gmail.com> wrote:
> >> Matthias,
> >>
> >> Thanks so much for taking the time to fix this. Really appreciate it.
> >> Does the same reasoning apply to the cholesky script? The recursive
> >> approach
> >> also looks inherently sequential.
> >>
> >> Best,
> >> Qifan
> >>
> >> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboehm7@gmail.com>
> >> wrote:
> >>>
> >>> just as a quick update: this issue has now been fixed in SystemML
> >>> master - it was essentially a missing guard for recursive functions
> >>> when checking for unary size-preserving functions during
> >>> inter-procedural analysis (IPA).
> >>>
> >>> However, while working with this recursive cholesky function I came to
> >>> the conclusion that it may need some rework. The current top-down,
> >>> depth-first approach is inherently sequential. This is partly
> >>> unnecessary because, for the recursive function U_triangular_inv
> >>> (which is called many more times than cholesky), the blocks per level
> >>> are independent. Therefore, we should look into a bottom-up,
> >>> breadth-first approach that parallelizes over the blocks in each
> >>> level, which could be done via parfor at script level.
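The independence noted above can be seen from the blocked inverse of a lower-triangular matrix: for L = [[L11, 0], [L21, L22]], inv(L) = [[inv(L11), 0], [-inv(L22) @ L21 @ inv(L11), inv(L22)]], so the two diagonal-block inverses at each level have no mutual dependency. A minimal pure-Python sketch of that parfor-style parallelism (thread-based here; the names are illustrative, not the DML script's):

```python
import threading

def inv_lower_base(L):
    """Invert a small lower-triangular matrix by forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = (1.0 if i == j else 0.0) - sum(L[i][k] * X[k][j] for k in range(j, i))
            X[i][j] = s / L[i][i]
    return X

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv_lower(L, nb):
    """Blocked lower-triangular inverse; the two diagonal blocks are
    independent, so they can be inverted in parallel (parfor analogue)."""
    n = len(L)
    if n <= nb:
        return inv_lower_base(L)
    k = n // 2
    L11 = [row[:k] for row in L[:k]]
    L21 = [row[:k] for row in L[k:]]
    L22 = [row[k:] for row in L[k:]]
    out = {}
    def work(name, blk):
        out[name] = inv_lower(blk, nb)
    # independent diagonal blocks: run both recursions concurrently
    t1 = threading.Thread(target=work, args=("I11", L11))
    t2 = threading.Thread(target=work, args=("I22", L22))
    t1.start(); t2.start(); t1.join(); t2.join()
    I11, I22 = out["I11"], out["I22"]
    # off-diagonal block: -inv(L22) @ L21 @ inv(L11)
    I21 = [[-x for x in row] for row in matmul(matmul(I22, L21), I11)]
    top = [I11[i] + [0.0] * (n - k) for i in range(k)]
    bot = [I21[i] + I22[i] for i in range(n - k)]
    return top + bot
```

In DML the same structure would be expressed bottom-up with a parfor over the independent blocks of each level, rather than with threads.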
> >>>
> >>> Regards,
> >>> Matthias
> >>>
> >>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboehm7@gmail.com>
> >>> wrote:
> >>> > thanks for catching this - I just ran a toy example and this seems to
> >>> > be a rewrite issue (there are specific right-indexing rewrites that
> >>> > collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U, which
> >>> > helps for large distributed matrices). As a workaround, you can set
> >>> > "sysml.optlevel" to 1 instead of the default 2 (level 1 disables all
> >>> > rewrites), which worked fine for me. I'll fix this later today. Also
> >>> > I'll fix the naming from "Choleskey" to "Cholesky". Thanks again.
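For reference, the workaround can be applied through a SystemML configuration file. This is a minimal sketch assuming the standard SystemML-config.xml layout (verify against the template shipped with your distribution, and pass the file via the -config command-line option):

```xml
<!-- minimal SystemML-config.xml override (sketch):
     optlevel 1 disables all static/dynamic rewrites; the default is 2 -->
<root>
  <sysml.optlevel>1</sysml.optlevel>
</root>
```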
> >>> >
> >>> > Regards,
> >>> > Matthias
> >>> >
> >>> >
> >>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan.pu@gmail.com> wrote:
> >>> >> Hi Matthias,
> >>> >>
> >>> >> Thanks for the fast response and detailed information. This is really
> >>> >> helpful.
> >>> >>
> >>> >> I just tried to run it, and was tracing down an indexing bug that can
> >>> >> be reproduced by simply running the test script of the triangular
> >>> >> solve [1]:
> >>> >>
> >>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid
> >>> >> values for matrix indexing: [1667:3333,1:1666] must be within
> >>> >> matrix dimensions [1000,1000]
> >>> >>
> >>> >>
> >>> >> Am I missing some configuration here?
> >>> >>
> >>> >>
> >>> >> [1]
> >>> >>
> >>> >>
> >>> >> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
> >>> >>
> >>> >>
> >>> >> Best,
> >>> >> Qifan
> >>> >>
> >>> >>
> >>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboehm7@gmail.com>
> >>> >> wrote:
> >>> >>>
> >>> >>> Hi Qifan,
> >>> >>>
> >>> >>> thanks for your feedback. You're right, the builtin functions
> >>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently only
> >>> >>> supported as single-node operations because they're still
> >>> >>> implemented via Apache commons.math.
> >>> >>>
> >>> >>> However, there is an experimental script for distributed cholesky [1]
> >>> >>> which uses a recursive approach (with operations that allow for
> >>> >>> automatic distributed computation) for matrices larger than a
> >>> >>> user-defined block size. Once blocks become small enough, we again
> >>> >>> use the builtin cholesky. Graduating this script would require a
> >>> >>> broader set of experiments (and potential improvements) but it
> >>> >>> simply did not have the highest priority so far. You might want to
> >>> >>> give it a try though.
> >>> >>>
> >>> >>> Thanks again for your feedback - we'll consider a higher priority
> >>> >>> for these distributed operations when discussing the roadmap for
> >>> >>> the next releases.
> >>> >>>
> >>> >>> [1]
> >>> >>>
> >>> >>>
> >>> >>> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
> >>> >>>
> >>> >>> Regards,
> >>> >>> Matthias
> >>> >>>
> >>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan.pu@gmail.com> wrote:
> >>> >>> > Hi,
> >>> >>> >
> >>> >>> > I would love to do distributed cholesky on a large matrix with
> >>> >>> > SystemML. I found two related JIRAs (SYSTEMML-1213, SYSTEMML-1163),
> >>> >>> > but AFAIK this is currently not implemented? I just wanted to
> >>> >>> > check.
> >>> >>> >
> >>> >>> > Best,
> >>> >>> > Qifan
> >>> >>
> >>> >>
