systemml-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qifan Pu <qifan...@gmail.com>
Subject Re: distributed cholesky on systemml
Date Mon, 23 Apr 2018 00:07:11 GMT
and everything before (e.g., I generate the matrix using another DML) was
indeed run by Spark and shows up on the UI.

On Sun, Apr 22, 2018 at 5:05 PM, Qifan Pu <qifan.pu@gmail.com> wrote:

> Thanks Jeremy and Matthias. When I run the script, the cholesky or the inv
> is executed completely on the driver, and nothing shows up on Spark UI.
> Is that the expected behavior?
>
> On Sun, Apr 22, 2018 at 3:34 PM, Jeremy Nilmeier <nilmeier@us.ibm.com>
> wrote:
>
>> Yes, I also spoke with Sasha about this some time last year.  Thanks for
>> following up.
>>
>> Cheers, J
>>
>>
>> Jerome Nilmeier, PhD
>> Data Scientist and Engineer
>> IBM Spark Technology Center
>> http://www.spark.tc/
>>
>>
>>
>> ----- Original message -----
>> From: Matthias Boehm <mboehm7@gmail.com>
>> To: dev@systemml.apache.org
>> Cc: Qifan Pu <qifan.pu@gmail.com>, Jeremy Nilmeier <nilmeier@us.ibm.com>
>> Subject: Re: distributed cholesky on systemml
>> Date: Sun, Apr 22, 2018 2:41 PM
>>
>> thanks for the context Jeremy - that helps. I also had an offline
>> conversion with Sasha and he pointed me to a script that does exactly
>> that (iterative invert_lower_triangular) combined with a parfor over
>> independent blocks. We'll merge these scripts soon and I'll reach out
>> individually as necessary. Thanks everybody for now.
>>
>> Regards,
>> Matthias
>>
>> On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilmeier@us.ibm.com>
>> wrote:
>> > This may be a duplicate...it was bounced from the dev list.
>> >
>> > I think that scalable triangular inverse will also have similar
>> properties,
>> > in that there is a sequential approach if it uses back substitution.
>> >
>> > For most of these algorithms (LU, Cholesky, QR), they are inherently
>> > sequential, and the focus of the work is on minimizing interprocess
>> > communication during the operations, which may explain why there was
>> only
>> > limited interest in pursuing this further.
>> >
>> > I had originally recommended that the recursive algorithms be rewritten
>> as
>> > iterative algorithms (and in fact provided an example of the LU in
>> iterative
>> > form), which would make the counting of operations more transparent, as
>> well
>> > as revealing possible parallelization points.
>> >
>> > Cheers, J
>> > Jerome Nilmeier, PhD
>> > Data Scientist and Engineer
>> > IBM Spark Technology Center
>> > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.spar
>> k.tc_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=3mYOfURw_FSirAnoSv
>> 2pWvLSi1psso4F9RdGjEWL6yc&m=VIdNVaIRvibBlaNVAOXLKmxXf7ma-EXr
>> LWbjMd9Bmgo&s=YktpBBbqor3DKzS90Ah75BF6NBYtE4RauITF7QaL87g&e=
>>
>> >
>> >
>> >
>> > ----- Original message -----
>> > From: Matthias Boehm <mboehm7@gmail.com>
>> > To: dev@systemml.apache.org
>> > Cc: Qifan Pu <qifan.pu@gmail.com>
>> > Subject: Re: distributed cholesky on systemml
>> > Date: Sun, Apr 22, 2018 1:21 AM
>> >
>> > sure no problem - thanks again for catching this issue that was hidden
>> > for a while.
>> >
>> > Yes, the same depth-first characteristic applies to the Cholesky
>> > function as well. In contrast to U_triangular_inv, however, there are
>> > data dependencies between the blocks per level (at least in the
>> > current algorithm formulation), which means we cannot use the approach
>> > I described for U_triangular_inv.
>> >
>> > L11 = Cholesky(A11, nb)
>> > A22 = ... U_triangular_inv(t(L11))
>> > L22 = Cholesky(A22, nb)
>> >
>> > However, note that there are much fewer calls to Cholesky due to the
>> > switch to the builtin cholesky according to the given min block size.
>> > For example, in our new test for dimensions 1362 x 1362 and min size
>> > of 200, we call Cholesky 15 times but U_triangular_inv 2539 times.
>> >
>> > For sufficiently large min block size this might be ok for Cholesky,
>> > because each level also does a number of matrix multiplies that will
>> > exploit the available parallelism of your cluster. In that regard. you
>> > might want to experiment with different block sizes and driver memory
>> > budgets. If I get a chance, I will also run a number of experiments
>> > and see if we can rewrite these scripts.
>> >
>> > Regards,
>> > Matthias
>> >
>> > On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan.pu@gmail.com> wrote:
>> >> Matthias,
>> >>
>> >> Thanks so much for taking time to fix. Really appreciated it.
>> >> Does the same reasoning apply to the cholesky script? The recursive
>> >> approach
>> >> also looks inherently sequential.
>> >>
>> >> Best,
>> >> Qifan
>> >>
>> >> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboehm7@gmail.com>
>> >> wrote:
>> >>>
>> >>> just as a quick update: this issue has now been fixed in SystemML
>> >>> master - it was essentially a missing guard for recursive functions
>> >>> when checking for unary size-preserving functions during
>> >>> inter-procedural analysis (IPA).
>> >>>
>> >>> However, while working with this recursive cholesky function I came
to
>> >>> the conclusion that it may need some rework. The current top-down,
>> >>> depth-first, approach is inherently sequential. This is partially
>> >>> unnecessary because for the used recursive function U_triangular_inv
>> >>> (which is called many more times than cholesky), blocks per level are
>> >>> independent. Therefore, we should look into a bottom-up, breadth-first
>> >>> approach to parallelize over the blocks in each level, which could be
>> >>> done via parfor at script level.
>> >>>
>> >>> Regards,
>> >>> Matthias
>> >>>
>> >>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboehm7@gmail.com>
>> >>> wrote:
>> >>> > thanks for catching this - I just ran a toy example and this seems
>> to
>> >>> > be a rewrite issue (there are specific right indexing rewrites
that
>> >>> > collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U
which
>> >>> > helps for large distributed matrices). As a workaround, you can
set
>> >>> > "sysml.optlevel" to 1 (instead of default 2, where 1 disables all
>> >>> > rewrites), which worked fine for me. I'll fix this later today.
Also
>> >>> > I'll fix the naming from "Choleskey" to "Cholesky". Thanks again.
>> >>> >
>> >>> > Regards,
>> >>> > Matthias
>> >>> >
>> >>> >
>> >>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan.pu@gmail.com>
>> wrote:
>> >>> >> Hi Matthias,
>> >>> >>
>> >>> >> Thanks for the fast response and detailed information. This
is
>> really
>> >>> >> helpful.
>> >>> >>
>> >>> >> I just tried to run it, and was tracing down a indexing bug
that
>> can
>> >>> >> be
>> >>> >> repeated by simply running the test script of triangle solve[1]
>> >>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid
>> >>> >> values
>> >>> >> for
>> >>> >> matrix indexing: [1667:3333,1:1666] must be within matrix
>> dimensions
>> >>> >> [1000,1000]
>> >>> >>
>> >>> >>
>> >>> >> Am I missing some configuration here?
>> >>> >>
>> >>> >>
>> >>> >> [1]
>> >>> >>
>> >>> >>
>> >>> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
>> com_apache_systemml_blob_master_scripts_staging_scalable-
>> 5Flinalg_test_test-5Ftriangular-5Finv.dml&d=DwIBaQ&c=jf_
>> iaSHvJObTbx-siA1ZOg&r=3mYOfURw_FSirAnoSv2pWvLSi1psso
>> 4F9RdGjEWL6yc&m=FvqDr_AKzY5EAD_GAXIJoot0Z09NtMUt8kLS
>> hXcJxqQ&s=zIEgt74yeZzCTqvLCgV_0J8ECApG541uUlbaGMcK8bs&e=
>> >>> >>
>> >>> >>
>> >>> >> Best,
>> >>> >> Qifan
>> >>> >>
>> >>> >>
>> >>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboehm7@gmail.com
>> >
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Hi Qifan,
>> >>> >>>
>> >>> >>> thanks for your feedback. You're right, the builtin functions
>> >>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently
>> only
>> >>> >>> supported as single-node operations because they're still
>> implemented
>> >>> >>> via Apache commons.math.
>> >>> >>>
>> >>> >>> However, there is an experimental script for distributed
cholesky
>> [1]
>> >>> >>> which uses a recursive approach (with operations that allow
for
>> >>> >>> automatic distributed computation) for matrices larger
than a
>> >>> >>> user-defined block size. Once blocks become small enough,
we use
>> >>> >>> again
>> >>> >>> the builtin cholesky. Graduating this script would require
a
>> broader
>> >>> >>> set of experiments (and potential improvements) but it
simply did
>> not
>> >>> >>> have the highest priority so far. You might want to give
it a try
>> >>> >>> though.
>> >>> >>>
>> >>> >>> Thanks again for your feedback - we'll consider a higher
priority
>> for
>> >>> >>> these distributed operations when discussing the roadmap
for the
>> next
>> >>> >>> releases.
>> >>> >>>
>> >>> >>> [1]
>> >>> >>>
>> >>> >>>
>> >>> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
>> com_apache_systemml_blob_master_scripts_staging_scalable-
>> 5Flinalg_cholesky.dml&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZO
>> g&r=3mYOfURw_FSirAnoSv2pWvLSi1psso4F9RdGjEWL6yc&m=FvqDr_
>> AKzY5EAD_GAXIJoot0Z09NtMUt8kLShXcJxqQ&s=Yrj4GGcTlpZGRw34RoON_oO6-xDUti
>> IEUcO7-qIOyoc&e=
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Matthias
>> >>> >>>
>> >>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan.pu@gmail.com>
>> wrote:
>> >>> >>> > Hi,
>> >>> >>> >
>> >>> >>> > I would love to do distributed cholesky on large matrix
with
>> >>> >>> > SystemML. I
>> >>> >>> > found two related jiras (SYSTEMML-1213, SYSTEMML-1163),
but
>> AFAIK,
>> >>> >>> > this
>> >>> >>> > is
>> >>> >>> > currently not implemented? I just wanted to check.
>> >>> >>> >
>> >>> >>> > Best,
>> >>> >>> > Qifan
>> >>> >>
>> >>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message