systemml-dev mailing list archives

From Matthias Boehm <mboe...@googlemail.com>
Subject Re: Local versions of Linear Algebra Operators in DML
Date Mon, 24 Oct 2016 18:54:00 GMT
Well, we still compute memory estimates for these operations. So I
guess a good compromise would be to raise a warning whenever the memory
estimate is known to exceed the local memory budget.
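
As a rough illustration of that compromise, here is a minimal Python sketch.
It is not SystemML code: the names CP_ONLY_OPS, dense_memory_estimate and
check_local_budget, the dense 8-bytes-per-cell estimate, and the convention
that a negative dimension means "unknown" are all assumptions made for this
example. A warning is logged only when the operation is single-node only and
the estimate is known to exceed the budget.

```python
import logging

logger = logging.getLogger("cp_only_ops")

# Hypothetical set of operations assumed to have only a single-node (CP) implementation.
CP_ONLY_OPS = {"eigen", "lu", "qr", "cholesky", "svd"}


def dense_memory_estimate(rows, cols, bytes_per_cell=8):
    """Rough dense estimate in bytes; a negative dimension means 'unknown'."""
    if rows < 0 or cols < 0:
        return None
    return rows * cols * bytes_per_cell


def check_local_budget(op, rows, cols, budget_bytes):
    """Log a warning and return True if op is CP-only and its known
    memory estimate exceeds the local budget; otherwise return False."""
    if op not in CP_ONLY_OPS:
        return False
    estimate = dense_memory_estimate(rows, cols)
    # Unknown estimates do not trigger a warning: only warn when the
    # estimate is known to exceed the budget.
    if estimate is None or estimate <= budget_bytes:
        return False
    logger.warning(
        "Operation '%s' runs single-node only; estimated %d bytes exceeds "
        "local budget of %d bytes.", op, estimate, budget_bytes)
    return True
```

For example, check_local_budget("svd", 100000, 100000, 2 * 1024**3) would
warn, since the dense estimate is about 80 GB against a 2 GB budget, while a
1000 x 1000 input would pass silently.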

Regards,
Matthias

On 10/24/2016 8:29 PM, Deron Eriksson wrote:
> Would it be acceptable for a user to receive a log warning if the user uses
> an operation that is currently only implemented for single node? My concern
> is that there is an expectation for operations to be distributed with
> SystemML, and if an operation is not currently distributed, the user needs
> to be made aware of this.
>
> Thoughts?
>
> Deron
>
>
> On Mon, Oct 24, 2016 at 10:38 AM, Nakul Jindal <nakul02@gmail.com> wrote:
>
>> Hi,
>>
>> There is an initial implementation and PR.
>> https://github.com/apache/incubator-systemml/pull/273
>>
>> -Nakul
>>
>>
>>> On Oct 24, 2016, at 12:59 AM, Berthold Reinwald <reinwald@us.ibm.com> wrote:
>>>
>>> Thanks, Imran. I think it is a good idea to start off with the DML-bodied
>>> function implementation. This will hold until we can have a built in
>>> implementation.
>>>
>>> We prototyped an implementation of distributed Cholesky as a DML bodied
>>> function as well. As a performance optimization, once the matrix became
>>> "small" enough, we switched over to a single-node implementation.
>>>
>>> Adding a new svd() built in function that initially routes to a local
>>> library is fine. I don't know whether Apache commons math has an
>>> implementation that can be re-used.
>>>
>>> I object to renaming the functions or changing the externals. Eventually
>>> distributed instructions need to be added to these implementations, and
>>> there are open jiras for it.
>>>
>>> Regards,
>>> Berthold Reinwald
>>> IBM Almaden Research Center
>>> office: (408) 927 2208; T/L: 457 2208
>>> e-mail: reinwald@us.ibm.com
>>>
>>>
>>>
>>> From:   Niketan Pansare/Almaden/IBM@IBMUS
>>> To:     dev@systemml.incubator.apache.org
>>> Date:   10/21/2016 01:14 PM
>>> Subject:        Re: Local versions of Linear Algebra Operators in DML
>>>
>>>
>>>
>>> I am also comfortable with option (2) ... "with a plan to implement its
>>> distributed version"
>>>
>>> Thanks,
>>>
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>>
>>> Matthias Boehm ---10/21/2016 01:00:51 PM---thanks Nakul for reaching out
>>> before starting work on this. Actually, the introduction of these CP-
>>>
>>> From: Matthias Boehm <mboehm7@googlemail.com>
>>> To: dev@systemml.incubator.apache.org
>>> Date: 10/21/2016 01:00 PM
>>> Subject: Re: Local versions of Linear Algebra Operators in DML
>>>
>>>
>>>
>>> thanks Nakul for reaching out before starting work on this. Actually,
>>> the introduction of these CP-only builtin functions was a big mistake
>>> because (as you already mentioned) they mistakenly suggest that we
>>> provide distributed operations for them too. The intent was to support
>>> them in later versions with our own local and distributed
>>> implementations. So far, this had low priority though because these
>>> O(n^3) operations are seldom used over large data. However, a while
>>> back, we lost potential users who were specifically interested in
>>> distributed eigen - so there are still use cases.
>>>
>>> Despite the good intentions behind the renaming, I would strongly argue
>>> against it. First, it would unnecessarily lose compatibility with R
>>> syntax. Second, it would defeat our clean abstraction by exposing
>>> explicit local operations.
>>>
>>> This leaves us with two options here: (1) you could use an external
>>> (java-implemented) function, which gives you virtually the same runtime
>>> behavior but a clear separation via an explicit registration, or (2) add
>>> it to the list of CP-only operations (with a plan to implement its
>>> distributed version) but name it 'svd' as in R.
>>>
>>>
>>> Regards,
>>> Matthias
>>>
>>>
>>>> On 10/21/2016 9:34 PM, Nakul Jindal wrote:
>>>> Hi,
>>>>
>>>> Imran was planning on implementing a distributed SVD as a DML bodied
>>>> function.
>>>> The algorithm is described in the paper titled "A Distributed and
>>>> Incremental SVD Algorithm for Agglomerative Data Analysis on Large
>>>> Networks" available at https://arxiv.org/abs/1601.07010.
>>>>
>>>> This algorithm requires the availability of a local SVD function,
>>>> which we currently do not have in SystemML.
>>>> Seeing as how there are other linear algebra functions (eigen, lu, qr,
>>>> cholesky) in DML that reroute to Apache Commons Math and only operate in
>>>> standalone/CP mode, would it be ok to add "svd" to this set?
>>>>
>>>> Also, since these operations are local and not distributed and the
>>>> documentation doesn't make it clear that these operations won't operate
>>>> in distributed mode, would it make sense to rename them to "local_eigen",
>>>> "local_qr", "local_cholesky", etc?
>>>> Obviously, this change would go into the version after 0.11.
>>>>
>>>> I understand that the ideal solution to this problem is to have a
>>>> distributed version of the aforementioned linear algebra routines, but
>>>> for the time being, would it be ok to go ahead and do the rename, while
>>>> also introducing a "local_svd"?
>>>>
>>>>
>>>> Niketan, Berthold, Matthias, Sasha - Any thoughts?
>>>>
>>>> Thanks,
>>>> Nakul Jindal
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
