From: Janardhan Pulivarthi
Date: Sat, 29 Jul 2017 11:07:20 +0530
To: dev@systemml.apache.org, Imran Younus, Nakul Jindal
Subject: Re: svd( ) implementation

Thanks, Imran. As per the paper, we first perform a QR decomposition of the
input matrix A, from which we obtain R, and then compute svd(R) using the
builtin local function (??). I'll try this.

Tall-skinny matrices: so, do we have a problem with square matrices? Or do
we have to partition the matrix into tall-skinny pieces if we have a square
one?

Thanks,
Janardhan

On Fri, Jul 28, 2017 at 11:52 PM, Imran Younus wrote:

> Just to clarify one thing. For the QR-based method, you can assume that
> the R matrix is small enough to fit in driver memory, and then perform
> SVD on the driver. That means your actual matrix has to be tall-skinny.
>
> imran
>
> On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus wrote:
>
> > Janardhan,
> >
> > The papers you're referring to may not be relevant. The first paper, as
> > far as I can tell, is about updating an existing SVD decomposition as
> > new data comes in. The third paper in this list is the one I used, but
> > that method is not good.
> >
> > There is also a method that uses QR decomposition and then calculates
> > the SVD from the R matrix. Please have a look at equation 1.3 in this
> > paper:
> >
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
> >
> > I think this is worth trying out. The distributed QR is already
> > implemented in SystemML, so it may be quick to try out.
> >
> > imran
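To make equation 1.3 concrete: factor A = QR, take the small n x n factor R
to the driver, compute R = Ur * diag(s) * Vt locally, and then
A = (Q Ur) * diag(s) * Vt is the SVD of A. A minimal numpy sketch
(illustrative only; numpy's local qr stands in for SystemML's distributed
one):

    import numpy as np

    m, n = 10000, 50               # tall-skinny: m >> n
    A = np.random.rand(m, n)

    Q, R = np.linalg.qr(A)         # distributed in SystemML; Q: m x n, R: n x n
    Ur, s, Vt = np.linalg.svd(R)   # R is only n x n, cheap on the driver
    U = Q @ Ur                     # A = Q R = (Q Ur) diag(s) Vt

    assert np.allclose(A, U @ np.diag(s) @ Vt)

This also shows why the input should be tall-skinny: for a square A, the R
factor is as large as A itself, so nothing is gained by taking it local.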
> > On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi
> > <janardhan.pulivarthi@gmail.com> wrote:
> >
> > > Hi Nakul & all the committers,
> > >
> > > Till now I am halfway through the literature. For now, a couple of
> > > things to mention: in SVD there are three stages:
> > >
> > > 1. the bidiagonal reduction step,
> > > 2. computation of the singular values,
> > > 3. computation of the singular vectors.
> > >
> > > Of these three, the *bidiagonal reduction* step is the most
> > > expensive, so our focus is on it (when considering the GPU, for cases
> > > where handling it on the CPU is infeasible).
> > >
> > > About the literature:
> > >
> > > - I took some time to go through "A Stable and Fast Algorithm for
> > >   Updating the Singular Value Decomposition" by Gu & Eisenstat, to
> > >   understand the numerical stability and round-off errors when we
> > >   partition the matrix in this distributed algorithm. The authors
> > >   show that each computed component has high absolute accuracy, and
> > >   that the properties of the resulting matrix do not conflict with
> > >   those of the parent matrix. [pdf]
> > >
> > > - "High performance bidiagonal reduction using tile algorithms on
> > >   homogeneous multicore clusters" by Ltaief et al. focuses mainly on
> > >   the first stage and discusses tile algorithms and their runtime
> > >   implementations in some depth (although off-topic, I read this
> > >   just to understand). [pdf]
> > >
> > > - "A distributed and incremental svd algorithm for agglomerative
> > >   data analysis on large networks" by Iwen & Ong. Please go through
> > >   (a) TABLE 1 and TABLE 2, and (b) APPENDIX A, RAW DATA FROM
> > >   NUMERICAL EXPERIMENTS. [pdf]
> > >
> > > Thanks,
> > > Janardhan
> > >
> > > On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal wrote:
> > >
> > > > Hi Janardhan,
> > > >
> > > > The images you've used as attachments haven't reached my inbox.
> > > > Could you please send them to me directly, rather than through the
> > > > dev mailing list? (Or upload them to an image hosting site like
> > > > imgur and paste the links in the email.)
> > > >
> > > > I would like to point out that my knowledge of machine learning is
> > > > limited. Still, how would you want to test the algorithm?
> > > >
> > > > Sparse matrices in SystemML (in Spark execution mode):
> > > > Sparse matrix support in SystemML is implemented at the block
> > > > level. Each (potentially giant) matrix is stored as blocks (in
> > > > Spark execution mode). The matrix itself doesn't store knowledge
> > > > of whether it is sparse or not; each block does. Each block of
> > > > this giant matrix can be stored in dense or sparse format,
> > > > depending on how many non-zeros are in that block. The sparsity of
> > > > a block is recomputed every so often, and the block representation
> > > > can be converted from dense to sparse and vice versa. When
> > > > operations are performed on blocks, a sparse-aware operation may
> > > > be performed internally, depending on how the blocks are stored.
> > > >
> > > > (One of the other contributors can clarify if I've missed
> > > > something or said something wrong.)
> > > >
> > > > Given this context, can you please think about whether missing
> > > > sparse matrix support is still a problem?
> > > >
> > > > -Nakul
> > > >
> > > > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi
> > > > <janardhan.pulivarthi@gmail.com> wrote:
> > > >
> > > > > Hi Nakul,
> > > > >
> > > > > Thanks for explaining the pros and cons of the two approaches.
> > > > > I have gone through the paper carefully over a couple of days
> > > > > and found the following interesting things.
> > > > >
> > > > > 1. This is the algorithm we implemented. [attached image not
> > > > >    shown]
> > > > >
> > > > > 2. In the above algorithm, the input matrix A is approximated by
> > > > >    another matrix B with an error of chi(p, i) [as shown in (3)],
> > > > >    which the author argues will stay within an acceptable limit.
> > > > >    So we can go with this algorithm.
> > > > >
> > > > > One bad thing, though: the author is not sure whether the
> > > > > algorithm supports sparse matrices. So we may need to test that
> > > > > here.
> > > > >
> > > > > For the time being, we need to test the present algorithm
> > > > > implemented by Imran Younus again. Can you help me with the
> > > > > testing?
> > > > >
> > > > > Thanks,
> > > > > Janardhan
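One way to structure that test: compare the incremental algorithm's output
against a direct SVD on a matrix of known rank. Below is a rough numpy
sketch of the one-level block scheme, in the spirit of the Iwen & Ong paper
(an illustration, not the actual SystemML code): split A row-wise, take a
truncated local SVD of each block, stack the small diag(s) * Vt factors,
and decompose once more.

    import numpy as np

    def block_svd(A, num_blocks=4, rank=15):
        # Local truncated SVD per row block; keeping only diag(s) @ Vt
        # preserves A^T A, and hence A's singular values and V.
        parts = []
        for block in np.array_split(A, num_blocks, axis=0):
            _, s, Vt = np.linalg.svd(block, full_matrices=False)
            parts.append(np.diag(s[:rank]) @ Vt[:rank])
        # Merge step: one more small SVD on the stacked factors.
        _, s, Vt = np.linalg.svd(np.vstack(parts), full_matrices=False)
        return s[:rank], Vt[:rank]

    # Exactly rank-15 test matrix, so the truncation loses nothing.
    A = np.random.rand(1000, 15) @ np.random.rand(15, 60)
    s_approx, _ = block_svd(A)
    s_exact = np.linalg.svd(A, compute_uv=False)
    print(np.max(np.abs(s_approx - s_exact[:15])))  # ~ machine precision

Running the same harness on inputs stored in a sparse format (e.g. via
scipy.sparse, as an assumed extension) would answer the sparsity question
empirically.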
> > > > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal wrote:
> > > > >
> > > > > > Hi Janardhan,
> > > > > >
> > > > > > How will a GPU implementation help scale distributed SVD:
> > > > > >
> > > > > > Imran implemented an algorithm he found in the paper "A
> > > > > > Distributed and Incremental SVD Algorithm for Agglomerative
> > > > > > Data Analysis on Large Networks" (
> > > > > > https://github.com/apache/systemml/pull/273/files#diff-488f06e290f7a54db2e125f7bc608971R27
> > > > > > ). The idea there was to build up a distributed SVD using
> > > > > > invocations of svd on your local machine. He tried to achieve
> > > > > > the multi-level parallelism through the parfor construct. Each
> > > > > > local invocation of svd was done using the Apache Commons Math
> > > > > > library. If each invocation of this local svd can instead be
> > > > > > done on a GPU, the overall wall time for the distributed
> > > > > > version would be decreased.
> > > > > >
> > > > > > Users may not always have a GPU. In that case, the svd falls
> > > > > > back to the Apache Commons Math implementation. But if they
> > > > > > do, and if we have an "svd" builtin function, then it would be
> > > > > > easier to take advantage of the GPU.
> > > > > >
> > > > > > The problem with scalable svd in DML is due to Spark backend
> > > > > > issues, otherwise there is no problem scaling without a local
> > > > > > svd():
> > > > > >
> > > > > > There may be Spark backend issues, and more may come to light
> > > > > > as more workloads are executed on SystemML. For any given
> > > > > > operation, we can implement it as a DML-bodied function or a
> > > > > > builtin function.
> > > > > >
> > > > > > For DML-bodied functions:
> > > > > > Pros:
> > > > > > - The SystemML optimizer can be applied to it.
> > > > > > - Distribution of SVD is then taken care of by SystemML. It
> > > > > >   will generate and run the Spark primitives needed.
> > > > > > Cons:
> > > > > > - Implementing SVD, whether in DML or C, is a fair amount of
> > > > > >   work.
> > > > > > - There would not be a straightforward call to the GPU svd
> > > > > >   library. Instead, each of the linear algebra primitives
> > > > > >   would be accelerated on the GPU, but not the entire
> > > > > >   operation itself. This would involve many more JNI calls.
> > > > > >
> > > > > > For builtin functions:
> > > > > > Pros:
> > > > > > - GPU libraries (cuSolver) and CPU libraries (Apache Commons
> > > > > >   Math) can be used; these are already optimized (in the case
> > > > > >   of the GPU).
> > > > > > - If a better SVD implementation becomes available via a
> > > > > >   library, it can easily be plugged in.
> > > > > > Cons:
> > > > > > - We would have to come up with an algorithm to implement
> > > > > >   distributed SVD with Spark primitives.
> > > > > >
> > > > > > Pick your battle.
> > > > > >
> > > > > > Maybe we could try another algorithm for scalable svd():
> > > > > >
> > > > > > Sure. But before you do that, it may be worth our while to
> > > > > > understand what exactly is misleading about the paper that
> > > > > > Imran talks about.
> > > > > >
> > > > > > -Nakul
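The parfor pattern described above is easy to mimic outside SystemML: the
per-block SVDs are independent, so they parallelize trivially, and each
call site is exactly where a GPU solver could be swapped in. A hedged
Python sketch (local_svd is a hypothetical dispatch point, not a SystemML
or Commons Math API):

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def local_svd(block):
        # Dispatch point: Apache Commons Math on the CPU today;
        # a GPU solver such as cuSolver could be swapped in here
        # (hypothetical; numpy stands in for both).
        return np.linalg.svd(block, full_matrices=False)

    A = np.random.rand(2000, 40)
    blocks = np.array_split(A, 8, axis=0)

    # parfor-like: each block's SVD is independent of the others.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(local_svd, blocks))

Speeding up local_svd (whether by GPU or a better library) shortens the
critical path of every parallel branch, which is the wall-time argument
made above.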
> > > > > > On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi
> > > > > > <janardhan.pulivarthi@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Nakul,
> > > > > > >
> > > > > > > Can you help me understand how GPU implementations help
> > > > > > > scale it. Will the user always have GPUs in use when using
> > > > > > > this function, or is it an optional feature?
> > > > > > >
> > > > > > > The problem with implementing the scalable svd() in DML is
> > > > > > > due to the Spark backend issues; otherwise there is no
> > > > > > > problem scaling without a local svd() function.
> > > > > > >
> > > > > > > Maybe we could try another algorithm for the scalable
> > > > > > > svd(), if the present algorithm is misleading as Imran
> > > > > > > Younus pointed out.
> > > > > > >
> > > > > > > Thank you,
> > > > > > > Janardhan