Mailing-List: contact dev-help@systemml.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@systemml.incubator.apache.org
MIME-Version: 1.0
References: <CAGXYKjMst6FVipQG+KL2-3UqKhx__ynbWd0++C4kYH3uOHmYmQ@mail.gmail.com>
 <e41cc125-49a7-89ee-8255-c7738e495be8@gmail.com>
In-Reply-To: <e41cc125-49a7-89ee-8255-c7738e495be8@gmail.com>
From: Mingyang Wang <miw092@eng.ucsd.edu>
Date: Fri, 05 May 2017 04:48:48 +0000
Message-ID: <CAGXYKjMTy0LTojNe1ctyn6MngmjDfRJ_tb9gf0qrN26KpUYv+g@mail.gmail.com>
Subject: Re: Sparse Matrix Storage Consumption Issue
To: dev@systemml.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a11477fe4923481054ebf9b80
archived-at: Fri, 05 May 2017 04:49:11 -0000

--001a11477fe4923481054ebf9b80
Content-Type: text/plain; charset=UTF-8

Hi Matthias,

Thanks for the patch.

I have re-run the experiment and observed that there was indeed no more
memory pressure, but it still took ~90s for this simple script. I was
wondering what is the bottleneck for this case?


Total elapsed time: 94.800 sec.
Total compilation time: 1.826 sec.
Total execution time: 92.974 sec.
Number of compiled Spark inst: 2.
Number of executed Spark inst: 2.
Cache hits (Mem, WB, FS, HDFS): 1/0/0/0.
Cache writes (WB, FS, HDFS): 0/0/0.
Cache times (ACQr/m, RLS, EXP): 0.000/0.000/0.000/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 0.860 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 3.498 sec.
Total JVM GC count: 5.
Total JVM GC time: 0.064 sec.
Heavy hitter instructions (name, time, count):
-- 1) sp_uak+ 92.597 sec 1
-- 2) sp_chkpoint 0.377 sec 1
-- 3) == 0.001 sec 1
-- 4) print 0.000 sec 1
-- 5) + 0.000 sec 1
-- 6) castdts 0.000 sec 1
-- 7) createvar 0.000 sec 3
-- 8) rmvar 0.000 sec 7
-- 9) assignvar 0.000 sec 1
-- 10) cpvar 0.000 sec 1

Regards,
Mingyang

On Wed, May 3, 2017 at 8:54 AM Matthias Boehm <mboehm7@googlemail.com>
wrote:

> to summarize, this was an issue of selecting serialized representations
> for large ultra-sparse matrices. Thanks again for sharing your feedback
> with us.
>
> 1) In-memory representation: In CSR every non-zero will require 12 bytes
> - this is 240MB in your case. The overall memory consumption, however,
> depends on the distribution of non-zeros: In CSR, each block with at
> least one non-zero requires 4KB for row pointers. Assuming uniform
> distribution (the worst case), this gives us 80GB. This is likely the
> problem here. Every empty block would have an overhead of 44Bytes but
> for the worst-case assumption, there are no empty blocks left. We do not
> use COO for checkpoints because it would slow down subsequent operations.
>
> 2) Serialized/on-disk representation: For sparse datasets that are
> expected to exceed aggregate memory, we used to use a serialized
> representation (with storage level MEM_AND_DISK_SER) which uses sparse,
> ultra-sparse, or empty representations. In this form, ultra-sparse
> blocks require 9 + 16*nnz bytes and empty blocks require 9 bytes.
> Therefore, with this representation selected, you're dataset should
> easily fit in aggregate memory. Also, note that chkpoint is only a
> transformation that persists the rdd, the subsequent operation then
> pulls the data into memory.
>
> At a high-level this was a bug. We missed ultra-sparse representations
> when introducing an improvement that stores sparse matrices in MCSR
> format in CSR format on checkpoints which eliminated the need to use a
> serialized storage level. I just deliver a fix. Now we store such
> ultra-sparse matrices again in serialized form which should
> significantly reduce the memory pressure.
>
> Regards,
> Matthias
>
> On 5/3/2017 9:38 AM, Mingyang Wang wrote:
> > Hi all,
> >
> > I was playing with a super sparse matrix FK, 2e7 by 1e6, with only one
> > non-zero value on each row, that is 2e7 non-zero values in total.
> >
> > With driver memory of 1GB and executor memory of 100GB, I found the HOP
> > "Spark chkpoint", which is used to pin the FK matrix in memory, is really
> > expensive, as it invokes lots of disk operations.
> >
> > FK is stored in binary format with 24 blocks, each block is ~45MB, and
> ~1GB
> > in total.
> >
> > For example, with the script as
> >
> > """
> > FK = read($FK)
> > print("Sum of FK = " + sum(FK))
> > """
> >
> > things worked fine, and it took ~8s.
> >
> > While with the script as
> >
> > """
> > FK = read($FK)
> > if (1 == 1) {}
> > print("Sum of FK = " + sum(FK))
> > """
> >
> > things changed. It took ~92s and I observed lots of disk spills from
> logs.
> > Based on the stats from Spark UI, it seems the materialized FK requires
> >> 54GB storage and thus introduces disk operations.
> >
> > I was wondering, is this the expected behavior of a super sparse matrix?
> >
> >
> > Regards,
> > Mingyang
> >
>

--001a11477fe4923481054ebf9b80--