In case it is helpful, this 2010 paper
<https://www.semanticscholar.org/paper/BetterSizeEstimationforSparseMatrixProductsAmossenCampagna/52f70406bb4d887e1b79af81a746184c5723848a>
is my favorite for estimating the nnz of a matrix product with high
confidence. The algorithm is much more involved than the simple SystemML
calculation, because it looks at samples from the actual data.
Here are some notes I have on the paper:
There is a sketch algorithm [2] that gives a (1 + ε) approximation z̃ to z
for any ε > 4 / (z^.25) in O(m)
time and with a bound on the number of I/Os. In relational algebra, this is
estimating
π_{ik}( A(i, j) ⋈ B(j, k) ). The idea behind the algorithm is finding,
for some tuning parameter
k that varies with z, the ksmallest value of a hash function h(i; k). The
larger this value is, the more
is and ks that match for a given j. Repeat this for all js. Different (i,
k)s contribute to different entries
in the result matrix and have different hash values. A sketch for this
algorithm can be incrementally
maintained via independent samples on is and ks. Computing the z̃ estimate
from the sample takes o(n)
time for large enough z. The larger z is, the smaller a sample size we
need. (Need large samples for
small z.) (Everything above assumes the zero product property and
zerosumfree).
On Tue, Jun 20, 2017 at 12:06 PM, Nakul Jindal <nakul02@gmail.com> wrote:
> Hi,
>
> There is code in systemml to “estimate” the output sparsity of a matrix
> multiplication operation between two sparse matrices.
> Is there a citation for this? If not, have we done any tests to figure out
> how good these estimates are?
> https://github.com/nakul02/systemml/blob/df8d4a63d8d09cae94b
> 6ca2634e31da554302c72/src/main/java/org/apache/sysml/
> hops/OptimizerUtils.java#L944L953
>
> Thanks,
> Nakul
>
