systemml-dev mailing list archives

From Matthias Boehm <mboe...@googlemail.com>
Subject Weighted Statistical Estimates
Date Sun, 19 Feb 2017 05:20:30 GMT
Going toward our 1.0 release, I'd like to create consistency across our
weighted statistics. Conceptually, these weights represent frequency
counts, i.e., multiplicities of input values.
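
To illustrate the frequency-count interpretation, here is a minimal sketch
in plain Python (not DML). The helper names weighted_mean and replicate are
hypothetical and only serve the illustration: with integer weights, the
weighted mean should agree with the ordinary mean of the data in which each
value is repeated as often as its weight says.

```python
def weighted_mean(values, weights):
    """Mean of values where weights[i] is the multiplicity of values[i]."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


def replicate(values, counts):
    """Expand integer frequency counts into an explicit list of values."""
    expanded = []
    for v, c in zip(values, counts):
        expanded.extend([v] * c)
    return expanded


values = [1.0, 2.0, 4.0]
counts = [3, 1, 2]  # 1.0 occurs three times, 2.0 once, 4.0 twice

expanded = replicate(values, counts)
plain_mean = sum(expanded) / len(expanded)
wmean = weighted_mean(values, counts)
```

The weighted formula itself never needs the weights to be integers; only the
explicit replication does.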

So far, our documentation does not state any restrictions on these weights,
but some runtime operations require integer data (I), while others allow
arbitrary floating-point data, as indicated below:

* moment
* cov
* aggregate
* table
* median (I)
* quantile (I)
* interQuartileMean (I)

This can lead to unexpected errors as shown by recent issues such as
SYSTEMML-1265. Looking back to R and its packages like Hmisc or reldist, it
turns out that they all allow arbitrary weights.
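
As a rough sketch of why order statistics do not inherently need integer
weights, the following Python snippet (not DML) computes a weighted quantile
from cumulative weights. This is one simple definition (the first value whose
cumulative weight reaches p times the total weight), chosen for illustration;
it is an assumption and not necessarily the definition used by our runtime,
Hmisc, or reldist. The function name weighted_quantile is hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight reaches p * total weight.

    Weights may be arbitrary non-negative floats; they are never used
    as replication counts, only summed.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Drop zero-weight entries and sort the rest by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")

    threshold = p * sum(w for _, w in pairs)
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w
        if cumulative >= threshold:
            return v
    # Guard against floating-point round-off in the running sum.
    return pairs[-1][0]


# Fractional weights are handled the same way as integer ones.
median_fractional = weighted_quantile([10, 20, 30], [0.5, 1.25, 0.25], 0.5)
median_integer = weighted_quantile([10, 20, 30], [1, 2, 1], 0.5)
```

Under this definition, scaling all weights by a constant factor leaves the
result unchanged, which is what one would expect if weights generalize
frequency counts.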

So, relaxing any restrictions of integer weights seems like the right
choice. As this changes the external behavior - albeit in a generalizing
manner - we should make this change now. If you have any concerns, let me
know.

Regards,
Matthias
