commons-issues mailing list archives

From "Gilles (JIRA)" <>
Subject [jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
Date Tue, 26 Mar 2019 12:48:00 GMT


Gilles commented on STATISTICS-7:

Some of the "Commons Math" open issues stem from design bugs (e.g. MATH-1281).

The new component makes it possible to start from a clean slate, without the backward-compatibility
constraint, hopefully with design lessons learnt from other libraries, and from mistakes made in
"Commons Math" whose fixes have been delayed indefinitely.

Indeed, the design of the {{stat}} package dates from the inception of the component, back
in 2003: In the [initial proposal|],
half of the source description pertains to statistics and, of those, the random utilities
have gone into their [own component|], which I
consider a step in the right direction, i.e. away from a huge monolithic library that proved
to be an unsustainable project, mostly because the stability required of some packages prevented
a sane evolution of others (solely because they were part of the same component!).

That said, an advantage of having "Commons Math" is that a lot of the "core" code and unit
tests can be leveraged to make the port relatively fast and robust, once a new design
has been put forward.  And if there are competing proposals, they can be developed in parallel,
for some time, until one seems to gather more interest.

> Stream-based Java statistical processing
> ----------------------------------------
>                 Key: STATISTICS-7
>                 URL:
>             Project: Apache Commons Statistics
>          Issue Type: New Feature
>            Reporter: Eric Barnhill
>            Priority: Major
>              Labels: GSoC2019, gsoc2019, statistics, streams
> The new component aims to be a library of commons statistics functions synchronized
with the latest developments in the Java language, in particular Java's functional programming
> The library will make commonly used statistical functions available to an end user through
a simple grammar comparable to commons-math-statistics or scikit-learn, while under the hood
it will implement Java's mapping, streaming, and other producer and consumer functions to ensure
the statistical methods run optimally in new Java implementations.
> Developers working on the project will have the opportunity to demonstrate Java programming,
functional programming, algorithm design, and data science skills and receive authorship on a
commons project that is likely to be widely used.
> The ideal contributor will also be able to help with important architectural decision
making. The old source of these libraries, commons-math, grew too large, hierarchically complex
and interdependent for the commons mission. The developers on this project need to make architectural
choices that will enable the statistical code to be lightweight and reusable, with a minimum
of outside dependencies while avoiding redundancy.

This message was sent by Atlassian JIRA
