commons-issues mailing list archives

From "Mukul chand yadav (JIRA)" <>
Subject [jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
Date Mon, 08 Apr 2019 00:14:00 GMT


Mukul chand yadav commented on STATISTICS-7:

[~ericbarnhill] [~virendrasinghrp]

I am Mukul, currently pursuing a master's in computer science with a major in ML. I would love
to contribute to the overhaul of the {{org.apache.commons.math4}} package using Java 8's
functional APIs, where I can draw on my experience developing stream-based APIs.

Based on the discussion here, please let me know whether there are any other details I should
consider before submitting a formal proposal on the GSoC 2019 portal.

> Stream-based Java statistical processing
> ----------------------------------------
>                 Key: STATISTICS-7
>                 URL:
>             Project: Apache Commons Statistics
>          Issue Type: New Feature
>            Reporter: Eric Barnhill
>            Priority: Major
>              Labels: GSoC2019, gsoc2019, statistics, streams
> The new component aims to be a library of commons statistics functions synchronized
> with the latest developments in the Java language, in particular Java's functional programming
> The library will make commonly used statistical functions available to an end user through
> a simple grammar comparable to commons-math-statistics or scikit-learn, while under the hood
> it will use Java's mapping, streaming, and other producer and consumer functions so that
> the statistical methods run efficiently on new Java implementations.
> As functional programming grows increasingly central to big data applications, we believe
> these libraries will play an important role in the data engineering ecosystem. In particular,
> data engineering is widely done in Java, with the results then passed to other languages for
> data-scientific analyses. The common availability of functionally implemented statistical
> mappings and reductions in Java could prove very useful at the interface of data science and
> engineering, by enabling teams to perform reductions more easily on the engineering side
> before handing off to the analysis side.
> Developers working on the project will have the opportunity to demonstrate Java programming,
> functional programming, algorithm design, and data science skills and receive authorship on a
> Commons project that is likely to be widely used.
> The ideal contributor will also be able to help with important architectural decision
> making. The old source of these libraries, commons-math, grew too large, too hierarchically
> complex, and too interdependent for the Commons mission. The developers on this project need
> to make architectural choices that will enable the statistical code to be lightweight and
> reusable, with a minimum of outside dependencies while avoiding redundancy.
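
For illustration only, the "functionally implemented statistical mapping and reductions" that
the issue describes could look like the minimal sketch below. It is not code from Commons
Statistics: the class {{RunningStats}} and all of its methods are hypothetical names, and
Welford's online update with a pairwise merge is assumed here as one reasonable way to compute
mean and variance in a single pass over a stream, including a parallel one.

```java
import java.util.stream.DoubleStream;

/** Hypothetical accumulator for illustration; not part of any Commons API. */
final class RunningStats {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Welford's online update for a single value. */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partial result so that parallel streams can be combined. */
    void combine(RunningStats other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Update m2 first: it needs the counts and means from before the merge.
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
    }

    long count() { return n; }

    double mean() { return mean; }

    /** Unbiased sample variance; NaN when fewer than two values were seen. */
    double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    /** Reduces a stream of doubles with the JDK's three-argument mutable reduction. */
    static RunningStats of(DoubleStream values) {
        return values.collect(RunningStats::new, RunningStats::accept, RunningStats::combine);
    }
}
```

A call such as {{RunningStats.of(DoubleStream.of(2, 4, 4, 4, 5, 5, 7, 9))}} performs the whole
reduction in one pass. Because {{combine}} merges partial results, the same call should also work
on a stream marked {{.parallel()}}, which is the property that makes this style attractive for
reductions on the engineering side.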

This message was sent by Atlassian JIRA
