systemml-dev mailing list archives

From "Niketan Pansare" <npan...@us.ibm.com>
Subject Re: [DISCUSS] Roadmap SystemML 1.0
Date Wed, 04 Jan 2017 00:17:52 GMT

Hi Matthias,

Thanks for the detailed roadmap.

+1 for all the items, with a few modifications.

1) APIs and Language:
* Cleanup new MLContext (matrix/frame data types, move tests, etc)
>> Ensure the Python and Scala MLContext APIs have the same capabilities.

* Remove old MLContext
* Consolidate MLContext and JMLC
* Full support for Scala/Python DSLs
>> +1 for Python DSL except for push-down of loop structures and functions.


* Remove old file-based transform
* Scala/Python wrappers for all existing algorithms
* Data converters (additional formats: e.g., libsvm; performance)

2) Updated Dependencies:
* Spark 2.0 support
* Matrix block library (isolated jar)

3) Compiler/Runtime Features:
* GPU support (full compiler and runtime support)
>> Can we break this down into phases (see
>> https://issues.apache.org/jira/browse/SYSTEMML-445)? We can discuss the
>> timeline of the phases in the JIRA.

* Compressed linear algebra v2
* Code generation (automatic operator fusion)
* Extended parfor (full spark exploitation, micro-batch support)
* Scale-up architecture (large dense blocks, numa)?

4) Tools
* Extended stats (task locality, shuffle, etc)
* Cloud resource advisor (extended resource optimizer)?

5) Algorithms
* Graduate "staging" algorithms (robustness/performance)
* Perftest: include all algorithms into automated performance tests
>> via spark-submit + via Scala/Python wrappers

* Simplify usage decision trees, random forest, mlogreg, msvm
(preprocessing, label representation, etc)
>> + command-line variable naming. For example: maxi, maxiter, etc.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Matthias Boehm <mboehm7@googlemail.com>
To:	dev@systemml.incubator.apache.org
Date:	01/03/2017 02:44 PM
Subject:	Re: [DISCUSS] Roadmap SystemML 1.0



Yes indeed, most of (3) and (4) can be done incrementally. For (5), some
of the changes might also modify the signature of algorithms (i.e.,
parameters and required input data), but they would help: with decision
trees, for example, users would no longer need to dummy-code their inputs.
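
For readers unfamiliar with the term, dummy coding (also called one-hot
encoding) turns a categorical column into one 0/1 indicator column per
category. The following is a minimal sketch in plain Python, not SystemML
code; the function name dummy_code and the sample labels are hypothetical
and only illustrate the preprocessing step users currently do by hand.

```python
def dummy_code(labels):
    """Return (categories, rows): one 0/1 indicator row per input label."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # set the column that matches this label
        rows.append(row)
    return categories, rows


# Hypothetical sample data: a single categorical feature.
cats, encoded = dummy_code(["red", "green", "red", "blue"])
```

Each distinct value becomes its own column, so a feature with k categories
expands into k indicator columns.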

Generally, I'm fine with making (3), (4), and part of (5) optional and
letting the "must-have" features from (1) and (2) determine the timeline.

Regards,
Matthias

On 1/3/2017 11:27 PM, Luciano Resende wrote:
> On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <mboehm7@googlemail.com>
> wrote:
>
>> I'd like to initiate the discussion of a concrete roadmap for our next
>> release. According to previous discussions, I think it's fair to say
>> that we agree on calling it SystemML 1.0. We should carefully plan this
>> release as it's an opportunity to change APIs and remove some older
>> deprecated features. I'd like to encourage not just developers but also
>> the broader community to participate in this discussion.
>>
>> Personally, I think a target date of Q2/2017 is realistic. Let's start
>> with collecting the major features and changes that potentially affect
>> users. Here is an initial list, but please feel free to add and up- or
>> down-vote the individual items.
>>
>> 1) APIs and Language:
>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
>> * Remove old MLContext
>> * Consolidate MLContext and JMLC
>> * Full support for Scala/Python DSLs
>> * Remove old file-based transform
>> * Scala/Python wrappers for all existing algorithms
>> * Data converters (additional formats: e.g., libsvm; performance)
>>
>> 2) Updated Dependencies:
>> * Spark 2.0 support
>> * Matrix block library (isolated jar)
>>
>> 3) Compiler/Runtime Features:
>> * GPU support (full compiler and runtime support)
>> * Compressed linear algebra v2
>> * Code generation (automatic operator fusion)
>> * Extended parfor (full spark exploitation, micro-batch support)
>> * Scale-up architecture (large dense blocks, numa)?
>>
>> 4) Tools
>> * Extended stats (task locality, shuffle, etc)
>> * Cloud resource advisor (extended resource optimizer)?
>>
>> 5) Algorithms
>> * Graduate "staging" algorithms (robustness/performance)
>> * Perftest: include all algorithms into automated performance tests
>> * Simplify usage decision trees, random forest, mlogreg, msvm
>> (preprocessing, label representation, etc)
>>
>> Items marked with a ? can potentially be moved out to subsequent
>> releases.
>>
>>
>> Regards,
>> Matthias
>>
>
> My understanding is that most of the items in 1 and 2 are going to break
> backward compatibility, while the others can be done incrementally. Is
> this assumption correct? If so, can we finish 1 and 2 and do a 1.0
> release, and then continue with 3, 4, 5, etc.? I don't think we should
> wait for 2017/Q2 to do a 1.0 release. I believe in release early, release
> often, particularly to attract new users who can help verify and
> contribute to specific releases.
>
> Thoughts?
>



