incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "SystemML" by lresende
Date Wed, 28 Oct 2015 04:41:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "SystemML" page has been changed by lresende:
https://wiki.apache.org/incubator/SystemML?action=diff&rev1=3&rev2=4

Comment:
Avoid confusion by clarifying Hadoop MapReduce versus just Hadoop

  
  == Abstract ==
  
- SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification
of ML algorithms and automatic generation of hybrid runtime plans ranging from single node,
in-memory computations, to distributed computations on Apache Hadoop and  Apache Spark. ML
algorithms are expressed in an R-like syntax, that includes linear algebra primitives, statistical
functions, and ML-specific constructs. This high-level language significantly increases the
productivity of data scientists as it provides (1) full flexibility in expressing custom analytics,
and (2) data independence from the underlying input formats and physical data representations.
Automatic optimization according to data characteristics such as distribution on the disk
file system, and sparsity as well as processing characteristics in the distributed environment
like number of nodes, CPU, memory per node, ensures both efficiency and scalability. 
+ SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification
of ML algorithms and automatic generation of hybrid runtime plans ranging from single node,
in-memory computations, to distributed computations on Apache Hadoop MapReduce and  Apache
Spark. ML algorithms are expressed in an R-like syntax, that includes linear algebra primitives,
statistical functions, and ML-specific constructs. This high-level language significantly
increases the productivity of data scientists as it provides (1) full flexibility in expressing
custom analytics, and (2) data independence from the underlying input formats and physical
data representations. Automatic optimization according to data characteristics such as distribution
on the disk file system, and sparsity as well as processing characteristics in the distributed
environment like number of nodes, CPU, memory per node, ensures both efficiency and scalability.

  
  == Proposal ==
  
- The goal of SystemML is to create a commercial friendly, scalable and extensible machine
learning framework for data scientists to create or extend machine learning algorithms using
a declarative syntax. The machine learning framework enables data scientists to develop algorithms
locally without the need of a distributed cluster, and scale up and scale out the execution
of these algorithms to distributed Hadoop or Spark clusters.
+ The goal of SystemML is to create a commercial friendly, scalable and extensible machine
learning framework for data scientists to create or extend machine learning algorithms using
a declarative syntax. The machine learning framework enables data scientists to develop algorithms
locally without the need of a distributed cluster, and scale up and scale out the execution
of these algorithms to distributed Apache Hadoop MapReduce or Apache Spark clusters.
  
  == Background ==
  
@@ -20, +20 @@

  
  SystemML computations can be executed in a variety of different modes. It supports single
node in-memory computations and large-scale distributed cluster computations. This allows
the user to quickly prototype new algorithms in local environments but automatically scale
to large data sizes as well without changing the algorithm implementation.
  
- Algorithms specified in DML are dynamically compiled and optimized based on data and cluster
characteristics using rule-based and cost-based optimization techniques. The optimizer automatically
generates hybrid runtime execution plans ranging from in-memory single-node execution to distributed
computations on Spark or Hadoop. This ensures both efficiency and scalability. Automatic optimization
reduces or eliminates the need to hand-tune distributed runtime execution plans and system
configurations.
+ Algorithms specified in DML are dynamically compiled and optimized based on data and cluster
characteristics using rule-based and cost-based optimization techniques. The optimizer automatically
generates hybrid runtime execution plans ranging from in-memory single-node execution to distributed
computations on Apache Spark or Apache Hadoop MapReduce. This ensures both efficiency and
scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed
runtime execution plans and system configurations.
  
  == Initial Goals ==
  
@@ -28, +28 @@

  
  == Current Status ==
  
- The initial code has been developed at the IBM Almaden Research Center in California and
has recently been made available in GitHub under the Apache Software License 2.0. The project
currently supports a single node (in memory computation) as well as distributed computations
utilizing Hadoop or Spark clusters. 
+ The initial code has been developed at the IBM Almaden Research Center in California and
has recently been made available in GitHub under the Apache Software License 2.0. The project
currently supports a single node (in memory computation) as well as distributed computations
utilizing Apache Hadoop MapReduce or Apache Spark clusters. 
  
  === Meritocracy ===
  
@@ -63, +63 @@

  
  === Relationships with Other Apache Products ===
  
- Currently, SystemML integrates with Apache Hadoop and Apache Spark as underlying computational
distributed runtimes.
+ Currently, SystemML integrates with Apache Hadoop MapReduce and Apache Spark as underlying
computational distributed runtimes.
  
  === An Excessive Fascination with the Apache Brand ===
  


