incubator-general mailing list archives

From "Vasudevan, Ramkrishna S" <ramkrishna.s.vasude...@intel.com>
Subject RE: [VOTE] Accept Mnemonic into the Apache Incubator
Date Mon, 29 Feb 2016 18:10:52 GMT
+1 (non-binding)

Regards
Ram

-----Original Message-----
From: James Taylor [mailto:jamestaylor@apache.org] 
Sent: Monday, February 29, 2016 11:34 PM
To: general@incubator.apache.org
Subject: Re: [VOTE] Accept Mnemonic into the Apache Incubator

+1 (binding)

On Mon, Feb 29, 2016 at 10:03 AM, Phillip Rhodes <motley.crue.fan@gmail.com>
wrote:

> +1
> On Feb 29, 2016 12:57, "Henry Saputra" <henry.saputra@gmail.com> wrote:
>
> > +1 (Binding)
> >
> > - Henry
> >
> > On Mon, Feb 29, 2016 at 9:37 AM, Patrick Hunt <phunt@apache.org> wrote:
> >
> > > Hi folks,
> > >
> > > OK the discussion is now completed. Please VOTE to accept Mnemonic 
> > > into the Apache Incubator. I’ll leave the VOTE open for at least 
> > > the next 72 hours, with hopes to close it Thursday the 3rd of 
> > > March, 2016 at 10am PT.
> > > https://wiki.apache.org/incubator/MnemonicProposal
> > >
> > > [ ] +1 Accept Mnemonic as an Apache Incubator podling.
> > > [ ] +0 Abstain.
> > > [ ] -1 Don’t accept Mnemonic as an Apache Incubator podling because..
> > >
> > > Of course, I am +1 on this. Please note VOTEs from Incubator PMC 
> > > members are binding but all are welcome to VOTE!
> > >
> > > Regards,
> > >
> > > Patrick
> > >
> > > --------------------
> > > = Mnemonic Proposal =
> > > === Abstract ===
> > > Mnemonic is a Java based non-volatile memory library for in-place 
> > > structured data processing and computing. It is a solution for 
> > > generic object and block persistence on heterogeneous block and 
> > > byte-addressable devices, such as DRAM, persistent memory, NVMe, 
> > > SSD, and cloud network storage.
> > >
> > > === Proposal ===
> > > Mnemonic is a structured data persistence in-memory in-place 
> > > library for Java-based applications and frameworks. It provides 
> > > unified interfaces for data manipulation on heterogeneous 
> > > block/byte-addressable devices, such as DRAM, persistent memory, 
> > > NVMe, SSD, and cloud network devices.
> > >
> > > The design motivation for this project is to create a non-volatile 
> > > programming paradigm for in-memory data object persistence, 
> > > in-memory data objects caching, and JNI-less IPC.
> > > Mnemonic simplifies the usage of data object caching, persistence, 
> > > and JNI-less IPC for massive object oriented structural datasets.
> > >
> > > Mnemonic defines Non-Volatile Java objects that store data fields 
> > > in persistent memory and storage. At runtime, only methods and 
> > > volatile fields are instantiated on the Java heap; Non-Volatile 
> > > data fields are accessed directly via GET/SET operations to and 
> > > from persistent memory and storage. Mnemonic avoids SerDes and 
> > > significantly reduces the amount of garbage on the Java heap.
> > >
> > > Major features of Mnemonic:
> > > * Provides an abstract view for using heterogeneous 
> > > block/byte-addressable devices as a whole (e.g., DRAM, persistent 
> > > memory, NVMe, SSD, HD, cloud network storage).
> > >
> > > * Provides seamless support for object-oriented design and programming 
> > > without the burden of transforming object data into a different form.
> > >
> > > * Avoids the object data serialization/de-serialization for data 
> > > retrieval, caching and storage.
> > >
> > > * Reduces the consumption of on-heap memory, which in turn reduces 
> > > and stabilizes Java Garbage Collection (GC) pauses for 
> > > latency-sensitive applications.
> > >
> > > * Overcomes current limitations of Java GC to manage much larger 
> > > memory resources for massive dataset processing and computing.
> > >
> > > * Supports migrating the data usage model from traditional 
> > > NVMe/SSD/HD to non-volatile memory with ease.
> > >
> > > * Uses a lazy loading mechanism to avoid unnecessary memory 
> > > consumption when some data is not needed for computing immediately.
> > >
> > > * Bypasses JNI calls for the interaction between a Java runtime 
> > > application and its native code.
> > >
> > > * Provides an allocation-aware auto-reclaim mechanism to prevent 
> > > external memory resource leaks.
> > >
> > >
> > > === Background ===
> > > Big Data and Cloud applications increasingly require both high 
> > > throughput and low latency processing. Java-based applications 
> > > targeting the Big Data and Cloud space should be tuned for better 
> > > throughput, lower latency, and more predictable response time.
> > > Typically, there are some issues that impact BigData applications'
> > > performance and scalability:
> > >
> > > 1) The Complexity of Data Transformation/Organization: In most 
> > > cases, during data processing, applications use their own 
> > > complicated data caching mechanisms for SerDes of data objects, 
> > > spilling to different storage, and evicting large amounts of data. 
> > > Some data objects contain complex values and structures that 
> > > make data organization much more difficult. Loading and then 
> > > parsing/decoding datasets from storage consumes significant system 
> > > resources and computation power.
> > >
> > > 2) Lack of Caching, Burst Temporary Object Creation/Destruction 
> > > Causes Frequent Long GC Pauses: Big Data computing/syntax 
> > > generates large amounts of temporary objects during processing, 
> > > e.g. lambdas, SerDes, copying, etc. This triggers frequent 
> > > long Java GC pauses to scan references, update reference lists, 
> > > and blindly copy live objects from one memory location to another.
> > >
> > > 3) The Unpredictable GC Pause: Latency-sensitive applications, 
> > > such as databases, search engines, web queries, and real-time/streaming 
> > > computing, need to keep request-response latency under control. But 
> > > current Java GC does not provide predictable GC activity when 
> > > managing large on-heap memory.
> > >
> > > 4) High JNI Invocation Cost: JNI calls are expensive, yet high 
> > > performance applications usually try to leverage native code to 
> > > improve performance, and JNI calls need to convert Java 
> > > objects into something that C/C++ can understand. In addition, 
> > > some comprehensive native code needs to communicate with the Java 
> > > based application, which causes frequent JNI calls along with 
> > > stack marshalling.
> > >
> > > The Mnemonic project provides a solution that addresses the above 
> > > issues and performance bottlenecks for structured data processing 
> > > and computing. It also simplifies massive data handling with much 
> > > reduced GC activity.
> > >
> > > === Rationale ===
> > > There is a strong need for a cohesive, easy-to-use non-volatile 
> > > programming model for unified management and allocation of 
> > > heterogeneous memory resources. The Mnemonic project provides a 
> > > reusable and flexible framework that can accommodate other special 
> > > types of memory/block devices for better performance without 
> > > changing client code.
> > >
> > > Most of the BigData frameworks (e.g., Apache Spark™, Apache™ 
> > > Hadoop®, Apache HBase™, Apache Flink™, Apache Kafka™, etc.) have 
> > > their own complicated memory management modules for caching and 
> > > checkpointing. Many of these approaches increase complexity and 
> > > make the code error-prone to maintain.
> > >
> > > We have observed heavy overheads during the operations of data 
> > > parse, SerDes, pack/unpack, code/decode for data loading, storage, 
> > > checkpoint, caching, marshal and transferring. Mnemonic provides a 
> > > generic in-memory persistence object model to address those 
> > > overheads for better performance. In addition, it manages its 
> > > in-memory persistence objects and blocks in the way that GC does, 
> > > which means their underlying memory resource is able to be 
> > > reclaimed without explicitly releasing it.
> > >
> > > Some existing Big Data applications suffer from poor Java GC 
> > > behaviors when they process their massive unstructured datasets.  
> > > Those behaviors either cause very long stop-the-world GC pauses or 
> > > take significant system resources during computing which impact 
> > > throughput and incur significant perceivable pauses for interactive analytics.
> > >
> > > More and more compute-intensive Big Data applications rely on 
> > > JNI to offload their computing tasks to native code, which 
> > > dramatically increases the cost of JNI invocation and IPC.
> > > Mnemonic provides a mechanism to communicate with native code 
> > > directly through in-place object data updates to avoid complex 
> > > object data type conversion and stack marshaling. In addition, 
> > > this project can be extended to support various locks for 
> > > threads between Java code and native code.
> > >
> > > === Initial Goals ===
> > > Our initial goal is to bring Mnemonic into the ASF and transition the 
> > > engineering and governance processes to the "Apache Way."  We 
> > > would like to foster a collaborative development model that 
> > > closely aligns with current and future industry memory and storage 
> > > technologies.
> > >
> > > Another important goal is to encourage efforts to integrate the 
> > > non-volatile programming model into data centric 
> > > processing/analytics frameworks/applications (e.g., Apache 
> > > Spark™, Apache HBase™, Apache Flink™, Apache™ Hadoop®, Apache 
> > > Cassandra™, etc.).
> > >
> > > We expect the Mnemonic project to continuously develop new 
> > > functionality in an open, community-driven way. We envision 
> > > accelerating innovation under ASF governance in order to meet the 
> > > requirements of a wide variety of use cases for in-memory 
> > > non-volatile and volatile data caching programming.
> > >
> > > === Current Status ===
> > > The Mnemonic project is available in Intel’s internal repository and 
> > > managed by its designers and developers. It is also temporarily 
> > > hosted on GitHub for general viewing: 
> > > https://github.com/NonVolatileComputing/Mnemonic.git
> > >
> > > We have integrated this project with Apache Spark™ 1.5.0 and measured 
> > > a 2X performance improvement for the Spark™ MLlib k-means workload. 
> > > In our experiments we also observed the expected benefits of 
> > > removing SerDes and a 40% reduction in total GC pause time.
> > >
> > > ==== Meritocracy ====
> > > Mnemonic was originally created by Gang (Gary) Wang and Yanping 
> > > Wang in early 2015. The initial committers are the current 
> > > Mnemonic R&D team members from US, China, and India Big Data 
> > > Technologies Group at Intel. This group will form a base for much 
> > > broader community to collaborate on this code base.
> > >
> > > We intend to radically expand the initial developer and user 
> > > community by running the project in accordance with the "Apache 
> > > Way." Users and new contributors will be treated with respect and 
> > > welcomed. By participating in the community and providing quality 
> > > patches/support that move the project forward, they will earn 
> > > merit. They also will be encouraged to provide non-code 
> > > contributions (documentation, events, community management, etc.) 
> > > and will gain merit for doing so. Those with a proven support and 
> > > quality track record will be encouraged to become committers.
> > >
> > > ==== Community ====
> > > If Mnemonic is accepted for incubation, the primary initial goal 
> > > is to transition the core community towards embracing the Apache Way 
> > > of project governance. We would solicit major existing 
> > > contributors to become committers on the project from the start.
> > >
> > > ==== Core Developers ====
> > > Mnemonic core developers are all skilled software developers and 
> > > system performance engineers at Intel Corp with years of 
> > > experience in their fields. They have contributed a lot of code to 
> > > Apache projects. PMC members and experienced committers from 
> > > Apache Spark™, Apache HBase™, Apache Phoenix™, and Apache™ 
> > > Hadoop® have been working with us on this project's open source efforts.
> > >
> > > === Alignment ===
> > > The initial code base is targeted to data centric processing and 
> > > analyzing in general. Mnemonic has been building the connection 
> > > and integration for Apache projects and other projects.
> > >
> > > We believe Mnemonic will evolve into a promising project 
> > > for real-time processing, in-memory streaming analytics and more, 
> > > along with current and future server platforms that use persistent 
> > > memory as base storage devices.
> > >
> > > === Known Risks ===
> > > ==== Orphaned products ====
> > > Intel’s Big Data Technologies Group is actively working with the 
> > > community on integrating this project into Big Data frameworks and 
> > > applications. We are continuously adding new concepts and code to 
> > > this project and supporting new use cases and features for the 
> > > Apache Big Data ecosystem.
> > >
> > > The project contributors are leading contributors of Hadoop-based 
> > > technologies and have a long standing in the Hadoop community. As 
> > > we are addressing major Big Data processing performance issues, 
> > > there is minimal risk of this work becoming non-strategic and unsupported.
> > >
> > > Our contributors are confident that a larger community will be 
> > > formed within the project in a relatively short period of time.
> > >
> > > ==== Inexperience with Open Source ====
> > > This project has long-standing, experienced mentors and interested 
> > > contributors from Apache Spark™, Apache HBase™, Apache Phoenix™, and 
> > > Apache™ Hadoop® to help us move through the open source process. We 
> > > are actively working with experienced Apache community PMCs and 
> > > committers to improve our project and further testing.
> > >
> > > ==== Homogeneous Developers ====
> > > All initial committers and interested contributors are employed at 
> > > Intel. As this is an infrastructure memory project, a wide range 
> > > of Apache projects are interested in an innovative memory project 
> > > that fits large persistent memory and storage devices. Various 
> > > Apache projects such as Apache Spark™, Apache HBase™, Apache 
> > > Phoenix™, Apache Flink™, Apache Cassandra™ etc. can take good 
> > > advantage of this project to overcome 
> > > serialization/de-serialization, Java GC, and caching issues. We 
> > > expect a wide range of interest will be generated after we open 
> > > source this project to Apache.
> > >
> > > ==== Reliance on Salaried Developers ====
> > > All developers are paid by their employers to contribute to this 
> > > project. We welcome all others to contribute to this project after 
> > > it is open sourced.
> > >
> > > ==== Relationships with Other Apache Products ====
> > > Relationship with Apache™ Arrow:
> > > Arrow's columnar data layout allows great use of CPU caches & 
> > > SIMD. It places all data that is relevant to a column operation in a 
> > > compact format in memory.
> > >
> > > Mnemonic directly puts the whole business object graphs on 
> > > external heterogeneous storage media, e.g. off-heap, SSD. It is 
> > > not necessary to normalize the structures of object graphs for 
> > > caching, checkpoint or storing. It doesn’t require developers to 
> > > normalize their data object graphs. Mnemonic applications can 
> > > avoid indexing & join datasets compared to traditional approaches.
> > >
> > > Mnemonic can leverage Arrow to transparently re-layout qualified 
> > > data objects or create special containers that are able to 
> > > efficiently hold those data records in columnar form as one of 
> > > its major performance optimization constructs.
> > >
> > > Mnemonic can be integrated into various Big Data and Cloud 
> > > frameworks and applications.
> > > We are currently working on several Apache projects with Mnemonic:
> > > For Apache Spark™ we are integrating Mnemonic to improve:
> > > a) Local checkpoints
> > > b) Memory management for caching
> > > c) Persistent memory datasets input
> > > d) Non-Volatile RDD operations
> > > The best use case for Apache Spark™ computing is that the input 
> > > data is stored in form of Mnemonic native storage to avoid caching 
> > > its row data for iterative processing. Moreover, Spark 
> > > applications can leverage Mnemonic to perform data transforming in 
> > > persistent or non-persistent memory without SerDes.
> > >
> > > For Apache™ Hadoop®, we are integrating HDFS Caching with Mnemonic 
> > > instead of mmap. This will take advantage of persistent memory 
> > > related features. We also plan to evaluate integrating the 
> > > Namenode Editlog and FSImage persistent data into the Mnemonic 
> > > persistent memory area.
> > >
> > > For Apache HBase™, we are using Mnemonic for BucketCache and 
> > > evaluating performance improvements.
> > >
> > > We expect Mnemonic will be further developed and integrated into 
> > > many Apache BigData projects and so on, to enhance memory 
> > > management solutions for much improved performance and reliability.
> > >
> > > ==== An Excessive Fascination with the Apache Brand ====
> > > While we expect the Apache brand to help attract more contributors, 
> > > our interest in starting this project is based on the factors 
> > > mentioned in the Rationale section.
> > >
> > > We would like Mnemonic to become an Apache project to further 
> > > foster a healthy community of contributors and consumers in 
> > > BigData technology R&D areas. Since Mnemonic can directly benefit 
> > > many Apache projects and solves major performance problems, we 
> > > expect the Apache Software Foundation to increase interaction with 
> > > the larger community as well.
> > >
> > > === Documentation ===
> > > The documentation is currently available at Intel and will be 
> > > posted
> > > under: https://mnemonic.incubator.apache.org/docs
> > >
> > > === Initial Source ===
> > > Initial source code is temporarily hosted on GitHub for general viewing:
> > > https://github.com/NonVolatileComputing/Mnemonic.git
> > > It will be moved to Apache http://git.apache.org/ after podling.
> > >
> > > The initial Source is written in Java code (88%) and mixed with 
> > > JNI C code (11%) and shell script (1%) for underlying native 
> > > allocation libraries.
> > >
> > > === Source and Intellectual Property Submission Plan ===
> > > As soon as Mnemonic is approved to join the Incubator, the source 
> > > code will be transitioned via the Software Grant Agreement onto ASF 
> > > infrastructure and in turn made available under the Apache 
> > > License, version 2.0.
> > >
> > > === External Dependencies ===
> > > The required external dependencies all have Apache licenses or 
> > > other compatible licenses.
> > > Note: The runtime dependent licenses of Mnemonic are all declared 
> > > as Apache 2.0; the GNU licensed components are used for Mnemonic 
> > > build and deployment. The Mnemonic JNI libraries are built using 
> > > the GNU tools.
> > >
> > > maven and its plugins (http://maven.apache.org/ ) [Apache 2.0]
> > > JDK8 or OpenJDK 8 (http://java.com/) [Oracle or Openjdk JDK License]
> > > Nvml (http://pmem.io ) [optional] [Open Source]
> > > PMalloc (https://github.com/bigdata-memory/pmalloc ) [optional] [Apache 2.0]
> > >
> > > Build and test dependencies:
> > > org.testng.testng v6.8.17  (http://testng.org) [Apache 2.0]
> > > org.flowcomputing.commons.commons-resgc v0.8.7 [Apache 2.0]
> > > org.flowcomputing.commons.commons-primitives v.0.6.0 [Apache 2.0]
> > > com.squareup.javapoet v1.3.1-SNAPSHOT [Apache 2.0]
> > > JDK8 or OpenJDK 8 (http://java.com/) [Oracle or Openjdk JDK License]
> > >
> > > === Cryptography ===
> > > Project Mnemonic does not use cryptography itself; however, Hadoop 
> > > projects use standard APIs and tools for SSH and SSL communication 
> > > where necessary.
> > >
> > > === Required Resources ===
> > > We request that the following resources be created for the project 
> > > to use:
> > >
> > > ==== Mailing lists ====
> > > private@mnemonic.incubator.apache.org (moderated subscriptions) 
> > > commits@mnemonic.incubator.apache.org
> > > dev@mnemonic.incubator.apache.org
> > >
> > > ==== Git repository ====
> > > https://github.com/apache/incubator-mnemonic
> > >
> > > ==== Documentation ====
> > > https://mnemonic.incubator.apache.org/docs/
> > >
> > > ==== JIRA instance ====
> > > https://issues.apache.org/jira/browse/mnemonic
> > >
> > > === Initial Committers ===
> > > * Gang (Gary) Wang (gang1 dot wang at intel dot com)
> > >
> > > * Yanping Wang (yanping dot wang at intel dot com)
> > >
> > > * Uma Maheswara Rao G (umamahesh at apache dot org)
> > >
> > > * Kai Zheng (drankye at apache dot org)
> > >
> > > * Rakesh Radhakrishnan Potty  (rakeshr at apache dot org)
> > >
> > > * Sean Zhong  (seanzhong at apache dot org)
> > >
> > > * Henry Saputra  (hsaputra at apache dot org)
> > >
> > > * Hao Cheng (hao dot cheng at intel dot com)
> > >
> > > === Additional Interested Contributors ===
> > > * Debo Dutta (dedutta at cisco dot com)
> > >
> > > * Liang Chen (chenliang613 at Huawei dot com)
> > >
> > > === Affiliations ===
> > > * Gang (Gary) Wang, Intel
> > >
> > > * Yanping Wang, Intel
> > >
> > > * Uma Maheswara Rao G, Intel
> > >
> > > * Kai Zheng, Intel
> > >
> > > * Rakesh Radhakrishnan Potty, Intel
> > >
> > > * Sean Zhong, Intel
> > >
> > > * Henry Saputra, Independent
> > >
> > > * Hao Cheng, Intel
> > >
> > > === Sponsors ===
> > > ==== Champion ====
> > > Patrick Hunt
> > >
> > > ==== Nominated Mentors ====
> > > * Patrick Hunt <phunt at apache dot org> - Apache IPMC member
> > >
> > > * Andrew Purtell <apurtell at apache dot org> - Apache IPMC member
> > >
> > > * James Taylor <jamestaylor at apache dot org> - Apache IPMC member
> > >
> > > * Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> > >
> > > ==== Sponsoring Entity ====
> > > Apache Incubator PMC
> > >
> > > ------------------------------------------------------------------
> > > --- To unsubscribe, e-mail: 
> > > general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> > >
> >
>
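
The quoted proposal describes Non-Volatile objects whose data fields stay outside the Java heap and are read and written in place through GET/SET operations, with no SerDes step. The sketch below illustrates that general idea using only a JDK memory-mapped file. It is not Mnemonic's API: the class OffHeapPersonDemo, the PersonView record, its 12-byte field layout, and the open helper are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffHeapPersonDemo {

    /** Hypothetical fixed-layout record: an 8-byte id followed by a 4-byte age. */
    static final class PersonView {
        static final int ID_OFFSET = 0;
        static final int AGE_OFFSET = 8;
        static final int SIZE = 12;

        // The only on-heap state is this reference; the field data lives in the mapped file.
        private final MappedByteBuffer region;

        PersonView(MappedByteBuffer region) {
            this.region = region;
        }

        // Getters and setters touch the mapped bytes directly; nothing is serialized.
        long getId() { return region.getLong(ID_OFFSET); }
        void setId(long id) { region.putLong(ID_OFFSET, id); }
        int getAge() { return region.getInt(AGE_OFFSET); }
        void setAge(int age) { region.putInt(AGE_OFFSET, age); }

        /** Asks the OS to write the mapped bytes back to the underlying file. */
        void flush() { region.force(); }
    }

    /** Maps the first SIZE bytes of the file; the mapping outlives the closed channel. */
    static PersonView open(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return new PersonView(
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, PersonView.SIZE));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("person", ".bin");
        file.toFile().deleteOnExit();

        PersonView writer = open(file);
        writer.setId(42L);
        writer.setAge(30);
        writer.flush();

        // A second view over the same file reads the fields back in place.
        PersonView reader = open(file);
        System.out.println(reader.getId() + " " + reader.getAge());
    }
}
```

A memory-mapped file is only one possible backing store; the proposal lists DRAM, persistent memory, NVMe, SSD, and cloud network storage as targets, and this sketch makes no attempt at the allocation or reclaim mechanisms it mentions.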
