incubator-general mailing list archives

From Phillip Rhodes <motley.crue....@gmail.com>
Subject Re: [VOTE] Accept Mnemonic into the Apache Incubator
Date Mon, 29 Feb 2016 18:03:08 GMT
+1
On Feb 29, 2016 12:57, "Henry Saputra" <henry.saputra@gmail.com> wrote:

> +1 (Binding)
>
> - Henry
>
> On Mon, Feb 29, 2016 at 9:37 AM, Patrick Hunt <phunt@apache.org> wrote:
>
> > Hi folks,
> >
> > OK, the discussion is now complete. Please VOTE to accept Mnemonic
> > into the Apache Incubator. I’ll leave the VOTE open for at least
> > the next 72 hours, with hopes to close it Thursday the 3rd of
> > March, 2016 at 10am PT.
> > https://wiki.apache.org/incubator/MnemonicProposal
> >
> > [ ] +1 Accept Mnemonic as an Apache Incubator podling.
> > [ ] +0 Abstain.
> > [ ] -1 Don’t accept Mnemonic as an Apache Incubator podling because...
> >
> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> > members are binding but all are welcome to VOTE!
> >
> > Regards,
> >
> > Patrick
> >
> > --------------------
> > = Mnemonic Proposal =
> > === Abstract ===
> > Mnemonic is a Java-based non-volatile memory library for in-place
> > structured data processing and computing. It is a solution for generic
> > object and block persistence on heterogeneous block and
> > byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD,
> > and cloud network storage.
> >
> > === Proposal ===
> > Mnemonic is a library for in-place, in-memory persistence of
> > structured data in Java-based applications and frameworks. It
> > provides unified interfaces for data manipulation on heterogeneous
> > block/byte-addressable devices, such as DRAM, persistent memory, NVMe,
> > SSD, and cloud network devices.
> >
> > The design motivation for this project is to create a non-volatile
> > programming paradigm for in-memory data object persistence, in-memory
> > data object caching, and JNI-less IPC.
> > Mnemonic simplifies data object caching, persistence, and JNI-less IPC
> > for massive, object-oriented, structured datasets.
> >
> > Mnemonic defines Non-Volatile Java objects that store data fields in
> > persistent memory and storage. At runtime, only methods and volatile
> > fields are instantiated on the Java heap; non-volatile data fields are
> > accessed directly via GET/SET operations to and from persistent memory
> > and storage. Mnemonic avoids SerDes and significantly reduces the
> > amount of garbage on the Java heap.
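To make the GET/SET idea concrete, here is a minimal sketch in plain Java, assuming a direct `ByteBuffer` as a stand-in for persistent memory. `OffHeapPoint`, its field offsets, and `OffHeapPointDemo` are hypothetical names for illustration, not Mnemonic's API: the wrapper object on the heap holds only a block reference and an offset, while the `x`/`y` data lives in the backing block.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not Mnemonic's API: the data fields of this
// object live in an off-heap block; the heap object holds only a reference
// to that block and an offset into it.
final class OffHeapPoint {
    // Assumed fixed layout of the persistent fields inside the block.
    static final int X_OFFSET = 0;   // long x
    static final int Y_OFFSET = 8;   // long y
    static final int SIZE = 16;

    private final ByteBuffer block;  // stand-in for persistent memory
    private final int base;          // where this object starts in the block

    OffHeapPoint(ByteBuffer block, int base) {
        this.block = block;
        this.base = base;
    }

    // GET/SET read and write the backing memory directly: no heap copy of
    // the data and no serialization step.
    long getX() { return block.getLong(base + X_OFFSET); }
    void setX(long x) { block.putLong(base + X_OFFSET, x); }
    long getY() { return block.getLong(base + Y_OFFSET); }
    void setY(long y) { block.putLong(base + Y_OFFSET, y); }
}

class OffHeapPointDemo {
    public static void main(String[] args) {
        // allocateDirect places the bytes outside the Java heap.
        ByteBuffer block = ByteBuffer.allocateDirect(2 * OffHeapPoint.SIZE)
                .order(ByteOrder.nativeOrder());
        OffHeapPoint p = new OffHeapPoint(block, 0);
        p.setX(3);
        p.setY(4);
        // A second wrapper over the same bytes sees the same values in place.
        OffHeapPoint q = new OffHeapPoint(block, 0);
        System.out.println(q.getX() + "," + q.getY()); // prints "3,4"
        // A second object stored right after the first one.
        OffHeapPoint r = new OffHeapPoint(block, OffHeapPoint.SIZE);
        r.setX(10);
        System.out.println(p.getX()); // prints "3"
    }
}
```

A persistent-memory backend would supply a block mapped onto the device instead of `allocateDirect`; the heap object stays two fields wide no matter how much data it fronts.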
> >
> > Major features of Mnemonic:
> > * Provides an abstraction layer for using heterogeneous
> > block/byte-addressable devices as a whole (e.g., DRAM, persistent
> > memory, NVMe, SSD, HD, cloud network storage).
> >
> > * Provides seamless support for object-oriented design and programming
> > without the burden of transforming object data into a different form.
> >
> > * Avoids object data serialization/de-serialization for data
> > retrieval, caching, and storage.
> >
> > * Reduces on-heap memory consumption, which in turn reduces and
> > stabilizes Java Garbage Collection (GC) pauses for latency-sensitive
> > applications.
> >
> > * Overcomes current Java GC limitations in managing much larger
> > memory resources for massive dataset processing and computing.
> >
> > * Eases migration of the data usage model from traditional NVMe/SSD/HD
> > to non-volatile memory.
> >
> > * Uses a lazy loading mechanism to avoid unnecessary memory consumption
> > when some data is not needed for computing immediately.
> >
> > * Bypasses JNI calls for interaction between a Java runtime
> > application and its native code.
> >
> > * Provides an allocation-aware auto-reclaim mechanism to prevent
> > external memory resource leaks.
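The lazy loading bullet can be illustrated with a small, generic sketch, assuming the loader is any `Supplier` (for example, a read from storage). `Lazy` and `LazyDemo` are hypothetical names, and Mnemonic's own mechanism may work differently; the point shown is only that the loader runs at most once, on first access, and is then dropped.

```java
import java.util.function.Supplier;

// Hypothetical lazy field holder: the loader runs only on first access, so
// data that is never used is never brought into memory.
final class Lazy<T> {
    private Supplier<? extends T> loader;  // cleared after the first load
    private T value;
    private boolean loaded;

    Lazy(Supplier<? extends T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null;  // let the loader and anything it captured be collected
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

class LazyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        Lazy<String> record = new Lazy<>(() -> {
            loads[0]++;  // stands in for reading the record from storage
            return "record-42";
        });
        System.out.println(record.isLoaded()); // prints "false"
        System.out.println(record.get());      // prints "record-42"
        System.out.println(record.get());      // prints "record-42" without reloading
        System.out.println(loads[0]);          // prints "1"
    }
}
```

Nulling the loader after use means whatever it captured (for instance a storage handle) is released once the value is in memory.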
> >
> >
> > === Background ===
> > Big Data and Cloud applications increasingly require both high
> > throughput and low latency processing. Java-based applications
> > targeting the Big Data and Cloud space should be tuned for better
> > throughput, lower latency, and more predictable response time.
> > Several issues typically impact the performance and scalability of
> > Big Data applications:
> >
> > 1) The Complexity of Data Transformation/Organization: In most cases,
> > applications use their own complicated data caching mechanisms during
> > data processing to SerDes data objects, spill to different storage,
> > and evict large amounts of data. Some data objects contain complex
> > values and structures that make data organization much more
> > difficult. Loading and then parsing/decoding datasets from storage
> > consumes significant system resources and computing power.
> >
> > 2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes
> > Frequent Long GC Pauses: Big Data computation and its programming
> > constructs generate large numbers of temporary objects during
> > processing (e.g., lambdas, SerDes, copying). These trigger frequent,
> > long Java GC pauses to scan references, update reference lists, and
> > blindly copy live objects from one memory location to another.
> >
> > 3) The Unpredictable GC Pause: Latency-sensitive applications, such
> > as databases, search engines, web queries, and real-time/streaming
> > computing, require request-response latency to be kept under control.
> > However, current Java GC does not provide predictable GC activity when
> > managing large on-heap memory.
> >
> > 4) High JNI Invocation Cost: High-performance applications often
> > leverage native code to improve performance, but JNI calls are
> > expensive because they need to convert Java objects into something
> > that C/C++ can understand. In addition, some sophisticated native code
> > needs to communicate with Java-based applications, which causes
> > frequent JNI calls along with stack marshalling.
> >
> > The Mnemonic project addresses the above issues and performance
> > bottlenecks for structured data processing and computing. It also
> > simplifies massive data handling with much less GC activity.
> >
> > === Rationale ===
> > There is a strong need for a cohesive, easy-to-use non-volatile
> > programming model for unified management and allocation of
> > heterogeneous memory resources. The Mnemonic project provides a
> > reusable and flexible framework that can accommodate other special
> > types of memory/block devices for better performance without changing
> > client code.
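Accommodating new device types without changing client code is the usual interface-plus-pluggable-backend pattern. The sketch below assumes a hypothetical `MemoryService` interface (not Mnemonic's actual service API) with two JDK-backed implementations; a persistent-memory or NVMe backend would be another implementation of the same interface, which is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical device-neutral allocation interface; client code depends only on this.
interface MemoryService {
    String name();
    ByteBuffer allocate(int bytes);
}

// Backend 1: ordinary Java heap memory.
final class HeapMemoryService implements MemoryService {
    public String name() { return "heap"; }
    public ByteBuffer allocate(int bytes) { return ByteBuffer.allocate(bytes); }
}

// Backend 2: off-heap (direct) memory, outside the GC-managed heap.
final class DirectMemoryService implements MemoryService {
    public String name() { return "direct"; }
    public ByteBuffer allocate(int bytes) { return ByteBuffer.allocateDirect(bytes); }
}

// Client code: identical regardless of which backend it is given.
final class LongStore {
    private final MemoryService service;

    LongStore(MemoryService service) { this.service = service; }

    ByteBuffer store(long[] values) {
        ByteBuffer buf = service.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip();  // make the written values readable from position 0
        return buf;
    }
}

class MemoryServiceDemo {
    public static void main(String[] args) {
        MemoryService[] backends = { new HeapMemoryService(), new DirectMemoryService() };
        for (MemoryService s : backends) {
            ByteBuffer buf = new LongStore(s).store(new long[] {1L, 2L, 3L});
            // prints "heap direct=false last=3", then "direct direct=true last=3"
            System.out.println(s.name() + " direct=" + buf.isDirect() + " last=" + buf.getLong(16));
        }
    }
}
```

Swapping backends is a change at the call site only; `LongStore` never mentions a device.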
> >
> > Most Big Data frameworks (e.g., Apache Spark™, Apache™ Hadoop®,
> > Apache HBase™, Apache Flink™, Apache Kafka™, etc.) have their own
> > complicated memory management modules for caching and checkpointing.
> > Many of these approaches increase complexity and make the code
> > error-prone to maintain.
> >
> > We have observed heavy overheads from data parsing, SerDes,
> > pack/unpack, and encode/decode operations during data loading,
> > storage, checkpointing, caching, marshalling, and transfer. Mnemonic
> > provides a generic in-memory persistent object model to address those
> > overheads for better performance. In addition, it manages its
> > in-memory persistent objects and blocks the way GC does, which means
> > their underlying memory resources can be reclaimed without being
> > explicitly released.
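Reclaiming memory "the way GC does" can be approximated in plain Java with phantom references: the GC reports when a handle becomes unreachable, and the allocator then frees what the handle owned. This is a sketch of that general technique under stated assumptions, not Mnemonic's implementation; `ReclaimingAllocator`, `Block`, and the byte counter are hypothetical stand-ins for real native allocations. It uses `PhantomReference`/`ReferenceQueue`, which are available on JDK 8 (the JDK the proposal lists).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator: tracks each block with a phantom reference so that
// blocks whose handles become unreachable are released automatically.
final class ReclaimingAllocator {
    private final AtomicLong inUse = new AtomicLong();  // bytes held by unreleased blocks
    private final ReferenceQueue<Block> queue = new ReferenceQueue<>();
    // Keeps trackers reachable until they are processed.
    private final Set<Tracker> trackers = ConcurrentHashMap.newKeySet();

    // Handle returned to callers; a real allocator would wrap native memory here.
    static final class Block {
        final long size;
        private final Tracker tracker;

        private Block(long size, ReclaimingAllocator owner) {
            this.size = size;
            this.tracker = new Tracker(this, size, owner);
        }

        // Optional explicit release; safe to call more than once.
        void release() { tracker.free(); }
    }

    // Holds what is needed to free a block, but not the block itself.
    private static final class Tracker extends PhantomReference<Block> {
        private final ReclaimingAllocator owner;
        private final long size;
        private final AtomicBoolean freed = new AtomicBoolean();

        Tracker(Block block, long size, ReclaimingAllocator owner) {
            super(block, owner.queue);
            this.owner = owner;
            this.size = size;
            owner.trackers.add(this);
            owner.inUse.addAndGet(size);
        }

        void free() {
            if (freed.compareAndSet(false, true)) {  // free at most once
                owner.inUse.addAndGet(-size);
                owner.trackers.remove(this);
            }
        }
    }

    Block allocate(long size) {
        reclaim();  // opportunistically clean up blocks the GC found unreachable
        return new Block(size, this);
    }

    // Frees every block whose handle the GC has reported as unreachable.
    void reclaim() {
        Reference<? extends Block> ref;
        while ((ref = queue.poll()) != null) {
            ((Tracker) ref).free();
        }
    }

    long bytesInUse() { return inUse.get(); }
}

class ReclaimDemo {
    public static void main(String[] args) throws InterruptedException {
        ReclaimingAllocator alloc = new ReclaimingAllocator();
        ReclaimingAllocator.Block kept = alloc.allocate(1024);
        alloc.allocate(4096);                   // handle dropped immediately
        kept.release();                         // explicit release
        System.out.println(alloc.bytesInUse()); // prints "4096"
        System.gc();                            // only a hint; timing is not guaranteed
        Thread.sleep(100);
        alloc.reclaim();
        System.out.println(alloc.bytesInUse()); // usually 0 once the GC has run
    }
}
```

Because `System.gc()` is only a hint, the second figure depends on the JVM; the explicit `release()` path is deterministic and is what a caller would use when it knows a block is finished.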
> >
> > Some existing Big Data applications suffer from poor Java GC behavior
> > when they process their massive unstructured datasets. This behavior
> > either causes very long stop-the-world GC pauses or consumes
> > significant system resources during computing, which hurts throughput
> > and causes significant, perceivable pauses for interactive analytics.
> >
> > More and more compute-intensive Big Data applications rely on JNI to
> > offload their computing tasks to native code, which dramatically
> > increases the cost of JNI invocation and IPC. Mnemonic provides a
> > mechanism to communicate with native code directly through in-place
> > object data updates, avoiding complex object data type conversion and
> > stack marshalling. In addition, this project can be extended to
> > support various locks for coordinating threads between Java code and
> > native code.
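One way to picture "in-place object data updates" without JNI marshalling is a shared memory-mapped region with a fixed record layout that both sides agree on. In the sketch below, the layout, the C struct in the comment, and the names `SharedRecord`/`SharedRecordDemo` are illustrative assumptions, not Mnemonic's format; a second Java mapping of the same file stands in for the native side, which would `mmap` the same file.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical shared record layout, mirroring an assumed C struct:
//   struct record { int64_t id; double score; int32_t flag; };  // 24 bytes incl. padding
final class SharedRecord {
    static final int ID = 0;      // int64_t id
    static final int SCORE = 8;   // double score
    static final int FLAG = 16;   // int32_t flag
    static final int SIZE = 24;

    // Maps the record region of the file read-write (growing the file if
    // needed); the mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
            buf.order(ByteOrder.nativeOrder());  // match the byte order native code uses
            return buf;
        }
    }
}

class SharedRecordDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("record", ".bin");
        file.toFile().deleteOnExit();
        MappedByteBuffer javaSide = SharedRecord.map(file);
        MappedByteBuffer nativeSide = SharedRecord.map(file);  // stand-in for a native mmap

        // Java updates the fields in place; nothing is converted or copied.
        javaSide.putLong(SharedRecord.ID, 42L);
        javaSide.putDouble(SharedRecord.SCORE, 0.5);
        javaSide.putInt(SharedRecord.FLAG, 1);

        System.out.println(nativeSide.getLong(SharedRecord.ID) + " "
                + nativeSide.getDouble(SharedRecord.SCORE) + " "
                + nativeSide.getInt(SharedRecord.FLAG));
    }
}
```

On Linux, where shared mappings of one file observe each other's writes, the demo prints `42 0.5 1`; the `FileChannel.map` documentation leaves cross-mapping propagation unspecified in general. A real exchange would also need the kind of Java/native locking the paragraph mentions, which this sketch omits.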
> >
> > === Initial Goals ===
> > Our initial goal is to bring Mnemonic into the ASF and transition its
> > engineering and governance processes to the "Apache Way." We would
> > like to foster a collaborative development model that closely aligns
> > with current and future industry memory and storage technologies.
> >
> > Another important goal is to encourage efforts to integrate the
> > non-volatile programming model into data-centric processing/analytics
> > frameworks and applications (e.g., Apache Spark™, Apache HBase™, Apache
> > Flink™, Apache™ Hadoop®, Apache Cassandra™, etc.).
> >
> > We expect the Mnemonic project to continuously develop new
> > functionality in an open, community-driven way. We envision
> > accelerating innovation under ASF governance in order to meet the
> > requirements of a wide variety of use cases for in-memory non-volatile
> > and volatile data caching programming.
> >
> > === Current Status ===
> > The Mnemonic project is available in Intel’s internal repository and
> > is managed by its designers and developers. It is also temporarily
> > hosted on GitHub for general viewing:
> > https://github.com/NonVolatileComputing/Mnemonic.git
> >
> > We have integrated this project with Apache Spark™ 1.5.0. In our
> > experiments, we obtained a 2X performance improvement for the Spark™
> > MLlib k-means workload and observed the expected benefits of removing
> > SerDes, with total GC pause time reduced by 40%.
> >
> > ==== Meritocracy ====
> > Mnemonic was originally created by Gang (Gary) Wang and Yanping Wang
> > in early 2015. The initial committers are the current Mnemonic R&D
> > team members from Intel’s Big Data Technologies Group in the US,
> > China, and India. This group will form the base for a much broader
> > community to collaborate on this code base.
> >
> > We intend to radically expand the initial developer and user community
> > by running the project in accordance with the "Apache Way." Users and
> > new contributors will be treated with respect and welcomed. By
> > participating in the community and providing quality patches/support
> > that move the project forward, they will earn merit. They also will be
> > encouraged to provide non-code contributions (documentation, events,
> > community management, etc.) and will gain merit for doing so. Those
> > with a proven support and quality track record will be encouraged to
> > become committers.
> >
> > ==== Community ====
> > If Mnemonic is accepted for incubation, the primary initial goal is to
> > transition the core community towards embracing the Apache Way of
> > project governance. We would solicit major existing contributors to
> > become committers on the project from the start.
> >
> > ==== Core Developers ====
> > Mnemonic core developers are all skilled software developers and
> > system performance engineers at Intel Corp with years of experience
> > in their fields. They have contributed a great deal of code to Apache
> > projects. PMC members and experienced committers from Apache Spark™,
> > Apache HBase™, Apache Phoenix™, and Apache™ Hadoop® have been working
> > with us on this project's open source efforts.
> >
> > === Alignment ===
> > The initial code base targets data-centric processing and analytics
> > in general. Mnemonic has been building connections to, and
> > integrations with, Apache projects and other projects.
> >
> > We believe Mnemonic will evolve into a promising project for
> > real-time processing, in-memory streaming analytics, and more, on
> > current and future server platforms that use persistent memory as
> > base storage devices.
> >
> > === Known Risks ===
> > ==== Orphaned products ====
> > Intel’s Big Data Technologies Group is actively working with the
> > community on integrating this project into Big Data frameworks and
> > applications. We are continuously adding new concepts and code to this
> > project and supporting new use cases and features for the Apache Big
> > Data ecosystem.
> >
> > The project contributors are leading contributors to Hadoop-based
> > technologies and have a long-standing presence in the Hadoop community.
> > As we are addressing major Big Data processing performance issues,
> > there is minimal risk of this work becoming non-strategic and
> > unsupported.
> >
> > Our contributors are confident that a larger community will be formed
> > within the project in a relatively short period of time.
> >
> > ==== Inexperience with Open Source ====
> > This project has long-standing, experienced mentors and interested
> > contributors from Apache Spark™, Apache HBase™, Apache Phoenix™, and
> > Apache™ Hadoop® to help us move through the open source process. We
> > are actively working with experienced Apache PMC members and
> > committers to improve our project and its testing.
> >
> > ==== Homogeneous Developers ====
> > All initial committers and interested contributors are employed at
> > Intel. Because this is an infrastructure memory project, a wide range
> > of Apache projects are interested in an innovative memory project that
> > fits large persistent memory and storage devices. Various Apache
> > projects, such as Apache Spark™, Apache HBase™, Apache Phoenix™, Apache
> > Flink™, and Apache Cassandra™, can take advantage of this project to
> > overcome serialization/de-serialization, Java GC, and caching issues.
> > We expect a wide range of interest will be generated after we open
> > source this project at Apache.
> >
> > ==== Reliance on Salaried Developers ====
> > All developers are paid by their employers to contribute to this
> > project. We welcome all others to contribute to this project after it
> > is open sourced.
> >
> > ==== Relationships with Other Apache Products ====
> > Relationship with Apache™ Arrow:
> > Arrow's columnar data layout makes good use of CPU caches and SIMD. It
> > places all data relevant to a column operation in a compact format in
> > memory.
> >
> > Mnemonic directly places whole business object graphs on external
> > heterogeneous storage media (e.g., off-heap memory, SSD). Developers
> > do not need to normalize the structure of their object graphs for
> > caching, checkpointing, or storage. Compared to traditional
> > approaches, Mnemonic applications can avoid indexing and joining
> > datasets.
> >
> > Mnemonic can leverage Arrow to transparently re-layout qualified data
> > objects, or to create special containers that can efficiently hold
> > those data records in columnar form, as one of its major performance
> > optimization constructs.
> >
> > Mnemonic can be integrated into various Big Data and Cloud frameworks
> > and applications.
> > We are currently working on several Apache projects with Mnemonic:
> > For Apache Spark™ we are integrating Mnemonic to improve:
> > a) Local checkpoints
> > b) Memory management for caching
> > c) Persistent memory datasets input
> > d) Non-Volatile RDD operations
> > The best use case for Apache Spark™ computing is when the input data
> > is stored in Mnemonic native storage form, which avoids caching its
> > row data for iterative processing. Moreover, Spark applications can
> > leverage Mnemonic to transform data in persistent or non-persistent
> > memory without SerDes.
> >
> > For Apache™ Hadoop®, we are integrating HDFS Caching with Mnemonic
> > instead of mmap. This will take advantage of persistent memory related
> > features. We also plan to evaluate integrating the NameNode EditLog
> > and FSImage persistent data into the Mnemonic persistent memory area.
> >
> > For Apache HBase™, we are using Mnemonic for BucketCache and
> > evaluating performance improvements.
> >
> > We expect Mnemonic to be further developed and integrated into many
> > Apache Big Data projects and others, to enhance memory management
> > solutions for much improved performance and reliability.
> >
> > ==== An Excessive Fascination with the Apache Brand ====
> > While we expect the Apache brand to help attract more contributors,
> > our interest in starting this project is based on the factors
> > mentioned in the Rationale section.
> >
> > We would like Mnemonic to become an Apache project to further foster a
> > healthy community of contributors and consumers in Big Data technology
> > R&D areas. Since Mnemonic can directly benefit many Apache projects
> > and solve major performance problems, we also expect the Apache
> > Software Foundation to increase its interaction with the larger
> > community.
> >
> > === Documentation ===
> > The documentation is currently available at Intel and will be posted
> > under: https://mnemonic.incubator.apache.org/docs
> >
> > === Initial Source ===
> > The initial source code is temporarily hosted on GitHub for general
> > viewing: https://github.com/NonVolatileComputing/Mnemonic.git
> > It will be moved to Apache http://git.apache.org/ once Mnemonic
> > becomes a podling.
> >
> > The initial source is written in Java (88%), mixed with JNI C code
> > (11%) and shell scripts (1%) for the underlying native allocation
> > libraries.
> >
> > === Source and Intellectual Property Submission Plan ===
> > As soon as Mnemonic is approved to join the Incubator, the source code
> > will be transitioned via the Software Grant Agreement onto ASF
> > infrastructure and in turn made available under the Apache License,
> > version 2.0.
> >
> > === External Dependencies ===
> > The required external dependencies are all under Apache or other
> > compatible licenses.
> > Note: The runtime dependencies of Mnemonic are all declared as
> > Apache 2.0; GNU-licensed components are used for building and
> > deploying Mnemonic. The Mnemonic JNI libraries are built using the
> > GNU tools.
> >
> > maven and its plugins (http://maven.apache.org/ ) [Apache 2.0]
> > JDK8 or OpenJDK 8 (http://java.com/) [Oracle or Openjdk JDK License]
> > Nvml (http://pmem.io ) [optional] [Open Source]
> > PMalloc (https://github.com/bigdata-memory/pmalloc ) [optional] [Apache 2.0]
> >
> > Build and test dependencies:
> > org.testng.testng v6.8.17  (http://testng.org) [Apache 2.0]
> > org.flowcomputing.commons.commons-resgc v0.8.7 [Apache 2.0]
> > org.flowcomputing.commons.commons-primitives v0.6.0 [Apache 2.0]
> > com.squareup.javapoet v1.3.1-SNAPSHOT [Apache 2.0]
> > JDK8 or OpenJDK 8 (http://java.com/) [Oracle or Openjdk JDK License]
> >
> > === Cryptography ===
> > Project Mnemonic does not use cryptography itself; however, Hadoop
> > projects use standard APIs and tools for SSH and SSL communication
> > where necessary.
> >
> > === Required Resources ===
> > We request that the following resources be created for the project to use:
> >
> > ==== Mailing lists ====
> > private@mnemonic.incubator.apache.org (moderated subscriptions)
> > commits@mnemonic.incubator.apache.org
> > dev@mnemonic.incubator.apache.org
> >
> > ==== Git repository ====
> > https://github.com/apache/incubator-mnemonic
> >
> > ==== Documentation ====
> > https://mnemonic.incubator.apache.org/docs/
> >
> > ==== JIRA instance ====
> > https://issues.apache.org/jira/browse/mnemonic
> >
> > === Initial Committers ===
> > * Gang (Gary) Wang (gang1 dot wang at intel dot com)
> >
> > * Yanping Wang (yanping dot wang at intel dot com)
> >
> > * Uma Maheswara Rao G (umamahesh at apache dot org)
> >
> > * Kai Zheng (drankye at apache dot org)
> >
> > * Rakesh Radhakrishnan Potty  (rakeshr at apache dot org)
> >
> > * Sean Zhong  (seanzhong at apache dot org)
> >
> > * Henry Saputra  (hsaputra at apache dot org)
> >
> > * Hao Cheng (hao dot cheng at intel dot com)
> >
> > === Additional Interested Contributors ===
> > * Debo Dutta (dedutta at cisco dot com)
> >
> > * Liang Chen (chenliang613 at Huawei dot com)
> >
> > === Affiliations ===
> > * Gang (Gary) Wang, Intel
> >
> > * Yanping Wang, Intel
> >
> > * Uma Maheswara Rao G, Intel
> >
> > * Kai Zheng, Intel
> >
> > * Rakesh Radhakrishnan Potty, Intel
> >
> > * Sean Zhong, Intel
> >
> > * Henry Saputra, Independent
> >
> > * Hao Cheng, Intel
> >
> > === Sponsors ===
> > ==== Champion ====
> > Patrick Hunt
> >
> > ==== Nominated Mentors ====
> > * Patrick Hunt <phunt at apache dot org> - Apache IPMC member
> >
> > * Andrew Purtell <apurtell at apache dot org> - Apache IPMC member
> >
> > * James Taylor <jamestaylor at apache dot org> - Apache IPMC member
> >
> > * Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >
> > ==== Sponsoring Entity ====
> > Apache Incubator PMC
> >
>
