arrow-dev mailing list archives

From Gary <ga...@apache.org>
Subject RE: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow
Date Tue, 29 Mar 2016 23:38:23 GMT
Yes, I agree with you, and it would be great if we could brainstorm here
to collect more ideas about enabling non-volatile memory usage for
Apache Arrow through Mnemonic.

As for the questions, my thoughts are:

- Right now you are using unpooled persistent memory. Does that make
sense or does chunking make more sense?

Gary: I think it could make sense if the developer knows that their
datasets are very big and wants Apache Arrow to keep most of them in
memory for intensive computing, e.g. sorting.
The developer can certainly spill Mnemonic-managed datasets to disk, but
that seems a bit inefficient in some scenarios, depending on the
concrete application logic.

- What do you think is the right way to transition back and forth
between persistent and ephemeral memory? What do you think will be the
first pattern to be adopted? For example, do you think we should try to
use it as a tiered storage for sort spilling (before hitting the disk),
or should we use it for caching?
Gary: my 2 cents, the Netty library does not yet seem to provide an
elegant switch mechanism for Arrow to use. We could probably change the
logic around "initialCapacity > directArena.chunkSize" to control which
buffers are placed off-heap and which are managed by Mnemonic. Another
approach is to let Mnemonic's memory clustering mechanism manage hybrid
memory-like spaces in place of parts of the logic of class
PooledByteBufAllocatorL.
Regarding sorting, I think it is a typical case of random access to the
data, so we should avoid spilling as much as possible.
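
A minimal Java sketch of the first idea, a size-based switch in the
spirit of the "initialCapacity > directArena.chunkSize" check. It is
only an illustration under stated assumptions: ThresholdRouter, Backend
and Tier are hypothetical names, not Arrow, Netty or Mnemonic APIs, and
routing the larger requests to the Mnemonic-managed side is an assumed
direction for the check, not something the patch is known to do.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: pick a memory tier from the requested capacity.
 * None of these types exist in Arrow, Netty or Mnemonic.
 */
final class ThresholdRouter {

    /** Stand-in for anything that can hand out a buffer of a given size. */
    interface Backend {
        ByteBuffer allocate(int capacity);
    }

    enum Tier { OFF_HEAP, NON_VOLATILE }

    private final int chunkSize;
    private final Backend offHeap;
    private final Backend nonVolatile;

    ThresholdRouter(int chunkSize, Backend offHeap, Backend nonVolatile) {
        this.chunkSize = chunkSize;
        this.offHeap = offHeap;
        this.nonVolatile = nonVolatile;
    }

    /** Assumption: requests larger than one chunk go to the non-volatile tier. */
    Tier tierFor(int initialCapacity) {
        return initialCapacity > chunkSize ? Tier.NON_VOLATILE : Tier.OFF_HEAP;
    }

    /** Delegates the allocation to whichever backend the threshold selects. */
    ByteBuffer allocate(int initialCapacity) {
        return tierFor(initialCapacity) == Tier.NON_VOLATILE
                ? nonVolatile.allocate(initialCapacity)
                : offHeap.allocate(initialCapacity);
    }
}
```

In a real integration the nonVolatile backend would be whatever
Mnemonic-backed allocator the patch provides; here it is left as an
interface so the switch itself stays visible.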
My 2 cents, ranked by performance the options would be:
all in off-heap if possible > mnemonic used as cache > all in mnemonic
using NVMe/disk > off-heap + spilling
Ranked by code simplicity they would be:
all in off-heap if possible > all in mnemonic using NVMe/disk >
mnemonic used as cache > off-heap + spilling


The reason the mode "mnemonic used as cache + spilling" is probably
unnecessary is that Mnemonic could provide capacity nearly equivalent
to that of disk.
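
A rough Java sketch of the "all in off-heap if possible, otherwise
Mnemonic" ordering described above. It is an illustration only:
TieredAllocator and its methods are hypothetical, and a memory-mapped
temporary file stands in for a Mnemonic-managed region because no
Mnemonic API is assumed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch: serve allocations from off-heap memory while a
 * byte budget lasts, then fall back to a larger, slower tier instead of
 * spilling. Not an Arrow, Netty or Mnemonic API.
 */
final class TieredAllocator {
    private final long offHeapBudget;
    private final IntFunction<ByteBuffer> fallback;
    private long offHeapUsed;
    private int fallbackCount;

    TieredAllocator(long offHeapBudget, IntFunction<ByteBuffer> fallback) {
        this.offHeapBudget = offHeapBudget;
        this.fallback = fallback;
    }

    /** Off-heap while the budget allows, otherwise the fallback tier. */
    ByteBuffer allocate(int capacity) {
        if (offHeapUsed + capacity <= offHeapBudget) {
            offHeapUsed += capacity;
            return ByteBuffer.allocateDirect(capacity);
        }
        fallbackCount++;
        return fallback.apply(capacity);
    }

    /** Number of allocations that did not fit in the off-heap budget. */
    int fallbackCount() {
        return fallbackCount;
    }

    /** Stand-in for a non-volatile tier: one mapped temp file per buffer. */
    static IntFunction<ByteBuffer> mappedTempFiles() {
        return capacity -> {
            try {
                Path file = Files.createTempFile("tier-", ".buf");
                file.toFile().deleteOnExit();
                try (FileChannel ch = FileChannel.open(file,
                        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                    // The mapping stays valid after the channel is closed.
                    return ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

For example, new TieredAllocator(1L << 30, TieredAllocator.mappedTempFiles())
would keep roughly the first gigabyte of requests off-heap and place the
rest in the mapped tier. The sketch never returns budget on release,
which a real allocator would have to do.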

Thanks.
Gary.

-----Original Message-----
From: Jacques Nadeau [mailto:jacques@apache.org]
Sent: Tuesday, March 29, 2016 8:05 AM
To: dev@arrow.apache.org
Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative
infra. for Apache Arrow

This is super cool. A couple of questions:

- Right now you are using unpooled persistent memory. Does that make
sense or does chunking make more sense?

- What do you think is the right way to transition back and forth
between persistent and ephemeral memory? What do you think will be the
first pattern to be adopted? For example, do you think we should try to
use it as a tiered storage for sort spilling (before hitting the disk),
or should we use it for caching?

I think it will be much easier to think about this in the context of a
primary or first use case. Do you have something in mind or should we
brainstorm here?

On Wed, Mar 23, 2016 at 7:16 PM, Gary <garyw@apache.org> wrote:

> Hello,
>
>    We have created a patch for Apache Arrow that leverages Apache
> Incubator Mnemonic as an alternative infra. for underlying memory
> resource allocation. You can find it in the forked repo below.
>
> https://github.com/NonVolatileComputing/arrow
>
>     In this way, Apache Arrow could gain some structural benefits from
> the Mnemonic project:
>
>     - Arrow is able to leverage the larger capacity of high-performance
> hybrid storage devices, e.g. high-end SSD, NVMe
>
>     - Mnemonic provides a potential opportunity for Arrow to
> optimize/tune its allocation algorithms as a native Arrow-oriented
> allocation service
>
>     - The non-volatile features of Mnemonic make it possible for
> Arrow to share its columnar in-memory data between different
> applications or across the life-cycle of a single application
>
>     - Arrow could take advantage of the upcoming Mnemonic features of
> memory clustering/DOG (distributed object graph) and massive native
> computing
>
>     - Mnemonic helps to reduce the pressure on main memory utilization
> and its related system-wide overheads.
>
>    This patch is designed to minimize the changes users need to make
> to use Arrow. Please check out the test cases provided by this patch
> for your reference.
>
>    Note that we need to put the allocator services in a specified
> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> cases to run, because those services are required for external
> memory-like device management.
>
>    Please give your comments and review feedback for better
> collaboration between Apache Arrow and Mnemonic. Thanks.
>
> Best Regards.
> Gary.


