arrow-user mailing list archives

From "yunfan" <yunfanfight...@foxmail.com>
Subject Re:RE: [EXTERNAL] How to understand and use the zero-copy between two processor?
Date Thu, 04 Jun 2020 12:27:10 GMT
I just wonder what "zero-copy" means in the Arrow documentation.
In my understanding, copying memory is still necessary for Arrow streaming
messaging.


https://arrow.apache.org/
"It also provides computational libraries and zero-copy streaming messaging and interprocess
communication"



------------------ Original ------------------
From: "Nugent, Daniel" <Daniel.Nugent@mlp.com>;
Date: Thu, Jun 4, 2020 11:53 AM
To: "user@arrow.apache.org" <user@arrow.apache.org>;
Subject: RE: [EXTERNAL] How to understand and use the zero-copy between two processor?



  
Hi,
 
I'm not 100% sure I know exactly what you want to achieve here, unfortunately. If the message
buffers are being streamed to a shared-memory-backed file, then you can't use shared memory
to continuously read them, because the mmap facility provides fixed-size shared memory. You
could use an out-of-band signal to indicate that you need to re-map the stream storage file,
I guess, but that's not really a stream. You *could* read from the file, but that's going
to necessarily copy from the file handle, same as a pipe. If you want to use the plasma object
store, that can simplify the process of moving individual RecordBatches of a Table into shared
memory to be used between processes. Unfortunately, the plasma store does have the limitation
that it currently cannot "adopt" shared memory in any way, so one initial copy into the store
is necessary.
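
To illustrate the fixed-size mapping point, here is a minimal sketch that uses only Python's standard `mmap` module rather than the Arrow API. The file name, the payloads and the helpers `append_message` and `map_file` are hypothetical, and the behaviour shown assumes a POSIX system, where a file can be extended while it is mapped.

```python
import mmap
import os
import tempfile


def append_message(path, payload):
    """Writer side: append one message to the backing file."""
    with open(path, "ab") as f:
        f.write(payload)


def map_file(path):
    """Reader side: map the file at its current size; the mapping does not grow."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


path = os.path.join(tempfile.mkdtemp(), "stream.bin")  # hypothetical file
append_message(path, b"batch-1")

mapped = map_file(path)
view = memoryview(mapped)      # view over the mapped pages, no copy
first = view[0:7]              # slicing a memoryview does not copy either
first_bytes = bytes(first)     # explicit copy, made here only for checking
size_before = len(mapped)

append_message(path, b"batch-2")   # the writer extends the file
size_after_write = len(mapped)     # the existing mapping keeps its old size

# Out-of-band step: the reader learns that the file grew and maps it again.
first.release()
view.release()
mapped.close()
remapped = map_file(path)
size_after_remap = len(remapped)
second_bytes = bytes(remapped[7:14])
remapped.close()
```

The reader only sees the second message after it maps the file again, which is why some signal outside the mapped data is needed.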
 
To go back to the shared memory + OOB communication: That may well be workable. The read cost
for the shared-memory-backed mapped files will be very low, so concatenating the RecordBatches
back into a Table repeatedly may not be a serious issue as long as there aren't *too* many
RecordBatches to be processed.
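
A rough sketch of the shared memory + OOB idea, using Python's standard `multiprocessing.shared_memory` module (Python 3.8+) as a stand-in for Arrow buffers. Everything here is an assumption made for illustration: the helpers `publish` and `consume` are hypothetical, the payload is an arbitrary byte string rather than a real RecordBatch, and a `queue.Queue` in a single process stands in for whatever out-of-band channel two real processes would share.

```python
import queue
from multiprocessing import shared_memory

signals = queue.Queue()  # stand-in for the out-of-band channel


def publish(payload):
    """Writer: copy the payload into shared memory once, then announce it."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload        # the one initial copy
    # The block may be rounded up to a page size, so send the real length too.
    signals.put((shm.name, len(payload)))
    return shm


def consume():
    """Reader: attach to the announced block and return a view without copying."""
    name, length = signals.get()
    shm = shared_memory.SharedMemory(name=name)
    return shm, shm.buf[:length]


writer = publish(b"record-batch-bytes")
reader, view = consume()
received = bytes(view)   # explicit copy, made here only for checking

# Views must be released before the block is closed.
view.release()
reader.close()
writer.close()
writer.unlink()          # free the shared memory block
```

Only the block name and length travel over the signalling channel; the payload itself stays in shared memory.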
 
Even given all of that, I don't know that Spark has yet implemented their DataFrames as
Arrow-array-backed objects. There cannot be *true* zero copy between two systems until that
is the case.
 
I hope that helps a little.
 
-Dan Nugent
 
From: yunfan <yunfanfighting@foxmail.com>
Sent: Wednesday, June 3, 2020 10:23 PM
To: user <user@arrow.apache.org>
Subject: [EXTERNAL] How to understand and use the zero-copy between two processor?
 
In my understanding, I can write a file to shared memory and open this shared-memory
file in another process.

But it can't be used in streaming mode. Is there any way to use zero-copy between two processes?

I find that Spark also uses a pipe to transfer Arrow bytes between the Java and Python processes.
 
  