arrow-user mailing list archives

From Simon Dumke <>
Subject Beginner Question: HW Input into Arrow RecordBatch
Date Wed, 17 Jul 2019 09:27:58 GMT
Dear all,

I'm just getting started with Apache Arrow (or rather, thinking about it).
I'm also considering using Arrow not only inside our processing
pipeline, but in our data acquisition pipeline too. In this regard, I
have the following question:

There are primarily two kinds of DAQ APIs in use here:

  * One [e.g. like int getData(unsigned char *data, size_t bufferSize)]
    takes a pointer to a preallocated buffer and fills it with data from
    DAQ hardware
  * The other [e.g. like int getData(unsigned char **data)] "returns" a
    pointer to a buffer created inside the hardware driver, filled with
    data from DAQ hardware
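For illustration, the two API shapes can be mocked in C++ as below. The function names and parameter lists follow the examples above, but the bodies, the fixed 64-byte acquisition size and the `releaseData` helper are hypothetical; a real driver reads from hardware and defines its own ownership rules. What matters for Arrow is who owns the memory: with the first shape the caller can pass in a buffer it allocated itself, while with the second the data lives in driver memory that any wrapper must not outlive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Shape 1 (hypothetical mock): the caller owns the memory and passes it in.
// Returns the number of bytes written, or -1 on error.
int getData(unsigned char* data, std::size_t bufferSize) {
    if (data == nullptr) return -1;
    for (std::size_t i = 0; i < bufferSize; ++i) {
        data[i] = static_cast<unsigned char>(i & 0xFF);  // stand-in for ADC samples
    }
    return static_cast<int>(bufferSize);
}

// Shape 2 (hypothetical mock): the driver allocates the memory and hands out
// a pointer to it. This mock uses malloc, so releaseData() must be called
// once the data is no longer needed.
int getData(unsigned char** data) {
    if (data == nullptr) return -1;
    const std::size_t n = 64;  // assumed fixed acquisition size
    unsigned char* buf = static_cast<unsigned char*>(std::malloc(n));
    if (buf == nullptr) return -1;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<unsigned char>(i & 0xFF);
    }
    *data = buf;
    return static_cast<int>(n);
}

// Hypothetical release function matching the mock allocation above.
void releaseData(unsigned char* data) { std::free(data); }
```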

If I want to use Arrow to transport and handle the data coming out of
those APIs, I would usually need to allocate an Arrow Buffer and copy
the acquired data into it (with a series of copy operations). If the
hardware's output is an interleaved stream of samples (e.g. 16 8-bit
values from a 16-channel ADC, followed by the 16 values of the next
sample...), that data would obviously be row-oriented, and I would
therefore need to rearrange it manually into the Arrow buffer.
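The manual rearrangement for the interleaved case is essentially a transpose: sample `c` of frame `f` sits at offset `f * numChannels + c` and has to land at position `f` of channel `c`'s column. A minimal standard-library sketch, assuming 8-bit samples and a stream that holds a whole number of frames (the function name `deinterleave` is hypothetical); each resulting vector has the contiguous layout an Arrow uint8 column uses for its values buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved (row-oriented) stream into one contiguous buffer
// per channel. This always copies every sample once.
std::vector<std::vector<std::uint8_t>> deinterleave(const std::uint8_t* stream,
                                                    std::size_t streamSize,
                                                    std::size_t numChannels) {
    assert(numChannels > 0 && streamSize % numChannels == 0);
    const std::size_t numFrames = streamSize / numChannels;
    std::vector<std::vector<std::uint8_t>> columns(
        numChannels, std::vector<std::uint8_t>(numFrames));
    // Read the stream sequentially; scatter each value into its channel.
    for (std::size_t frame = 0; frame < numFrames; ++frame) {
        for (std::size_t ch = 0; ch < numChannels; ++ch) {
            columns[ch][frame] = stream[frame * numChannels + ch];
        }
    }
    return columns;
}
```

For two channels, the stream `{0, 10, 1, 11, 2, 12}` becomes the columns `{0, 1, 2}` and `{10, 11, 12}`.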

The question is now: if the data is only a one-dimensional array of
samples (as from a single-channel ADC), or if the hardware offers the
option to fill the buffer in a non-interleaved (planar) manner (meaning
all samples from channel 0, followed by all samples from channel 1, and
so on, essentially "columnar"), would there be a way to "reinterpret"
this in-memory layout as an Arrow Buffer/RecordBatch/whatever and
thereby avoid copy operations? For example, by adding a specific
"header", or, when using an API of the first type, by providing a
pointer into a buffer allocated by Arrow and already prepared for the
specific content layout?
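For the planar case a zero-copy route looks plausible, because an Arrow primitive column is essentially one contiguous values buffer plus a validity bitmap that can be omitted when there are no nulls. The standard-library sketch below shows only the core idea: one caller-allocated buffer (as with the first API type) is split into per-channel views by pointer arithmetic, without copying any sample. `ColumnView` and `planarColumns` are hypothetical names. In Arrow C++, my understanding is that the corresponding step is to wrap each region in a non-owning `arrow::Buffer` (e.g. via `arrow::Buffer::Wrap`) and build an array from it with `arrow::ArrayData::Make` and `arrow::MakeArray`; the exact signatures should be checked against the Arrow version in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view: pointer + length into memory owned elsewhere,
// roughly what an Arrow primitive column needs for its values buffer.
struct ColumnView {
    const std::uint8_t* data;
    std::size_t length;
};

// Splits a planar buffer (all samples of channel 0, then channel 1, ...)
// into per-channel views. No sample is copied; every view points into the
// original buffer, which must outlive the views.
std::vector<ColumnView> planarColumns(const std::uint8_t* buffer,
                                      std::size_t bufferSize,
                                      std::size_t numChannels) {
    assert(numChannels > 0 && bufferSize % numChannels == 0);
    const std::size_t samplesPerChannel = bufferSize / numChannels;
    std::vector<ColumnView> views;
    views.reserve(numChannels);
    for (std::size_t ch = 0; ch < numChannels; ++ch) {
        views.push_back({buffer + ch * samplesPerChannel, samplesPerChannel});
    }
    return views;
}
```

Two points worth verifying before relying on this: the Arrow columnar format recommends 8- or 64-byte aligned and padded buffers (and enforces this when serializing for IPC), so choosing a per-channel sample count that keeps each channel's start aligned may help; and with the second API type, the driver-owned memory must stay valid for as long as any Arrow object refers to it.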

I hope my question and intention come through clearly enough. Any ideas
would be greatly appreciated!

BTW, can anybody offer some links to Getting Started guides, examples,
etc. on how to start using Arrow (both C++ and Java)? I still find
myself having difficulties finding the right starting point.

Many thanks and kind regards,


Simon Dumke

Developer - CoDaC
Department Operation

Max Planck Institut for Plasmaphysics
