arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Arrow-C++ "Zero-Copy-Append" to BinaryArray
Date Sat, 01 Aug 2020 02:27:21 GMT
Hi Simon,
Yes, I think it could be a good addition to the BinaryBuilder [1].
Other people may have opinions on this; the best way forward would be to
open a JIRA issue with a proposed API and send a PR (I imagine this would
be a fairly small change, so most of the discussion could probably happen
on the PR).

Thanks,
Micah

[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_binary.h#L505

On Wed, Jul 29, 2020 at 5:53 AM Simon Dumke <simon.dumke@ipp.mpg.de> wrote:

> Hi Micah,
>
> Very late on my part, but still: thanks for your reply! I've followed your
> suggestion and it works as expected. I believe this functionality could be
> added to the BinaryBuilder. Would this be a sensible feature to add?
>
> Kind regards,
> Simon
>
> On 19.06.2020 at 05:39, Micah Kornfield wrote:
>
> Hi Simon,
> I don't think there is a public API for this in C++. You would have to
> presize a values buffer to the expected (worst-case) size of the compressed
> data and have the compressor write directly into that buffer while recording
> the necessary offsets. You could then construct the BinaryArray directly from
> these buffers (I would need to double-check, but you might need to construct
> an intermediate ArrayData object).
>
> Hope this helps.
>
> Micah
>
>
>
> On Thursday, June 18, 2020, Simon Dumke <simon.dumke@ipp.mpg.de> wrote:
>
>> Hi all,
>>
>> I would like to build RecordBatches with (among others) a BinaryArray
>> column containing compressed data. When filling the BinaryArray, I would
>> like to let the compressor write directly into the Arrow Buffer instead of
>> allocating a separate output buffer and then copying the data into Arrow
>> Buffers.
>>
>> Is such an approach possible? And if so - how do I achieve this?
>>
>> I'd be thankful for any insights!
>>
>> Best regards,
>>
>> Simon
>>
>>
> --
> Simon Dumke
>
> Developer - CoDaC
> Department Operation
>
> Max Planck Institut for Plasmaphysics
> Wendelsteinstrasse 1
> 17491 Greifswald, Germany
>
> Phone: +49(0)3834 88 1215
>
>
