arrow-user mailing list archives

From Chris Nuernberger <ch...@techascent.com>
Subject Re: Per-batch dictionary example in Java
Date Mon, 27 Jul 2020 12:28:51 GMT
Hi Micah,

I had seen that page, yes, and my specific question was around delta
dictionaries:

https://arrow.apache.org/docs/format/Columnar.html#dictionary-messages

There doesn't seem to be a way to access this functionality via Java, and
the above stream example contains only one record batch and one dictionary
batch.

On Sun, Jul 26, 2020 at 10:54 PM Micah Kornfield <emkornfield@gmail.com>
wrote:

> Hi Chris,
> Have you read through the "reading and writing streaming format" docs
> [1]? If this doesn't work or you have something different in mind, some
> code samples of what you are currently doing might help.
>
> I'll add that I think the dictionary APIs in Java aren't the most
> ergonomic, so if you have ideas on improving them, feel free to
> propose something.
>
> Thanks,
> Micah
>
>
> [1]
> https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format
>
> On Sat, Jul 25, 2020 at 5:49 AM Chris Nuernberger <chris@techascent.com>
> wrote:
>
>> Hello,
>>
>> Using the Java API for serialization, it is not clear to me how to
>> utilize the per-batch dictionary functionality of the Arrow binary format.
>> Specifically, the stream writer class expects the dictionaries to be
>> defined when it loads the schema, so it isn't clear how it will handle
>> assigning a dictionary to a provider when saving a batch.
>>
>> Is there an example that clarifies this use case?
>>
>> Thanks for any input or feedback,
>>
>> Chris
>>
>

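For readers unfamiliar with the delta dictionaries discussed in this thread: per the columnar format document linked above, a dictionary batch flagged as a delta is appended to the dictionary already received for that id, and dictionary batches may be interleaved with record batches in a stream. The following is a minimal conceptual sketch of that behavior using only the Java standard library. It does not use the Arrow Java API, and every name in it (`DeltaDictionarySketch`, `Encoder`, `Decoder`, `takeDelta`, `applyDelta`) is hypothetical, chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model only: none of these names are Arrow Java classes. */
public class DeltaDictionarySketch {

    /** Writer side: encodes values and tracks which dictionary entries are new. */
    static final class Encoder {
        private final Map<String, Integer> indexOf = new HashMap<>();
        private final List<String> pendingDelta = new ArrayList<>();

        /** Encodes one batch; unseen values get the next index and join the pending delta. */
        int[] encodeBatch(List<String> values) {
            int[] indices = new int[values.size()];
            for (int i = 0; i < values.size(); i++) {
                String v = values.get(i);
                Integer idx = indexOf.get(v);
                if (idx == null) {
                    idx = indexOf.size();
                    indexOf.put(v, idx);
                    pendingDelta.add(v);
                }
                indices[i] = idx;
            }
            return indices;
        }

        /** Returns the entries added since the last call (the delta payload). */
        List<String> takeDelta() {
            List<String> delta = new ArrayList<>(pendingDelta);
            pendingDelta.clear();
            return delta;
        }
    }

    /** Reader side: appends each delta to the dictionary accumulated so far. */
    static final class Decoder {
        private final List<String> dictionary = new ArrayList<>();

        void applyDelta(List<String> delta) {
            dictionary.addAll(delta);
        }

        List<String> decodeBatch(int[] indices) {
            List<String> out = new ArrayList<>(indices.length);
            for (int idx : indices) {
                out.add(dictionary.get(idx));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Encoder enc = new Encoder();
        Decoder dec = new Decoder();
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b", "a"),
            Arrays.asList("b", "c", "d"));
        for (List<String> batch : batches) {
            int[] indices = enc.encodeBatch(batch);
            List<String> delta = enc.takeDelta(); // would be sent ahead of the record batch
            dec.applyDelta(delta);
            System.out.println("delta=" + delta + " decoded=" + dec.decodeBatch(indices));
        }
    }
}
```

In this sketch the second batch ships only the two new entries ("c" and "d"), while its indices still refer to positions in the full accumulated dictionary. Whether and how the Java stream writer exposes this is exactly the open question in the thread, so nothing here should be read as the library's behavior.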