arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: How to create a Dictionary encoded string column with the Javascript interface?
Date Tue, 21 Jul 2020 03:41:21 GMT
Hi Yakov,
You might try e-mailing the dev@ mailing list to see if anyone responds
there.  I'm not sure how many JavaScript developers are subscribed here.

Cheers,
Micah

On Thu, Jul 16, 2020 at 6:48 PM Yakov Galka <yakov@stannum.io> wrote:

> Hi All,
>
> I have code that creates a table with string columns as follows:
>
> for(/* each column */) {
>     // ...
>     column_vectors.push(Vector.new(Data.Utf8(
>         new Utf8(), 0, element_count, null_count,
>         nullmap_buffer, offsets_buffer, data_buffer)));
> }
> const arrow_table = Table.new(column_vectors, column_names);
> const data = arrow_table.serialize('binary', false).buffer;
> const arrow_table2 = Table.from([new Uint8Array(data)]);
>
> Here offsets_buffer is an Int32Array with the offsets and data_buffer is a
> Uint8Array with the strings, in accordance with the Arrow format described
> in https://arrow.apache.org/docs/format/Columnar.html.
>
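For readers unfamiliar with the layout mentioned above, here is a minimal
sketch of how an offsets buffer and a data buffer relate for a column of
strings. It uses no Arrow calls; packUtf8 is a hypothetical helper name, not
part of the Arrow JavaScript library, and it assumes a column with no nulls.

```javascript
// Hypothetical helper (not an Arrow API): pack strings into the two buffers
// of the variable-size binary layout: N + 1 Int32 offsets plus the
// concatenated UTF-8 bytes.
function packUtf8(strings) {
  const encoder = new TextEncoder();
  const encoded = strings.map((s) => encoder.encode(s));

  // offsets[i] is where string i starts; offsets[i + 1] is where it ends.
  const offsets = new Int32Array(strings.length + 1);
  let total = 0;
  encoded.forEach((bytes, i) => {
    total += bytes.length;
    offsets[i + 1] = total;
  });

  // Copy each string's bytes to its starting offset.
  const data = new Uint8Array(total);
  encoded.forEach((bytes, i) => data.set(bytes, offsets[i]));

  return { offsets, data };
}

const packed = packUtf8(["foo", "", "héllo"]);
// packed.offsets is [0, 3, 3, 9]: "é" takes two bytes in UTF-8.
console.log(Array.from(packed.offsets), packed.data.length);
```

The number of strings is always offsets.length - 1, which is why the
dictionary length later in this message is written as
offsets_buffer.length - 1.
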
> I am trying to change this to use dictionary encoding instead. I changed
> the producer of the data to return only the unique strings in data_buffer
> and offsets_buffer, and to additionally produce an interned_buffer
> (Int32Array) with the indices of the strings. However, I couldn't find how
> to initialize the column in JavaScript.
>
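The producer-side change described above can be sketched in plain
JavaScript as follows. This only illustrates the split into unique values
and indices; dictionaryEncode is a hypothetical name, it makes no Arrow
calls, and it assumes a column with no nulls. The indices array plays the
role of interned_buffer.

```javascript
// Hypothetical helper (not an Arrow API): split a column of strings into
// its unique values and an Int32Array of indices into those values.
function dictionaryEncode(strings) {
  const indexOf = new Map(); // string -> position in `dictionary`
  const dictionary = [];     // unique strings, in first-seen order
  const indices = new Int32Array(strings.length);

  strings.forEach((s, i) => {
    let idx = indexOf.get(s);
    if (idx === undefined) {
      idx = dictionary.length;
      indexOf.set(s, idx);
      dictionary.push(s);
    }
    indices[i] = idx;
  });

  return { dictionary, indices };
}

const encodedColumn = dictionaryEncode(["red", "green", "red", "blue", "green"]);
// encodedColumn.dictionary is ["red", "green", "blue"]
// encodedColumn.indices is [0, 1, 0, 2, 1]

// Decoding is a lookup per index, which recovers the original column.
const decodedColumn = Array.from(encodedColumn.indices, (i) => encodedColumn.dictionary[i]);
console.log(decodedColumn);
```

Only the unique strings would then need to be packed into data_buffer and
offsets_buffer, while the column itself carries one index per row.
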
> Shooting in the dark, I tried:
>
> for(/* each column */) {
>     // ...
>     const dictionary = Vector.new(Data.Utf8(
>         new Utf8(), 0, offsets_buffer.length - 1, 0, 0,
>         offsets_buffer, data_buffer));
>     column_vectors.push(Vector.new(Data.Dictionary(
>         new Dictionary(new Utf8(), new Int32()), 0, element_count,
>         null_count, nullmap_buffer, 0, interned_buffer, dictionary)));
> }
> // ...
>
> However, this causes the deserialization (Table.from) to fail with:
>
> TypeError: undefined has no properties
>     visitUtf8
>     visit
>     visit
>     visitMany
>     map
>     visitMany
>     _loadVectors
>     _loadDictionaryBatch
>     _readDictionaryBatch
>     open
>     open
>     from
>
> What's the correct way of creating a dictionary encoded column?
>
> Yakov Galka
> http://stannum.io/
>
