arrow-user mailing list archives

From Jonathan Yu <jonathan.i...@gmail.com>
Subject Creating and populating Arrow table directly?
Date Mon, 12 Oct 2020 21:20:57 GMT
Hello there,

I'm recording a number of entries per column that is known a priori, and I
want to create a Table using these entries. I'm currently using numpy.empty to
pre-allocate empty arrays, then creating a Table from that via the
pyarrow.table(data={}) constructor.

It seems a bit silly to create a bunch of NumPy arrays, only to convert
them to Arrow arrays to serialize. Is there any benefit to
creating/populating pyarrow.array() objects directly, and if so, how do I
do that? Otherwise, is the recommendation to first create a DataFrame in
pandas (or a number of NumPy arrays as I'm doing currently), then convert
to a Table?

I think I want to have a way to create a fixed-size Table consisting of a
number of columns, then set the values for each column one by one (similar
to iloc/iat in pandas). Is this a sensible thing to try to do?

Best,

Jonathan
