arrow-user mailing list archives

From Fernando Herrera <>
Subject Re: Are Arrow, Flight and Plasma suitable for my use case?
Date Fri, 19 Mar 2021 09:20:16 GMT
Hi Matias,

If you are going to do tensor operations, then you could use the Arrow

However, I don't think the data stored in the tensor will be compressed. It
will be stored in a well-defined layout, so you can share the tensors with
other processes.

I hope that helps.

On Fri, Mar 19, 2021 at 8:52 AM Matias Guijarro <> wrote:

> Hi!
> I recently learned about Apache Arrow, and as a preliminary study I would
> like to know whether it is a good choice for my use case, or whether I have
> to look for another technology (or craft something specific on my own!).
> I could not really find answers to my questions in the FAQ or in
> articles and blogs, but I may have missed some information, so I apologize
> in advance if my questions have already been answered.
> Arrow is all about storing columnar data. What can the elements of a
> column contain?
> In my case, I have scalar values (numbers), 1D arrays and 2D arrays.
> The 2D arrays can be quite big (4000x4000 float32, for example).
> So we could imagine long tables, with hundreds of thousands of rows,
> containing a mix of those data types.
> Does Arrow stay efficient for this kind of data? In particular, a column
> whose rows are 2D arrays may be difficult to handle with the same level
> of optimization (just guessing).
> Is there any compression in Arrow? I am thinking of Blosc-style
> compression (like in the dead "bcolz" project; by the way, someone already
> wondered about Arrow + Blosc:
> Another use case I have is for multiple processes on the same computer
> to be able to access the Arrow in-memory store; it seems to me Plasma
> does this job, but what are the trade-offs?
> Thanks in advance for your advice; any help would be highly appreciated!
> Cheers,
> Matias.
