arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Attn: Wes, Re: Masked Arrays
Date Mon, 30 Mar 2020 15:39:26 GMT
On Mon, Mar 30, 2020 at 8:31 AM Daniel Nugent <nugend@gmail.com> wrote:
>
> Didn’t want to follow up on this on the Jira issue earlier since it's sort of tangential
> to that bug and more of a usage question. You said:
>
> > I wouldn't recommend building applications based on them nowadays since the level
> > of support / compatibility in other projects is low.
>
> In my case, I am using them since it seemed like a straightforward representation of
> my data that has nulls, the format I’m converting from has zero cost numpy representations,
> and converting from an internal format into Arrow in-memory structures appears zero cost (or
> close to it) as well. I guess I can just provide the mask as an explicit argument, but my
> original desire to use it came from being able to exploit numpy.ma.concatenate in a way that
> saved some complexity in implementation.
>
> Since Arrow itself supports masking values with a bitfield, is there something intrinsic
> to the notion of array masks that is not well supported? Or do you just mean the specific
> numpy MaskedArray class?
>

I mean just the numpy.ma module. Not many Python computing projects
nowadays treat MaskedArray objects as first-class citizens. Depending
on what you need, that may or may not be a problem. pyarrow supports
ingesting from MaskedArray as a convenience, but in my experience it
is not common for a library's APIs to return MaskedArrays.

> If this is too much of a numpy question rather than an arrow question, could you point
> me to where I can read up on masked array support, or maybe to the right place to ask the
> numpy community about whether what I'm doing is appropriate or not?
>
> Thanks,
>
>
> -Dan Nugent
