arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Indexing, encoding, transformations and processing with PyArrow - GitHub 6284
Date Tue, 28 Jan 2020 21:34:57 GMT
On Tue, Jan 28, 2020 at 1:36 AM Athanassios I. Hatzis
<athanassios@healis.eu> wrote:
>
> On Mon, 2020-01-27 at 10:25 -0600, Wes McKinney wrote:
>
> > I asked to move this discussion here because we use the dev@ and user@
> > mailing list for discussions (this is explained in the GitHub issue
> > template https://github.com/apache/arrow/blob/master/.github/ISSUE_TEMPLATE.md)
>
> Sure, I noticed this, but then I can hardly find any reason for opening an issue at GitHub. As a
> user I find it a lot easier to open and track an issue for replies at GitHub than to register and
> search in email lists, and in my opinion it is a lot easier and far more efficient for other users
> too, especially newcomers, to search and find relevant answers. By the way, how am I supposed to
> search and view this user list online from a Web GUI like the one at GitHub? Is there a web
> link?

Here are the links

https://lists.apache.org/list.html?dev@arrow.apache.org
https://lists.apache.org/list.html?user@arrow.apache.org

We have the GitHub issues as a way to capture information from users
who are not yet familiar with the project.

>
> > treated as a valid floating point value in algorithms like dictionary_encode
>
> Hi Wes, I was not aware that np.nan and None are not treated equivalently. Thanks for illustrating
> this with your Notebook. I can understand the logic behind this, but it has serious flaws that
> originate from SQL, an implementation of Codd's relational theory.
>
> This is one of the reasons that I am promoting the Associative Semiotic Hypergraph as an alternative
> data model for processing data in queries. Associations (hyperedge sets connecting n data items) are
> the equivalent of table records, but null values are excluded. Therefore in my system the dictionary
> should always be free of missing values. Anyway, as you suggest, I need to maintain some custom
> code for this.
>
> There was also the following question in my email that was not answered.
> > > I also noticed that there is NumPy integration and you can convert easily from NumPy to Arrow,
> > > but the reverse direction has several limitations. For example, I cannot create a view for StringArray
> > > (NotImplementedError: NumPy array view is only supported for primitive types). But string()
> > > (utf8) is in the list of your primitive types. Any plans for supporting this type with NumPy soon?
>
> Could you please suggest or point to a piece of code showing how to convert arrow.StringArray to NumPy
> for further processing? Do I have to forget the view with the to_numpy() method and make a copy in
> order to process and modify it in NumPy?
>
>
> Thank you for your time
>
> Athan
>
>
>
