arrow-user mailing list archives

From "Athanassios I. Hatzis" <athanass...@healis.eu>
Subject Indexing, encoding, transformations and processing with PyArrow - GitHub 6284
Date Mon, 27 Jan 2020 14:55:15 GMT
Hi, I recently started experimenting with PyArrow for my TRIADB project. Kudos to Wes and
his team for leading one of the best open-source projects in data engineering. It was
definitely a wise decision to continue the success story of Pandas on the right track!

At this stage I am trying to make a new release of TRIADB that will handle metadata
management and fast ingestion of data in memory for transformations and basic query
operations.

Secondary indexes, dictionary encoding and adjacency lists are a core part of the TRIADB
project, which is why I posted the issue about the Array.dictionary_encode method
(https://github.com/apache/arrow/issues/6284). Aren't my example and description clear?
What exactly would you like me to elaborate on?

I also noticed that there is NumPy integration: you can convert easily from NumPy to
Arrow, but the reverse direction has several limitations. For example, I cannot create a
view for a StringArray (NotImplementedError: NumPy array view is only supported for
primitive types). But string() (utf8) is in the list of your primitive types. Are there
any plans to support this type with NumPy soon?

Kind regards
Athanassios
