arrow-user mailing list archives

From "Athanassios I. Hatzis"
Subject Indexing, encoding, transformations and processing with PyArrow - GitHub 6284
Date Mon, 27 Jan 2020 14:55:15 GMT
Hi, I recently started experimenting with PyArrow for the needs of my TRIADB project.
Kudos to Wes and his team for leading one of the best open-source IT projects in data
engineering. It was definitely a wise decision to continue the success story of Pandas
on the right track!

At this stage I am trying to make a new release of TRIADB that will handle metadata management
and fast ingestion of data in memory for transformations and basic query operations.

Secondary indexes, dictionary encoding and adjacency lists are a core part of the TRIADB
project, which is the reason I posted the issue about the Array.dictionary_encode method
(GitHub 6284). Isn't my example and description clear? What exactly would you like me to
elaborate on?

I also noticed that there is NumPy integration: you can convert easily from NumPy to Arrow,
but the reverse direction has several limitations. For example, I cannot create a view for
a StringArray (NotImplementedError: NumPy array view is only supported for primitive types).
But string() is in the list of your primitive types. Are there any plans to support this
type with NumPy soon?

Kind regards
