arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: [Python] Read dictionary values of DictionaryArray without reading the whole file
Date Thu, 03 Jun 2021 21:58:38 GMT
It isn't possible with the current API, but all of the library
machinery needed to obtain this already exists, so it could be done
without extraordinary pain (speaking as one of the people who worked on
the direct read/write of arrow::DictionaryArray). You would need to do
some work on the C++ library to expose just the dictionary data page.

On Thu, Jun 3, 2021 at 2:55 PM Juan Galvez <juan@bodo.ai> wrote:
>
> Hello,
>
> I have a large Parquet file written by pandas with categorical columns (which are read
> into Arrow as DictionaryArray). I want to get the category values in Python (called
> "dictionary" values in Arrow) without reading any data from the file into memory
> other than the metadata. Is this possible?
>
> Thank you,
> -Juan
>
