arrow-user mailing list archives

From Jacek Pliszka <jacek.plis...@gmail.com>
Subject Re: Can I load from a parquet file only few columns ?
Date Fri, 12 Feb 2021 14:26:54 GMT
Sure. I believe you can do this even in pandas, using the columns
parameter:  pd.read_parquet('f.pq', columns=['A', 'B'])

Arrow is more useful if you need to do some conversion or filtering.

BR,

Jacek

On Fri, 12 Feb 2021 at 15:21, jonathan mercier <jonathan.mercier@cnrgh.fr>
wrote:
>
> Dear,
> I have a parquet file with 300 000 columns and 30 000 rows.
> If I load such a file into a pandas dataframe (with pyarrow), it takes
> around 100 GB of RAM.
>
> As I perform a pairwise comparison between columns, I could load the
> data N columns at a time.
>
> So is it possible to load only a few columns from a parquet file, by
> name? That would save some memory.
>
> Thanks
>
>
> --
>                 Researcher computational biology
>                 PhD, Jonathan MERCIER
>
>                 Bioinformatics (LBI)
>                 2, rue Gaston
>                 Crémieux
>                 91057 Evry Cedex
>
>
>                 Tel :(+33)1 60 87 83 44
>                 Email :jonathan.mercier@cnrgh.fr
>
>
>
