arrow-user mailing list archives

From jonathan mercier <jonathan.merc...@cnrgh.fr>
Subject Re: Can I load from a parquet file only few columns ?
Date Fri, 12 Feb 2021 14:45:10 GMT
Oh yes, I can do this too. 
Thanks
Now when I see parquet I think pyarrow :-)

What did you mean by "conversion or filtering"?
Could you provide a small example, please?

Anyway

Have a good day

On Friday, 12 February 2021 at 15:26 +0100, Jacek Pliszka wrote:
> Sure, I believe you can do it even in pandas, which has a columns
> parameter: pd.read_parquet('f.pq', columns=['A', 'B'])
> 
> Arrow is more useful if you need to do some conversion or filtering.
> 
> BR,
> 
> Jacek
> 
> On Fri, 12 Feb 2021 at 15:21, jonathan mercier <jonathan.mercier@cnrgh.fr>
> wrote:
> > 
> > Dear all,
> > I have a Parquet file with 300,000 columns and 30,000 rows.
> > Loading such a file into a pandas DataFrame (with pyarrow) takes
> > around 100 GB of RAM.
> > 
> > Since I perform pairwise comparisons between columns, I could load
> > the data N columns at a time.
> > 
> > So is it possible to load only a few columns from a Parquet file, by
> > name? That would save some memory.
> > 
> > Thanks
> > 
> > 
> > --
> >                 Researcher computational biology
> >                 PhD, Jonathan MERCIER
> > 
> >                 Bioinformatics (LBI)
> >                 2, rue Gaston
> >                 Crémieux
> >                 91057 Evry Cedex
> > 
> > 
> >                 Tel :(+33)1 60 87 83 44
> >                 Email :jonathan.mercier@cnrgh.fr
> > 
> > 
> > 
