arrow-user mailing list archives

From Jonathan MERCIER <jonathan.merc...@cnrgh.fr>
Subject Re: How to make a parquet dataset from an input file through Random access
Date Thu, 21 Jan 2021 12:19:07 GMT
Same question, but simpler to understand.

I am using pyarrow and processing pieces of the data in separate processes
(multiprocessing as a workaround for the GIL limitation). What is the correct
way to handle this task?

1. Each parallel process creates a list of records, stores them in a record
batch, and returns that batch.

2. Each parallel process creates an output buffer and a stream writer, builds
a list of records, stores them in a record batch, and writes that batch to the
stream writer. The process then returns the corresponding buffer?

With approach (1) I see how to merge all of the batches, but with approach (2),
how do I merge all of the buffers into one once each process has returned its
buffer?



Thanks


-- 
Jonathan MERCIER

Researcher computational biology

PhD, Jonathan MERCIER

Centre National de Recherche en Génomique Humaine (CNRGH)

Bioinformatics (LBI)

2, rue Gaston Crémieux

91057 Evry Cedex

Tel: (33) 1 60 87 34 88

Email: jonathan.mercier@cnrgh.fr

