arrow-user mailing list archives

From Max Grossman <jma...@gmail.com>
Subject initialization on first call to pyarrow.array()?
Date Thu, 27 Aug 2020 23:10:34 GMT
Hi all,

Say I've got a simple program like the following that converts a numpy 
array to a pyarrow array several times in a row, and times each of those 
conversions:

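    import pyarrow
    import numpy as np
    import time

    arr = np.random.rand(1)

    # Take a timestamp before and after each conversion.
    t1 = time.time()
    pyarrow.array(arr)
    t2 = time.time()
    pyarrow.array(arr)
    t3 = time.time()
    pyarrow.array(arr)
    t4 = time.time()
    pyarrow.array(arr)
    t5 = time.time()

    # Report the duration of each of the four conversions, in seconds.
    print(t2 - t1, t3 - t2, t4 - t3, t5 - t4)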

I'm noticing that the first call to pyarrow.array() takes ~0.3-0.5 s, 
while the rest are nearly instantaneous (about 1e-05 s each).

Does anyone know what might be causing this? My assumption is some 
one-time initialization of pyarrow on the first call to the library, in 
which case I'd like to see if there's some way to explicitly trigger 
that initialization earlier in the program. I'm also curious to hear if 
there is a different explanation.
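
One way to narrow this down might be to time the import and each call 
separately, so that import cost is not mixed up with first-call cost. 
The following is only a sketch: the helper names timed() and 
profile_first_calls() are made up for illustration and are not part of 
pyarrow.

```python
import importlib
import time


def timed(fn):
    """Run fn() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def profile_first_calls(module_name, make_call, repeats=4):
    """Time importing module_name, then time `repeats` calls of make_call(module).

    If the module was already imported elsewhere in the process, the
    import time will be close to zero because Python returns the cached
    module from sys.modules.
    """
    module, import_seconds = timed(lambda: importlib.import_module(module_name))
    call_seconds = [timed(lambda: make_call(module))[1] for _ in range(repeats)]
    return import_seconds, call_seconds
```

With the array from the program above, this could be used as:

    import_s, call_s = profile_first_calls("pyarrow", lambda pa: pa.array(arr))
    print(import_s, call_s)

If the first entry of call_s is still much larger than the others, the 
cost is tied to the first call rather than to the import itself.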

Right now I'm working around this by just calling pyarrow.array([]) at 
application start up -- I realize this doesn't actually eliminate the 
added time, but it does move it out of the critical section for any 
benchmarking runs.
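
The workaround described above could be folded into the benchmark 
itself, roughly as follows. This is a minimal sketch and benchmark() is 
a hypothetical helper, not a pyarrow function: it runs a few untimed 
warm-up calls first and then times only the calls after that.

```python
import time


def benchmark(fn, repeats=5, warmup=1):
    """Return the duration in seconds of `repeats` calls to fn().

    The first `warmup` calls are made but not timed, so any one-time
    setup cost is paid before measurement starts.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up call

    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

For the program above this would be called as 
benchmark(lambda: pyarrow.array(arr)). Warming up with the same kind of 
input that is later measured (a numpy array rather than an empty list) 
may be the safer choice, on the assumption that different input types 
could trigger different one-time setup.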

Thanks,

Max

