Delta encoding is part of the Parquet specification, but at the time of writing it hasn't been implemented in the Arrow C++ code that pyarrow binds to.
Does Arrow / Parquet have any support for delta encoding?
Some data series compress better when their differences are stored rather than the values themselves.
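To make the idea concrete, here is a minimal sketch of delta encoding on a plain Python list. The helper names `delta_encode` and `delta_decode` are made up for illustration and are not part of Arrow or Parquet; the sketch only shows the round trip, not how Parquet lays out deltas on disk.

```python
from itertools import accumulate


def delta_encode(values):
    """Return the first value followed by the successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    return list(accumulate(deltas))


series = [7, 14, 21, 29, 36, 43]
encoded = delta_encode(series)
decoded = delta_decode(encoded)

print(encoded)            # [7, 7, 7, 8, 7, 7]
print(decoded == series)  # True
```

The encoded list contains only two distinct small numbers, while the original list contains six distinct growing ones, which is what gives a compressor more to work with.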
Here's an example where the differences are mostly equal to 7 but occasionally more:
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
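N = 500000
# Start with every difference equal to 7
delta_r = np.full(N, 7)
for _ in range(10):
    # On each pass, bump a random ~1% of the differences by 1
    delta_r[np.random.randint(N, size=N//100)] += 1
# The actual series is the running sum of the differences
r = np.cumsum(delta_r)
# Sanity check: differencing the series recovers delta_r
drcheck = np.diff(r, prepend=0)
assert (delta_r == drcheck).all()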
a = pa.array(r)
adiff = pa.array(delta_r)
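t = pa.Table.from_arrays([a], ['r'])
tdiff = pa.Table.from_arrays([adiff], ['delta_r'])
# Write both tables; file names taken from the listing below
pq.write_table(t, 't.pq')
pq.write_table(tdiff, 'tdiff.pq')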
When I write both tables to Parquet and look at the resulting files, the one holding the differences is about 32 times smaller:
-rw-rw-rw- 1 user group 2591101 Nov 16 13:29 t.pq
-rw-rw-rw- 1 user group 81049 Nov 16 13:29 tdiff.pq
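As a workaround when the writer does not delta-encode for you, the differencing can be done by hand before writing and undone after reading. The sketch below uses only NumPy; `to_deltas` and `from_deltas` are hypothetical helper names, and the seeded generator is an assumption added so the run is repeatable. It also counts distinct values, which is one plausible (not confirmed) reason for the size gap above: pyarrow's `write_table` uses dictionary encoding by default, and a column with a handful of distinct values suits that far better than a column where every value is unique.

```python
import numpy as np


def to_deltas(values):
    """Consecutive differences; the first delta is the first value itself."""
    return np.diff(values, prepend=0)


def from_deltas(deltas):
    """Rebuild the original series with a cumulative sum."""
    return np.cumsum(deltas)


# Same construction as in the question, with a fixed seed (an assumption).
rng = np.random.default_rng(0)
N = 500_000
delta_r = np.full(N, 7, dtype=np.int64)
for _ in range(10):
    delta_r[rng.integers(N, size=N // 100)] += 1
r = np.cumsum(delta_r)

# The round trip is exact for integers.
restored = from_deltas(to_deltas(r))
assert np.array_equal(restored, r)

# The series is strictly increasing, so every value is distinct,
# while the differences take only a few values.
n_distinct_values = np.unique(r).size
n_distinct_deltas = np.unique(delta_r).size
print(n_distinct_values, n_distinct_deltas)
```

In practice this means writing the `delta_r` column instead of `r` and calling `from_deltas` on the column after reading it back, at the cost of losing cheap random access to individual values of `r`.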