Delta encoding is supported in the Parquet specification, but it hasn't been implemented in the C++ code that pyarrow binds to.

On Mon, Nov 16, 2020 at 12:30 PM Jason Sachs wrote:
Does Arrow / Parquet have any support for delta encoding?

Some data series compress better when their differences are stored rather than the values themselves.

Here's an example where the differences are mostly equal to 7 but occasionally more:

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

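N = 500000
# Start with every difference equal to 7
delta_r = np.full(N, 7)
# Ten passes, each bumping up to N//100 randomly chosen differences by 1
for _ in range(10):
    delta_r[np.random.randint(N, size=N//100)] += 1
# The series itself is the running sum of the differences
r = np.cumsum(delta_r)
# Sanity check: differencing the series recovers the original differences
drcheck = np.diff(r, prepend=0)
assert (delta_r == drcheck).all()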

a = pa.array(r)
adiff = pa.array(delta_r)
t = pa.Table.from_arrays([a],['r'])
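tdiff = pa.Table.from_arrays([adiff],['delta_r'])

# Write each table to its own Parquet file
pq.write_table(t, 't.pq')
pq.write_table(tdiff, 'tdiff.pq')

and when I look at the resulting files: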

-rw-rw-rw-   1 user     group     2591101 Nov 16 13:29 t.pq
-rw-rw-rw-   1 user     group       81049 Nov 16 13:29 tdiff.pq
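
Until the encoding is available, the differencing can be done by hand around the write and read steps. The following is only a minimal sketch of that idea, using NumPy alone; the helper names delta_encode and delta_decode are hypothetical and are not part of pyarrow or Parquet. It assumes the series is one-dimensional and of integer type, so that the running sum restores it exactly.

```python
import numpy as np

def delta_encode(values):
    """Return the first differences of a 1-D series.

    Hypothetical helper: the first output element is the first value
    itself (the difference from an implicit leading 0).
    """
    return np.diff(np.asarray(values), prepend=0)

def delta_decode(deltas):
    """Invert delta_encode by taking the running sum of the differences."""
    return np.cumsum(deltas)

# Round trip on a small series shaped like the one in the example above
r = np.cumsum(np.full(1000, 7))
deltas = delta_encode(r)
restored = delta_decode(deltas)
assert (restored == r).all()
```

With helpers like these, the differences would be what gets stored in the Parquet file, and the original series would be rebuilt with delta_decode after reading the column back.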