arrow-user mailing list archives

From Rich Bramante <rbrama...@hotmail.com>
Subject Possible Decimal write issue with pyarrow
Date Mon, 01 Jun 2020 21:39:45 GMT
Python 3.7.6 (default, Jan 30 2020, 10:29:04)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
print(pyarrow.__version__)
0.17.1

I'm seeing an issue where written DECIMAL values can come back corrupted, depending on very subtle
changes to the data set. Example:

#!/bin/python3

import pandas as pd
import decimal
import pyarrow.parquet as pq

#$ python3
# Python 3.7.6 (default, Jan 30 2020, 10:29:04)
# [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
# >>> print(pyarrow.__version__)
#  0.17.1

# Results in unexpected output
df = pd.DataFrame({"values": [decimal.Decimal('9223372036854775808'), decimal.Decimal('18446744073709551616'),
                              decimal.Decimal('2147483648'), decimal.Decimal('1.111'), decimal.Decimal('-2'), decimal.Decimal('0')]})

df.to_parquet("/tmp/f")
pq_file = pq.ParquetFile("/tmp/f")
print(pq_file.read().to_pandas())

# Values read:
# -221360928884514619.392, -442721857769029238.784, 2147483648.000, 1.111, -2.000, 0.000

# Results in expected output (only difference is 1.11 vs. 1.111)
df = pd.DataFrame({"values": [decimal.Decimal('9223372036854775808'), decimal.Decimal('18446744073709551616'),
                              decimal.Decimal('2147483648'), decimal.Decimal('1.11'), decimal.Decimal('-2'), decimal.Decimal('0')]})

df.to_parquet("/tmp/f")
pq_file = pq.ParquetFile("/tmp/f")
print(pq_file.read().to_pandas())

# Values read:
# 9223372036854775808.00, 18446744073709551616.00, 2147483648.00, 1.11, -2.00, 0.00
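For reference, here is a stdlib-only sketch of the arithmetic behind the two runs. The helper names (`required_precision_scale`, `unscaled`, `top_bit_as_sign`) are hypothetical and are not pyarrow APIs. Assuming the column type is sized to the longest integer part plus the longest fractional part, the first set needs decimal(23, 3) and the second decimal(22, 2). Both are well within decimal128's 38-digit limit, so neither should overflow. The sketch also checks a pattern in the two bad values: each equals the intended unscaled integer minus 2^k, where k is that integer's bit length. That is what you would see if the integer's top bit were read as a sign bit. This is only an observation on the numbers above, not a diagnosis.

```python
from decimal import Decimal

# Hypothetical helpers for illustration only; these are not pyarrow APIs.


def required_precision_scale(values):
    """Smallest (precision, scale) that holds every finite value exactly.

    Assumption: precision = longest integer part + longest fractional part.
    """
    max_int_digits = 0
    max_scale = 0
    for v in values:
        _, digits, exponent = v.as_tuple()
        max_scale = max(max_scale, -exponent)
        max_int_digits = max(max_int_digits, len(digits) + exponent)
    return max_int_digits + max_scale, max_scale


def unscaled(v, scale):
    """Exact integer v * 10**scale (scale must be >= v's own scale)."""
    sign, digits, exponent = v.as_tuple()
    magnitude = int("".join(map(str, digits))) * 10 ** (exponent + scale)
    return -magnitude if sign else magnitude


def top_bit_as_sign(n):
    """Read a positive int's own bit_length() bits as two's complement.

    The top bit of a positive n is always set, so it becomes the sign bit
    and the result is n - 2**n.bit_length().
    """
    return n - (1 << n.bit_length())


set_a = [Decimal(s) for s in ("9223372036854775808", "18446744073709551616",
                              "2147483648", "1.111", "-2", "0")]
set_b = [Decimal(s) for s in ("9223372036854775808", "18446744073709551616",
                              "2147483648", "1.11", "-2", "0")]

print(required_precision_scale(set_a))  # (23, 3)
print(required_precision_scale(set_b))  # (22, 2)

# The two corrupted values reported for set_a.
reported = [Decimal("-221360928884514619.392"), Decimal("-442721857769029238.784")]
_, scale_a = required_precision_scale(set_a)
for original, seen in zip(set_a[:2], reported):
    expected = unscaled(original, scale_a)
    print(expected.bit_length(), unscaled(seen, scale_a) == top_bit_as_sign(expected))
# 73 True
# 74 True
```

Only the two values at or above 2^63 (9223372036854775808 = 2^63, 18446744073709551616 = 2^64) came back wrong, and only in the scale-3 run. The scale-2 run round-trips correctly, so this check does not show where in the write or read path the values go astray.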

