Hi, 

I'm getting "CSV parser got out of sync with chunker". Any ideas on how to troubleshoot this?
If I feed the original file, it fails after 1477218 rows.
If I remove the first line after the header, it fails after 2919443 rows.
If I remove the first 2 lines after the header, it fails after 55339 rows.
If I remove the first 3 lines after the header, it fails after 8200437 rows.
If I remove the first 4 lines after the header, it fails after 1866573 rows.
This doesn't make sense to me; the failure shows up at different, seemingly random places.
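
In case it's a quoting issue, this is a quick scan I could run to look for lines with an unbalanced number of double quotes (just a sketch; it assumes the file uses standard double-quote quoting):

# Sketch: flag physical lines with an odd number of double quotes,
# which usually means a quoted field containing embedded newlines.
with open('inspect.csv', 'rb') as f:
    for lineno, line in enumerate(f, 1):
        if line.count(b'"') % 2 == 1:
            print(lineno, line[:120])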

What could be causing this? The traceback and source code are below.



Traceback (most recent call last):
  File "pa_inspect.py", line 15, in <module>
    for b in reader:
  File "pyarrow/ipc.pxi", line 497, in __iter__
  File "pyarrow/ipc.pxi", line 531, in pyarrow.lib.RecordBatchReader.read_next_batch
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parser got out of sync with chunker


import pyarrow as pa
from pyarrow import csv
import pyarrow.parquet as pq

# http://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html#pyarrow.csv.open_csv
# http://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVStreamingReader.html
reader = csv.open_csv('inspect.csv')


# ParquetWriter : https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html
# RecordBatch : https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html
# http://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
crow = 0  # running total of rows written
with pq.ParquetWriter('inspect.parquet', reader.schema) as writer:
    for b in reader:  # each b is a pyarrow.RecordBatch
        print(b.num_rows, b.num_columns)
        crow += b.num_rows
        print(crow)
        writer.write_table(pa.Table.from_batches([b]))
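
One variant I'm thinking of trying next is a larger block size plus newlines_in_values=True, in case quoted fields with embedded newlines are confusing the chunker (the option values here are just guesses on my part):

read_opts = csv.ReadOptions(block_size=64 << 20)        # 64 MB blocks (default is ~1 MB)
parse_opts = csv.ParseOptions(newlines_in_values=True)  # allow newlines inside quoted fields
reader = csv.open_csv('inspect.csv',
                      read_options=read_opts,
                      parse_options=parse_opts)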

--
/Rubén