arrow-dev mailing list archives

From "Jeff Reback (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-1285) NotImplemented exception creates empty parquet file
Date Thu, 27 Jul 2017 10:08:00 GMT
Jeff Reback created ARROW-1285:
----------------------------------

             Summary: NotImplemented exception creates empty parquet file
                 Key: ARROW-1285
                 URL: https://issues.apache.org/jira/browse/ARROW-1285
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.5.0
            Reporter: Jeff Reback
            Priority: Minor


Writing a DataFrame with a categorical column correctly raises ArrowNotImplementedError
(categorical is not implemented), but the failed call still leaves an empty file on disk.

xref https://github.com/pandas-dev/pandas/pull/15838#pullrequestreview-52576290

{code}
In [2]:    df = pd.DataFrame({'a': list('abc'),
   ...:                       'b': list(range(1, 4)),
   ...:                       'c': np.arange(3, 6).astype('u1'),
   ...:                       'd': np.arange(4.0, 7.0, dtype='float64'),
   ...:                       'e': [True, False, True],
   ...:                       'f': pd.Categorical(list('abc')),
   ...:                       'g': pd.date_range('20130101', periods=3),
   ...:                       'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
   ...:                       'i': pd.date_range('20130101', periods=3, freq='ns')})
   ...: 

In [3]: df.to_parquet('foo.pq')
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-3-8070fb7e3e2c> in <module>()
----> 1 df.to_parquet('foo.pq')

/Users/jreback/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1620         from pandas.io.parquet import to_parquet
   1621         to_parquet(self, fname, engine,
-> 1622                    compression=compression, **kwargs)
   1623 
   1624     @Substitution(header='Write out column names. If a list of string is given, \

/Users/jreback/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    152         raise ValueError("parquet must have string column names")
    153 
--> 154     return impl.write(df, path, compression=compression)
    155 
    156 

/Users/jreback/pandas/pandas/io/parquet.py in write(self, df, path, compression, **kwargs)
     53         table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
     54         self.api.parquet.write_table(
---> 55             table, path, compression=compression, **kwargs)
     56 
     57     def read(self, path):

/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
    770         version=version,
    771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772     writer = ParquetWriter(where, table.schema, **options)
    773     writer.write_table(table, row_group_size=row_group_size)
    774     writer.close()

_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()

error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: NotImplemented: unhandled type

In [4]: !ls -ltr foo.pq
-rw-r--r--  1 jreback  staff  0 Jul 27 06:03 foo.pq
{code}
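
Until the writer cleans up after itself, a caller can avoid the stray empty file by writing to a temporary path and moving it into place only on success. The sketch below is one possible caller-side workaround, not a fix inside pyarrow. The names write_or_cleanup and failing_writer are hypothetical, and failing_writer only stands in for a writer that opens the target file before rejecting the schema.

```python
import os
import tempfile


def write_or_cleanup(path, write_fn):
    """Call write_fn(tmp_path); move the result to `path` only on success.

    Hypothetical helper: write_fn is any callable that writes a file at the
    path it is given.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # write_fn reopens the file by name
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial (possibly empty) file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def failing_writer(tmp_path):
    # Stand-in for a writer that creates the file and then fails,
    # as in the traceback above.
    with open(tmp_path, "wb"):
        raise NotImplementedError("unhandled type")
```

With pandas and pyarrow installed, the call from the report could be wrapped as write_or_cleanup('foo.pq', lambda p: df.to_parquet(p)). The exception still propagates, but no foo.pq is left behind.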



