arrow-dev mailing list archives

From "Dave Challis (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided
Date Fri, 06 Apr 2018 11:41:00 GMT
Dave Challis created ARROW-2406:
-----------------------------------

             Summary: [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided
                 Key: ARROW-2406
                 URL: https://issues.apache.org/jira/browse/ARROW-2406
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Mac OS High Sierra, Python 3.6.3
            Reporter: Dave Challis


Minimal example to reproduce:

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}
 

This causes the Python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:
{code:python}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}

{code:python}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)
{code}
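
Because the failing case kills the interpreter, it can help to run the reproducer in a child process and inspect its exit status. The sketch below is not part of the original report. It uses only the standard library, and the helper names `run_isolated` and `describe` are hypothetical. It assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N, so "Segmentation fault: 11" would be expected to show up as -11.

```python
import subprocess
import sys

# Reproducer from the report, run in a child process so that a crash
# cannot take down the calling interpreter.
REPRO = """
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
"""


def run_isolated(source):
    """Run `source` in a fresh interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", source])
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short description."""
    # On POSIX, subprocess reports death by signal N as -N.
    if returncode < 0:
        return "killed by signal %d" % -returncode
    if returncode == 0:
        return "completed normally"
    return "exited with status %d" % returncode


if __name__ == "__main__":
    print(describe(run_isolated(REPRO)))
```

The same helper can be pointed at the two working variants above to confirm that only the combination of an empty string column and an explicit schema triggers the crash.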


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
