arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: [Python] Trying to store nested dict in Table gives unexpected behavior
Date Sun, 31 Jan 2021 22:43:19 GMT
hi Partha,

I believe you have mixed up struct and map types. When you pass a
list-of-pydicts to Arrow, it infers a struct type for the dicts by
default, which means that all of the observed keys will be represented
in every entry (with null values if they are not present), so here
it's something like list<struct<CORE: struct<'0': string, '1':
string, '2': string>>>.

If you want a map type (where each dict has different entries), you
have to write down the map type you want explicitly and pass that when
constructing the Arrow array object. What you want is
list<struct<CORE: map<string, string>>> (I think)

- Wes


On Fri, Jan 29, 2021 at 9:23 AM PARTHA DUTTA <partha.dutta@gmail.com> wrote:
>
> I may be doing something wrong here, so any help would be greatly appreciated. I am trying
> to store a nested python dict into an Arrow table, and I am getting some unexpected results.
> This is sample code:
>
> import copy
> import pyarrow as pa
> import random
>
> def test_it():
>     arr = []
>     for f in range(5):
>         num_maps = random.randrange(4) + 1
>         print("Number of maps = {}".format(num_maps))
>         mdict = {}
>         mdict["CORE"] = {}
>         for r in range(num_maps):
>             mdict["CORE"][str(r)] = {"status": "realized"}
>         arr.append(copy.deepcopy(mdict))
>     tbl = pa.Table.from_pydict({"_map": arr})
>     print(tbl.to_pydict())
>
> test_it()
>
>
> This is the output of the code:
>
> Number of maps = 1
> Number of maps = 1
> Number of maps = 2
> Number of maps = 3
> Number of maps = 2
> {'_map': [{'CORE': {'0': {'status': 'realized'}, '1': None, '2': None}}, {'CORE': {'0':
> {'status': 'realized'}, '1': None, '2': None}}, {'CORE': {'0': {'status': 'realized'}, '1':
> {'status': 'realized'}, '2': None}}, {'CORE': {'0': {'status': 'realized'}, '1': {'status':
> 'realized'}, '2': {'status': 'realized'}}}, {'CORE': {'0': {'status': 'realized'}, '1': {'status':
> 'realized'}, '2': None}}]}
>
> It seems that when the table is created, it is filling in empty dict values such that
> the number of elements is equal in every row. This is not what I wanted, and I am wondering
> if this is a feature, or whether I am missing something such that my intended output would
> not contain "null" values.
>
> Thanks,
> Partha
> --
> Partha Dutta
> partha.dutta@gmail.com
