arrow-dev mailing list archives

From "Neal Richardson (Jira)" <j...@apache.org>
Subject [jira] [Created] (ARROW-7063) [C++] Schema print method prints too much metadata
Date Mon, 04 Nov 2019 23:18:00 GMT
Neal Richardson created ARROW-7063:
--------------------------------------

       Summary: [C++] Schema print method prints too much metadata
         Key: ARROW-7063
         URL: https://issues.apache.org/jira/browse/ARROW-7063
       Project: Apache Arrow
     Issue Type: Improvement
     Components: C++, C++ - Dataset
      Reporter: Neal Richardson
       Fix For: 1.0.0


I loaded some taxi data in a Dataset and printed the schema. This is what was printed:

{code}
vendor_id: string
pickup_at: timestamp[us]
dropoff_at: timestamp[us]
passenger_count: int8
trip_distance: float
pickup_longitude: float
pickup_latitude: float
rate_code_id: null
store_and_fwd_flag: string
dropoff_longitude: float
dropoff_latitude: float
payment_type: string
fare_amount: float
extra: float
mta_tax: float
tip_amount: float
tolls_amount: float
total_amount: float
-- metadata --
pandas: {"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 14387371, "step":
1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type":
"object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "vendor_id", "field_name":
"vendor_id", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name":
"pickup_at", "field_name": "pickup_at", "pandas_type": "datetime", "numpy_type": "datetime64[ns]",
"metadata": null}, {"name": "dropoff_at", "field_name": "dropoff_at", "pandas_type": "datetime",
"numpy_type": "datetime64[ns]", "metadata": null}, {"name": "passenger_count", "field_name":
"passenger_count", "pandas_type": "int8", "numpy_type": "int8", "metadata": null}, {"name":
"trip_distance", "field_name": "trip_distance", "pandas_type": "float32", "numpy_type": "float32",
"metadata": null}, {"name": "pickup_longitude", "field_name": "pickup_longitude", "pandas_type":
"float32", "numpy_type": "float32", "metadata": null}, {"name": "pickup_latitude", "field_name":
"pickup_latitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name":
"rate_code_id", "field_name": "rate_code_id", "pandas_type": "empty", "numpy_type": "object",
"metadata": null}, {"name": "store_and_fwd_flag", "field_name": "store_and_fwd_flag", "pandas_type":
"unicode", "numpy_type": "object", "metadata": null}, {"name": "dropoff_longitude", "field_name":
"dropoff_longitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null},
{"name": "dropoff_latitude", "field_name": "dropoff_latitude", "pandas_type": "float32", "numpy_type":
"float32", "metadata": null}, {"name": "payment_type", "field_name": "payment_type", "pandas_type":
"unicode", "numpy_type": "object", "metadata": null}, {"name": "fare_amount", "field_name":
"fare_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name":
"extra", "field_name": "extra", "pandas_type": "float32", "numpy_type": "float32", "metadata":
null}, {"name": "mta_tax", "field_name": "mta_tax", "pandas_type": "float32", "numpy_type":
"float32", "metadata": null}, {"name": "tip_amount", "field_name": "tip_amount", "pandas_type":
"float32", "numpy_type": "float32", "metadata": null}, {"name": "tolls_amount", "field_name":
"tolls_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name":
"total_amount", "field_name": "total_amount", "pandas_type": "float32", "numpy_type": "float32",
"metadata": null}], "creator": {"library": "pyarrow", "version": "0.15.1"}, "pandas_version":
"0.25.3"}
ARROW:schema: /////3gOAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABAwAQAAAAAAAKAAwAAAAEAAgACgAAAFQKAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAsCgAABAAAAB8KAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDE0Mzg3MzcxLCAic3RlcCI6IDF9XSwgImNvbHVtbl9pbmRleGVzIjogW3sibmFtZSI6IG51bGwsICJmaWVsZF9uYW1lIjogbnVsbCwgInBhbmRhc190eXBlIjogInVuaWNvZGUiLCAibnVtcHlfdHlwZSI6ICJvYmplY3QiLCAibWV0YWRhdGEiOiB7ImVuY29kaW5nIjogIlVURi04In19XSwgImNvbHVtbnMiOiBbeyJuYW1lIjogInZlbmRvcl9pZCIsICJmaWVsZF9uYW1lIjogInZlbmRvcl9pZCIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJwaWNrdXBfYXQiLCAiZmllbGRfbmFtZSI6ICJwaWNrdXBfYXQiLCAicGFuZGFzX3R5cGUiOiAiZGF0ZXRpbWUiLCAibnVtcHlfdHlwZSI6ICJkYXRldGltZTY0W25zXSIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiZHJvcG9mZl9hdCIsICJmaWVsZF9uYW1lIjogImRyb3BvZmZfYXQiLCAicGFuZGFzX3R5cGUiOiAiZGF0ZXRpbWUiLCAibnVtcHlfdHlwZSI6ICJkYXRldGltZTY0W25zXSIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicGFzc2VuZ2VyX2NvdW50IiwgImZpZWxkX25hbWUiOiAicGFzc2VuZ2VyX2NvdW50IiwgInBhbmRhc190eXBlIjogImludDgiLCAibnVtcHlfdHlwZSI6ICJpbnQ4IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJ0cmlwX2Rpc3RhbmNlIiwgImZpZWxkX25hbWUiOiAidHJpcF9kaXN0YW5jZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicGlja3VwX2xvbmdpdHVkZSIsICJmaWVsZF9uYW1lIjogInBpY2t1cF9sb25naXR1ZGUiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInBpY2t1cF9sYXRpdHVkZSIsICJmaWVsZF9uYW1lIjogInBpY2t1cF9sYXRpdHVkZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicmF0ZV9jb2RlX2lkIiwgImZpZWxkX25hbWUiOiAicmF0ZV9jb2RlX2lkIiwgInBhbmRhc190eXBlIjogImVtcHR5IiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJzdG9yZV9hbmRfZndkX2ZsYWciLCAiZmllbGRfbmFtZSI6ICJzdG9yZV9hbmRfZndkX2ZsYWciLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im
5hbWUiOiAiZHJvcG9mZl9sb25naXR1ZGUiLCAiZmllbGRfbmFtZSI6ICJkcm9wb2ZmX2xvbmdpdHVkZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiZHJvcG9mZl9sYXRpdHVkZSIsICJmaWVsZF9uYW1lIjogImRyb3BvZmZfbGF0aXR1ZGUiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInBheW1lbnRfdHlwZSIsICJmaWVsZF9uYW1lIjogInBheW1lbnRfdHlwZSIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJmYXJlX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogImZhcmVfYW1vdW50IiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJleHRyYSIsICJmaWVsZF9uYW1lIjogImV4dHJhIiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJtdGFfdGF4IiwgImZpZWxkX25hbWUiOiAibXRhX3RheCIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAidGlwX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogInRpcF9hbW91bnQiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInRvbGxzX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogInRvbGxzX2Ftb3VudCIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAidG90YWxfYW1vdW50IiwgImZpZWxkX25hbWUiOiAidG90YWxfYW1vdW50IiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMC4xNS4xIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIwLjI1LjMifQAGAAAAcGFuZGFzAAASAAAAxAMAAHgDAABEAwAAAAMAAMgCAACMAgAAVAIAACACAADoAQAArAEAAHABAAA8AQAACAEAANgAAACoAAAAdAAAADwAAAAEAAAAlPz//wAAAQMYAAAADAAAAAQAAAAAAAAAyvz//wAAAQAMAAAAdG90YWxfYW1vdW50AAAAAMj8//8AAAEDGAAAAAwAAAAEAAAAAAAAAP78//8AAAEADAAAAHRvbGxzX2Ftb3VudAAAAAD8/P//AAABAxgAAAAMAAAABAAAAAAAAAAy/f//AAABAAoAAAB0aXBfYW1vdW50AAAs/f//AAABAxgAAAAMAAAABAAAAAAAAABi/f//AAABAAcAAABtdGFfdGF4AFj9//8AAAEDGAAAAAwAAAAEAAAAAA
AAAI79//8AAAEABQAAAGV4dHJhAAAAhP3//wAAAQMYAAAADAAAAAQAAAAAAAAAuv3//wAAAQALAAAAZmFyZV9hbW91bnQAtP3//wAAAQUUAAAADAAAAAQAAAAAAAAApP3//wwAAABwYXltZW50X3R5cGUAAAAA5P3//wAAAQMYAAAADAAAAAQAAAAAAAAAGv7//wAAAQAQAAAAZHJvcG9mZl9sYXRpdHVkZQAAAAAc/v//AAABAxgAAAAMAAAABAAAAAAAAABS/v//AAABABEAAABkcm9wb2ZmX2xvbmdpdHVkZQAAAFT+//8AAAEFFAAAAAwAAAAEAAAAAAAAAET+//8SAAAAc3RvcmVfYW5kX2Z3ZF9mbGFnAACI/v//AAABARQAAAAMAAAABAAAAAAAAAB4/v//DAAAAHJhdGVfY29kZV9pZAAAAAC4/v//AAABAxgAAAAMAAAABAAAAAAAAADu/v//AAABAA8AAABwaWNrdXBfbGF0aXR1ZGUA7P7//wAAAQMYAAAADAAAAAQAAAAAAAAAIv///wAAAQAQAAAAcGlja3VwX2xvbmdpdHVkZQAAAAAk////AAABAxgAAAAMAAAABAAAAAAAAABa////AAABAA0AAAB0cmlwX2Rpc3RhbmNlAAAAWP///wAAAQIkAAAAFAAAAAQAAAAAAAAACAAMAAgABwAIAAAAAAAAAQgAAAAPAAAAcGFzc2VuZ2VyX2NvdW50AJj///8AAAEKGAAAAAwAAAAEAAAAAAAAAM7///8AAAMACgAAAGRyb3BvZmZfYXQAAMj///8AAAEKIAAAABQAAAAEAAAAAAAAAAAABgAIAAYABgAAAAAAAwAJAAAAcGlja3VwX2F0AAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFGAAAABAAAAAEAAAAAAAAAAQABAAEAAAACQAAAHZlbmRvcl9pZAAAAA==
{code}

I'd argue that extra metadata, if it's not part of the Arrow format and can be whatever an
application wants to put in there, should not be printed as part of the schema's ToString
method. It should be viewable some way, just not always. And IDK what to do with this {{ARROW:schema:}}
business, but it's clearly not readable as is.

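To make the request concrete, here is a minimal sketch of one way a print method could behave: fields are always printed, and metadata appears only when asked for, with long values cut short. Everything here is hypothetical and for illustration only: {{SketchSchema}} is a stand-in struct, not {{arrow::Schema}}, and the {{show_metadata}} and {{max_value_len}} parameters are assumed names, not a description of how Arrow implements or will implement this.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schema; this is NOT the arrow::Schema class.
struct SketchSchema {
  std::vector<std::pair<std::string, std::string>> fields;    // (name, type)
  std::vector<std::pair<std::string, std::string>> metadata;  // (key, value)

  // Fields are always printed. Metadata is printed only when show_metadata
  // is true, and each value is cut to at most max_value_len characters.
  // Both parameter names are assumptions made for this sketch.
  std::string ToString(bool show_metadata = false,
                       std::size_t max_value_len = 40) const {
    std::string out;
    for (const auto& f : fields) {
      out += f.first + ": " + f.second + "\n";
    }
    if (show_metadata && !metadata.empty()) {
      out += "-- metadata --\n";
      for (const auto& kv : metadata) {
        std::string value = kv.second;
        if (value.size() > max_value_len) {
          // Mark the cut so the reader knows the value was shortened.
          value = value.substr(0, max_value_len) + "...";
        }
        out += kv.first + ": " + value + "\n";
      }
    }
    return out;
  }
};
```

With a default like this, a plain {{ToString()}} call would print just the field list, while {{ToString(true)}} would still expose the {{pandas}} and {{ARROW:schema}} entries, only in shortened form.
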
This message was sent by Atlassian Jira
(v8.3.4#803005)