spark-issues mailing list archives

From "Zoltan Ivanfi (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala
Date Thu, 29 Mar 2018 16:47:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Ivanfi updated SPARK-20297:
----------------------------------
    Comment: was deleted

(was: Could you please clarify how those DECIMALS were written in the first place?
 * If some manual configuration was done to allow Spark to choose this representation, then
we are fine.
 * If an upstream Spark version wrote data using this representation by default, that's a
valid reason to feel mildly uncomfortable.
 * If a downstream Spark version wrote data using this representation by default, then we
should open a JIRA to prevent CDH Spark from doing so until Hive and Impala support it.

Thanks!)

> Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala
> -----------------------------------------------------------------------
>
>                 Key: SPARK-20297
>                 URL: https://issues.apache.org/jira/browse/SPARK-20297
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: integration
>
> While trying to load some data using Spark 2.1, I found that decimal(12,2) columns
> in Parquet files written by Spark are not readable by Hive or Impala.
> Repro 
> {code}
> CREATE TABLE customer_acctbal(
>   c_acctbal decimal(12,2))
> STORED AS Parquet;
> insert into customer_acctbal values (7539.95);
> {code}
> Error from Hive
> {code}
> Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://server1:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-03d6e3bb-fe5e-4f20-87a4-88dec955dfcd.snappy.parquet
> Time taken: 0.122 seconds
> {code}
> Error from Impala
> {code}
> File 'hdfs://server:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-32db4c61-fe67-4be2-9c16-b55c75c517a4.snappy.parquet' has an incompatible Parquet schema for column 'tpch_nested_3000_parquet.customer_acctbal.c_acctbal'. Column type: DECIMAL(12,2), Parquet schema:
> optional int64 c_acctbal [i:0 d:1 r:0] (1 of 2 similar)
> {code}
> Table info 
> {code}
> hive> describe formatted customer_acctbal;
> OK
> # col_name              data_type               comment
> c_acctbal               decimal(12,2)
> # Detailed Table Information
> Database:               tpch_nested_3000_parquet
> Owner:                  mmokhtar
> CreateTime:             Mon Apr 10 17:47:24 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://server1.com:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         COLUMN_STATS_ACCURATE   true
>         numFiles                1
>         numRows                 0
>         rawDataSize             0
>         totalSize               120
>         transient_lastDdlTime   1491871644
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.032 seconds, Fetched: 31 row(s)
> {code}
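
The Impala error above reports the column as "optional int64", which is consistent with Spark's non-legacy Parquet writer choosing an integer physical type for decimals of small precision. The sketch below illustrates that selection rule under stated assumptions: the thresholds (precision up to 9 fits int32, up to 18 fits int64, anything larger uses a fixed-length byte array) reflect general knowledge of Spark's Parquet schema conversion and are not taken from this report, and both function names are hypothetical, chosen only for illustration. It is not Spark's actual code.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width whose signed range can hold any decimal
    with the given number of digits (hypothetical helper)."""
    num_bytes = 1
    # A signed value of num_bytes bytes can represent magnitudes up to
    # 2^(8*num_bytes - 1); grow until 10^precision fits.
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def parquet_decimal_physical_type(precision: int, legacy_format: bool = False) -> str:
    """Return the assumed Parquet physical type for a decimal column.

    legacy_format=False models the int-based layout seen in this report;
    legacy_format=True models the fixed-length byte array layout.
    """
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if not legacy_format:
        if precision <= 9:   # assumption: fits in a 32-bit integer
            return "int32"
        if precision <= 18:  # assumption: fits in a 64-bit integer
            return "int64"
    return f"fixed_len_byte_array({min_bytes_for_precision(precision)})"


if __name__ == "__main__":
    # decimal(12,2) from the repro, in both layouts
    print(parquet_decimal_physical_type(12))
    print(parquet_decimal_physical_type(12, legacy_format=True))
```

For the decimal(12,2) column in the repro, the non-legacy branch yields int64, matching the schema Impala rejected. If this reading is right, setting spark.sql.parquet.writeLegacyFormat to true before writing should make Spark emit the fixed-length byte array form instead. Treat that as a workaround to verify against your Spark, Hive, and Impala versions, not as a confirmed fix for this issue.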



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
