hive-dev mailing list archives

From "Venki Korukanti (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-12680) Binary type partition column values are incorrectly serialized and deserialized
Date Tue, 15 Dec 2015 21:48:47 GMT
Venki Korukanti created HIVE-12680:
--------------------------------------

             Summary: Binary type partition column values are incorrectly serialized and deserialized
                 Key: HIVE-12680
                 URL: https://issues.apache.org/jira/browse/HIVE-12680
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Venki Korukanti
            Priority: Minor


Here are the repro steps:

{code}
CREATE TABLE kv_binary(key INT, value STRING) PARTITIONED BY (binary_part BINARY);
INSERT INTO TABLE kv_binary PARTITION (binary_part='somevalue') SELECT * FROM kv LIMIT 1;
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-15 13:34:15,758 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1142919541_0001
Loading data to table default.kv_binary partition (binary_part=[B@15871)
Partition default.kv_binary{binary_part=[B@15871} stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 8192 HDFS Write: 11733 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{code}

The partition directory created in the file system has a Java object reference as its value:
{code}
hadoop fs -ls /user/hive/warehouse/kv_binary
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-12-15 13:34 /user/hive/warehouse/kv_binary/binary_part=%5BB@15871
{code}

Selecting from the same table:
{code}
hive> SELECT * FROM kv_binary;
OK
238	val/238=	[B@15871
{code}

This makes binary partitions unusable, although binary partitions don't seem to be commonly
used. Logging the bug for tracking purposes. It seems that somewhere we are calling toString()
on a byte[].
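
For illustration, here is a minimal standalone Java sketch of the suspected cause. It is not taken from the Hive code base: the class and method names are hypothetical, and decoding as UTF-8 is only an assumption about what the intended conversion might be. It shows that calling toString() on a byte[] yields the "[B@" type tag followed by an identity hash, which matches the partition value seen above, while decoding the bytes explicitly recovers the original text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hive.
public class BinaryPartitionDemo {

    // Suspected behavior: arrays do not override Object.toString(), so the
    // result is "[B@" plus a hex identity hash rather than the contents.
    static String viaToString(byte[] value) {
        return value.toString();
    }

    // One possible alternative (an assumption, not Hive's actual fix):
    // decode the bytes with an explicit charset.
    static String viaDecode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] partValue = "somevalue".getBytes(StandardCharsets.UTF_8);

        // Prints something like binary_part=[B@15871 (the hash varies per run).
        System.out.println("binary_part=" + viaToString(partValue));

        // Prints binary_part=somevalue
        System.out.println("binary_part=" + viaDecode(partValue));
    }
}
```

Because the identity hash differs between JVM runs, a partition value produced this way cannot be reproduced later, which is consistent with the partition being unusable.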

BTW, this works fine in Hive 1.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
