hive-dev mailing list archives

From "Venki Korukanti (JIRA)" <>
Subject [jira] [Created] (HIVE-12680) Binary type partition column values are incorrectly serialized and deserialized
Date Tue, 15 Dec 2015 21:48:47 GMT
Venki Korukanti created HIVE-12680:

             Summary: Binary type partition column values are incorrectly serialized and deserialized
                 Key: HIVE-12680
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Venki Korukanti
            Priority: Minor

Here are the repro steps:

CREATE TABLE kv_binary(key INT, value STRING) PARTITIONED BY (binary_part BINARY);
INSERT INTO TABLE kv_binary PARTITION (binary_part='somevalue') SELECT * FROM kv LIMIT 1;
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-15 13:34:15,758 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1142919541_0001
Loading data to table default.kv_binary partition (binary_part=[B@15871)
Partition default.kv_binary{binary_part=[B@15871} stats: [numFiles=1, numRows=1, totalSize=13,
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 8192 HDFS Write: 11733 SUCCESS
Total MapReduce CPU Time Spent: 0 msec

The partition directory created on the file system has a Java object reference as its value:
hadoop fs -ls /user/hive/warehouse/kv_binary
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-12-15 13:34 /user/hive/warehouse/kv_binary/binary_part=%5BB@15871

Selecting from the same table:
hive> SELECT * FROM kv_binary;
238	val/238=	[B@15871

This makes binary partitions unusable, although binary partitions don't seem to be commonly
used. Logging the bug for tracking purposes. It looks like toString is being called on a
byte[] somewhere.
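
For illustration, here is a minimal Java sketch of the suspected cause. It is not Hive code: the class and method names (BinaryPartitionDemo, viaToString, viaDecode) are hypothetical, and treating the partition value as UTF-8 text is an assumption. It only shows that calling toString on a byte[] yields the "[B@..." identity string seen in the output above, while decoding the bytes returns the original value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Hive.
class BinaryPartitionDemo {
    // Object.toString() on a byte[] returns "[B@" plus an identity hash code,
    // not the contents of the array.
    static String viaToString(byte[] value) {
        return value.toString();
    }

    // Decoding the bytes keeps the content (assumes the value is UTF-8 text).
    static String viaDecode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] partitionValue = "somevalue".getBytes(StandardCharsets.UTF_8);
        // Prints a string starting with "[B@"; the hash part varies per run.
        System.out.println(viaToString(partitionValue));
        // Prints the decoded partition value.
        System.out.println(viaDecode(partitionValue));
    }
}
```

The leading %5B in the directory name listed above is consistent with this: it is the escaped form of the "[" character at the start of that identity string.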

BTW, this is working fine in Hive 1.0.0. 

