hive-user mailing list archives

From "xihuyu2000"<xihuyu2...@126.com>
Subject Re: Re: binary column data consistency in hive table copy
Date Mon, 14 Sep 2015 23:16:05 GMT
If you use CTAS, then an MR job runs. Maybe the problem is in the MR job.
2015-09-15 

xihuyu2000 



From: Jason Dere <jdere@hortonworks.com>
Sent: 2015-09-15 06:00
Subject: Re: binary column data consistency in hive table copy
To: "user@hive.apache.org"<user@hive.apache.org>
Cc:

It looks like your table is using the text storage format. With TextInputFormat, binary data
needs to be stored as base64, so those values are probably being interpreted as base64 strings.
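
The base64 explanation above can be checked against the sample rows from the original message. The following is a minimal sketch in Python, not Hive's actual code path: the helper name text_roundtrip is hypothetical, and it assumes the reader tolerates missing "=" padding and silently drops leftover bits when decoding.

```python
import base64


def text_roundtrip(value: str) -> str:
    """Decode a text field as base64, then re-encode the resulting bytes."""
    # Assumption: the reader is lenient about missing '=' padding,
    # so pad the field to a multiple of 4 characters before decoding.
    padded = value + "=" * (-len(value) % 4)
    # Python's default (non-strict) decoder discards leftover bits
    # that do not fill a whole byte.
    raw = base64.b64decode(padded)
    # Writing the bytes back out as text produces canonical base64.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Sample rows taken from the input file in the original message.
    for value in ["10000101", "121", "10", "1011", "Asfs"]:
        print(value, "->", text_roundtrip(value))
```

Under those assumptions, "121" comes back as "120=" and "10" comes back as "1w==", while the rows whose length is a multiple of 4 are unchanged. That matches the bintarget listing in the original message, which suggests the trailing "=" characters are base64 padding.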






From: Ujjwal Wadhawan <uwadhawan@gmail.com>
Sent: Monday, September 14, 2015 2:32 PM
To: user@hive.apache.org
Subject: binary column data consistency in hive table copy 

Hi all,


I recently observed a behavior in Hive that I’d like to share and get input on.

Scenario:

Say you have a Hive table with a binary column.

create table binsource (bincol binary);

and some input data

$ cat /nis3/home/ujjwal2/test2/binin
10000101
121
10
1011
Asfs


Let’s load the data into the table

LOAD DATA LOCAL INPATH '/home/ujjwal2/test2/binin' OVERWRITE INTO TABLE binsource;

When I do a select * in the Hive CLI, I see the following characters (see image)





The underlying HDFS file still has the actual input though.



Now I make a copy of this table using the command "create table ujjwal2.bintarget as select *
from ujjwal2.binsource;".





ISSUE:


Now when I look at the underlying file created on HDFS for bintarget, I see some extra characters.



In the many combinations I have tried, the extra characters are always among “=”, “w” and “A”.



10000101
120=
1w==
1011
Asfs


Does anyone know what these characters signify?

Best,
Ujjwal