hive-user mailing list archives

From "Dhandapani, Karthik" <Karthik.Dhandap...@CVSCaremark.com>
Subject Issue with a new line character in the data
Date Fri, 27 Mar 2015 17:00:45 GMT
Hi,

I have a scenario where newline characters exist in the data. Because of them, the number of
records in the target is greater than in the source: every record whose data contains a newline
character is broken at that position and appears as two records in Hive. When I use cat and pipe
it to wc -l, I get the right counts, but when I use Hadoop streaming to get the counts from the
HDFS files, I get more records because of the newline issue. Likewise, when I query the record
count of the Hive external table, it is higher, with each affected record split into two at the
newline position. Is there a workaround in Sqoop/Hive to handle this scenario, so that Hive
ignores a newline character when it is part of the data?
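To make the symptom concrete, here is a minimal Python sketch (not part of the original post) of why a newline-terminated text file overcounts. It assumes Hive's default Ctrl-A (\x01) field delimiter and "\n" as the record terminator; the rows are hypothetical. The clean=True path drops \n, \r and \x01 from string fields, which is similar to what Sqoop's --hive-drop-import-delims option is documented to do at import time.

```python
# Hypothetical source rows: the second row's last field contains an embedded newline.
rows = [
    ("1", "alice", "ok"),
    ("2", "bob", "line one\nline two"),
    ("3", "carol", "fine"),
]

FIELD_DELIM = "\x01"  # assumed: Hive's default field delimiter (Ctrl-A)


def to_text(rows, clean=False):
    """Serialize rows as a newline-terminated text file, one record per line."""

    def fix(value):
        # When clean=True, strip characters that would break record/field boundaries.
        if clean:
            for ch in ("\n", "\r", FIELD_DELIM):
                value = value.replace(ch, "")
        return value

    return "".join(FIELD_DELIM.join(fix(v) for v in row) + "\n" for row in rows)


def count_records(text):
    """Count records the way a reader that splits on '\\n' would."""
    return text.count("\n")


if __name__ == "__main__":
    print(len(rows), count_records(to_text(rows)))              # source vs. raw file
    print(len(rows), count_records(to_text(rows, clean=True)))  # source vs. cleaned file
```

With the embedded newline left in place, the three source rows are read back as four records; after the clean-up step the counts match again.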

We are on HDP 2.1 with Sqoop 1.4.4 and Hive 0.13.

Appreciate your help with this.

Thanks,
Karthik

