From "JP gupta"
Subject RE: Table count is more than File count after loading in hive
Date Tue, 30 May 2017 04:58:30 GMT
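Assuming the special characters were added by the Windows platform, as Shakti Singh mentioned,
one easy way to clean up the file is the command "dos2unix filename". Note that dos2unix converts
CRLF line endings to LF; a lone CR in the middle of a record is left in place by default.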


From: shakti singh Shekhawat 
Sent: 30 May 2017 10:02
Subject: Re: Table count is more than File count after loading in hive


Hi Balajee,


The best approach is to clean the file in Unix (or with Perl or Python) before loading it
into Hive. The root cause is most probably carriage returns: files generated on Microsoft
platforms often contain ^M characters. To check whether carriage returns are the problem,
try the steps below:

1. The `file` command reports which line terminators appear in your file, using their
ASCII names (CR, LF, CRLF).

Ex: file yourfilename

yourfilename: UTF-8 Unicode text, with CRLF, CR, LF line terminators

2. To find what CR (\r), LF (\n) and CRLF (\r\n) mean, try:

man ascii

At this point you will know whether your file contains carriage returns (\r), which break
records in Hive.

3. To find where the carriage returns are, open the file in the vi editor:

Press Esc

Type   :set list

This should display the ^M characters highlighted. (In Vim, if no ^M shows up because the
file was detected as DOS format, reopen it with :e ++ff=unix.) Find a record that has a ^M
in the middle of it. Then run a select on that record in the Hive table; you will see that
the Hive record is broken exactly where the ^M appears in the file.
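
The same checks can also be scripted. Below is a minimal Python sketch, not a tested tool.
The function names are hypothetical, and count_records assumes that Hive's text reader treats
a lone CR as a record terminator, which is what the diagnosis in this thread implies. It counts
each kind of terminator, lists where stray carriage returns sit, and compares the line count
you would get from `wc -l` with the CR-aware record count.

```python
import re

def summarize_terminators(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # carriage returns not followed by LF
        "LF": data.count(b"\n") - crlf,  # line feeds not preceded by CR
    }

def find_stray_cr(data: bytes) -> list:
    """Return (line, column) for every CR that is not part of a CRLF pair.

    Lines are split on LF only, so line numbers match what vi shows.
    Both numbers are 1-based.
    """
    hits = []
    segments = data.split(b"\n")
    last = len(segments) - 1
    for i, line in enumerate(segments):
        # Every segment except the last was followed by LF, so a trailing
        # CR there belongs to a CRLF ending and is not "stray".
        body = line[:-1] if i < last and line.endswith(b"\r") else line
        for match in re.finditer(rb"\r", body):
            hits.append((i + 1, match.start() + 1))
    return hits

def count_records(data: bytes) -> tuple:
    """Return (LF-terminated line count, count when CR also ends a record)."""
    # bytes.splitlines() breaks on \n, \r and \r\n (assumed to mimic Hive here).
    return data.count(b"\n"), len(data.splitlines())

if __name__ == "__main__":
    sample = b"1,alpha\r\n2,be\rta\r\n3,gamma\n"
    print(summarize_terminators(sample))  # {'CRLF': 2, 'CR': 1, 'LF': 1}
    print(find_stray_cr(sample))          # [(2, 5)]
    print(count_records(sample))          # (3, 4)
```

For a real file, read it in binary mode (open(path, "rb").read()) so Python does not
translate the line endings before you inspect them.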


Please let us know whether this helps identify the issue. If carriage returns are the problem,
the next step is to remove them from your file (you can easily find commands on Stack
Overflow; let me know if nothing works).
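
For the removal step, here is one possible sketch in Python. The function names and the
choice to delete stray carriage returns (rather than replace them with a space) are
assumptions; pick whichever suits your data. It reads the whole file into memory, so it
is only suitable for files that fit in RAM.

```python
def clean_carriage_returns(data: bytes, stray_replacement: bytes = b"") -> bytes:
    """Turn CRLF endings into LF, then replace any CR that is left over."""
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\r", stray_replacement)

def clean_file(src_path: str, dst_path: str) -> None:
    """Write a cleaned copy of src_path to dst_path (binary mode on both sides)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(clean_carriage_returns(data))

if __name__ == "__main__":
    sample = b"1,alpha\r\n2,be\rta\r\n"
    print(clean_carriage_returns(sample))  # b'1,alpha\n2,beta\n'
```

Load the cleaned copy into Hive and compare the table count with the file's line count again.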




