hive-user mailing list archives

From Sanjay Subramanian
Subject Re: Special characters in web log file causing issues
Date Mon, 08 Jul 2013 23:46:02 GMT
You may have to remove the non-printable characters first, save an intermediate file, and then load that file into Hive.

tr -cd '[:print:]\r\n\t'

Or, if you have the strings utility, use that; it outputs only printable characters.
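
As a rough sketch of the first approach, assuming GNU or BSD tr: the helper name clean_log, the file paths, and the table name weblogs are all made up for illustration. LC_ALL=C is set so that tr works byte by byte and drops everything outside printable ASCII, which also removes legitimate non-ASCII text.

```shell
# Hypothetical helper: copy a log file, keeping only printable ASCII
# characters plus carriage return, newline, and tab.
# $1 = raw input file, $2 = cleaned intermediate file
clean_log() {
    LC_ALL=C tr -cd '[:print:]\r\n\t' < "$1" > "$2"
}

# Example usage (paths and table name are assumptions, not from this thread):
#   clean_log /tmp/weblog.raw /tmp/weblog.clean
#   hive -e "LOAD DATA LOCAL INPATH '/tmp/weblog.clean' INTO TABLE weblogs;"
```

The raw file is left untouched, so you can compare it against the cleaned copy before loading.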

From: Raj Hadoop
Date: Monday, July 8, 2013 1:52 PM
To: Hive
Subject: Special characters in web log file causing issues

Hi,

The log file that I am trying to load through Hive has some special characters.

The field is shown below, including the special characters (¿¿).

    Shockwave Flash;Chrome Remote Desktop Viewer;Native Client;Chrome PDF Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-in;Motive Management Plug-in;Google Update;Java(TM) Platform SE 7 U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows Live¿¿ Photo Gallery;McAfee SecurityCenter;Silverlig

The special characters cause the record to be terminated early, and the rest of it is loaded as another line. How can I avoid
this type of issue and load the data properly? Any suggestions, please.

