hive-user mailing list archives

From Nicholas Hakobian <nicholas.hakob...@rallyhealth.com>
Subject Re: a newline in column data ruin Hive
Date Tue, 23 Feb 2016 23:55:18 GMT
We just had this problem recently with our data. There are actually two
things you have to worry about: the reader (which the suggestion above
seems to solve) and the intermediate stages (if using MR). We didn't
have the issue with the reader since we use Parquet and Avro to store
our data, but we did have issues with the intermediate output. The default
intermediate output format for many distros is TextFile, so any
serialization to it will re-introduce the newline issue. If you set this
option:

set hive.query.result.fileformat=SequenceFile;

Hive will use sequence files for the intermediate output, which do not
rely on \n to separate records. The wiki used to say this specifically,
but it looks like the wiki has been changed since the default was
changed to SequenceFile for 2.1.
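
To illustrate why the text intermediate format breaks and a record-framed
format does not, here is a minimal Python sketch. It is not Hive code and
does not reproduce the real SequenceFile layout; the function names and the
4-byte length prefix are assumptions chosen only to show the idea of
framing records by length instead of by a \n delimiter.

```python
import struct

# Two rows; the first has a newline embedded in its second column.
rows = [("1", "first line\nsecond line"), ("2", "plain value")]

def write_text(rows):
    # Text-style output: tab between fields, "\n" between rows.
    return "".join("\t".join(row) + "\n" for row in rows)

def read_text(blob):
    # A reader that treats every "\n" as a row boundary.
    return [tuple(line.split("\t")) for line in blob.split("\n") if line]

def write_framed(rows):
    # Length-prefixed output: each record carries its own byte length,
    # so the payload may contain any character, including "\n".
    out = bytearray()
    for row in rows:
        payload = "\t".join(row).encode("utf-8")
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)

def read_framed(blob):
    # Read records back by length, never scanning for a delimiter.
    result, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        result.append(tuple(blob[pos:pos + size].decode("utf-8").split("\t")))
        pos += size
    return result

text_rows = read_text(write_text(rows))
framed_rows = read_framed(write_framed(rows))
print(len(text_rows), len(framed_rows))
```

The text round trip yields more rows than were written because the embedded
newline is read as a row boundary, while the length-framed round trip returns
the original rows unchanged.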

Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com




On Tue, Feb 23, 2016 at 3:04 PM, Rajit Saha <rsaha@lendingclub.com> wrote:
> Hi Mahender,
>
> You can try ESCAPED BY '\\'
>
> Like the sample below:
>
> CREATE EXTERNAL TABLE test
> (
> a1 int,
> b1 string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\'
> STORED AS TEXTFILE
> LOCATION '<HDFS Location>';
>
> Thanks
> Rajit
>
>
> From: mahender bigdata <Mahender.BigData@outlook.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Tuesday, February 23, 2016 at 2:51 PM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: a newline in column data ruin Hive
>
> Hi,
>
> We are facing an issue while loading/reading data from a file that has line
> delimiter characters like \n as part of the column data. When we try to query
> the Hive table, data with \n gets split up into multiple rows. Is there a
> way to tell Hive to ignore delimiter characters like \n (row delimiter or
> field delimiter) within column data? We saw the LINES TERMINATED BY property
> in the CREATE TABLE syntax, but currently it accepts only \n. Is there a way
> to have a custom line terminator?
>
>
> Thanks in advance
>
>
