hive-user mailing list archives

From Charles Robertson <charles.robert...@gmail.com>
Subject Re: New lines causing new rows
Date Thu, 21 Aug 2014 10:55:32 GMT
I have fixed this - I was using the wrong character to replace line breaks.
I was replacing \r when it needed to be \n.
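
A minimal sketch of what the corrected replacement might look like, assuming the tweets_raw table quoted below. The view name tweets_clean is hypothetical, and the ts column from the original view is left out for brevity:

```sql
-- Sketch only: tweets_clean is a hypothetical name, not the view from the thread.
-- Hive unescapes '\n' in string literals, so translate() maps each line feed
-- in the text column to a single space.
CREATE VIEW tweets_clean AS
SELECT
  id,
  translate(text, '\n', ' ') AS text,
  name,
  screen_name
FROM tweets_raw;
```

If the data might also contain carriage returns, translate(text, '\r\n', '  ') would map both characters to spaces, since translate() substitutes character by character. That variant is an assumption and not part of the fix reported above.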

Regards,
Charles


On 18 August 2014 08:44, Charles Robertson <charles.robertson@gmail.com>
wrote:

> Hi Andre,
>
> Table and view definitions:
>
> CREATE EXTERNAL TABLE tweets_raw (
>    id BIGINT,
>    created_at STRING,
>    text STRING,
>    screen_name STRING,
>    name STRING
> )
> ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
> WITH SERDEPROPERTIES (
>       'paths'='id, created_at, text, user.screen_name, user.name'
>       )
> LOCATION '/user/flume/tweets/test1/';
>
> CREATE VIEW tweets_simple AS
> SELECT
>   id,
>   cast ( from_unixtime( unix_timestamp(concat( '2014 ',
> substring(created_at,5,15)), 'yyyy MMM dd hh:mm:ss')) as timestamp) ts,
>   translate(text, '\r', ' ') as text,
>   name,
>   screen_name
> FROM tweets_raw;
>
> As you can see from the table definition, the raw data source is text
> files output by flume.
>
> Thanks,
> Charles
>
>
> On 18 August 2014 00:42, Andre Araujo <araujo@pythian.com> wrote:
>
>> Hi, Charles,
>>
>> What's the storage format for the raw data source?
>> What's the definition of your view?
>>
>>
>> On 18 August 2014 04:20, Charles Robertson <charles.robertson@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am loading some data into a Hive table, and one of the fields contains
>>> text which I believe contains new line characters. I have a view which
>>> reads data from this table, and the new line characters appear to be
>>> starting new rows.
>>>
>>> Doing 'select * from [mytable] limit 10;' in the hive console returns
>>> ten rows, on more than ten lines. Doing 'select * from [view] limit 10' in
>>> the console returns ten lines but fewer than ten rows.
>>>
>>> I've tried using the 'translate' function in the view definition to
>>> replace \r with a space character, but that seems to have just broken
>>> everything (it complains of a missing EOF).
>>>
>>> Can anyone suggest a better way to remove the line breaks and/or prevent
>>> the view treating them as new rows?
>>>
>>> Thanks,
>>> Charles
>>>
>>
>>
>>
>> --
>> André Araújo
>> Big Data Consultant/Solutions Architect
>> The Pythian Group - Australia - www.pythian.com
>>
>> Office (calls from within Australia): 1300 366 021 x1270
>> Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696
>> x1270
>> Mobile: +61 410 323 559
>> Fax: +61 2 9805 0544
>> IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk
>>
>> “Success is not about standing at the top, it's the steps you leave
>> behind.” — Iker Pou (rock climber)
>>
>> --
>>
>>
>>
>>
>
