hive-user mailing list archives

From Charles Robertson <charles.robert...@gmail.com>
Subject Re: New lines causing new rows
Date Mon, 18 Aug 2014 07:44:25 GMT
Hi Andre,

Table and view definitions:

CREATE EXTERNAL TABLE tweets_raw (
   id BIGINT,
   created_at STRING,
   text STRING,
   screen_name STRING,
   name STRING
)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES (
      'paths'='id, created_at, text, user.screen_name, user.name'
      )
LOCATION '/user/flume/tweets/test1/';

CREATE VIEW tweets_simple AS
SELECT
  id,
  cast(from_unixtime(unix_timestamp(
    concat('2014 ', substring(created_at, 5, 15)),
    'yyyy MMM dd hh:mm:ss')) as timestamp) ts,
  translate(text, '\r', ' ') as text,
  name,
  screen_name
FROM tweets_raw;

As you can see from the table definition, the raw data source is text files
output by flume.
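
Note that the translate call in the view only targets '\r', so any '\n' characters in the text would pass through unchanged. A minimal sketch of one alternative, assuming the built-in regexp_replace UDF is available and that the pattern escaping below is accepted by the Hive version in use (untested here, and it may behave differently once stored inside a view definition):

```sql
-- Sketch only: collapse any run of CR and/or LF characters in the tweet text
-- into a single space. Table and column names are taken from tweets_raw above.
-- The '[\\r\\n]+' escaping is an assumption and may need adjusting.
SELECT
  id,
  regexp_replace(text, '[\\r\\n]+', ' ') AS text,
  name,
  screen_name
FROM tweets_raw
LIMIT 10;
```

If this returns ten rows on exactly ten lines, the same expression could be tried in place of the translate call in tweets_simple.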

Thanks,
Charles


On 18 August 2014 00:42, Andre Araujo <araujo@pythian.com> wrote:

> Hi, Charles,
>
> What's the storage format for the raw data source?
> What's the definition of your view?
>
>
> On 18 August 2014 04:20, Charles Robertson <charles.robertson@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am loading some data into a Hive table, and one of the fields contains
>> text which I believe contains new line characters. I have a view which
>> reads data from this table, and the new line characters appear to be
>> starting new rows.
>>
>> Doing 'select * from [mytable] limit 10;' in the hive console returns ten
>> rows, on more than ten lines. Doing 'select * from [view] limit 10;' in the
>> console returns ten lines but fewer than ten rows.
>>
>> I've tried using the 'translate' function in the view definition to
>> replace \r with a space character, but that seems to have just broken
>> everything (it complains of a missing EOF).
>>
>> Can anyone suggest a better way to remove the line breaks and/or prevent
>> the view treating them as new rows?
>>
>> Thanks,
>> Charles
>>
>
>
>
> --
> André Araújo
