hive-user mailing list archives

From Zhiwen Sun <pens...@gmail.com>
Subject Re: how to handle variable format data of text file?
Date Mon, 18 Mar 2013 08:23:32 GMT
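As you defined in the CREATE TABLE statement, fields are delimited by a
blank space. Each line is therefore split on every space, only the first six
fields are mapped to your six columns, and the rest of the line is dropped.

If you want to keep the rest of the line in the last column, I suggest using
the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format instead of
the default delimited format.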
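
A minimal sketch of what such a table definition could look like is given
here. It has not been run against this data and rests on several assumptions:
the input.regex pattern is only a guess at the layout (five space-separated
fields, then everything else on the line), the Hive contrib jar that provides
the SerDe is already available to the session, and every column is declared
STRING because the contrib RegexSerDe only supports string columns. The
LOCATION clause is left out, as in the original statement.

```sql
-- Assumption: the hive-contrib jar is already on the classpath
-- (otherwise it has to be added with ADD JAR first).
CREATE EXTERNAL TABLE handresult (
  hdate  STRING,
  htime  STRING,
  thid   STRING,
  gid    STRING,   -- STRING, not INT: the contrib RegexSerDe needs string columns
  userid STRING,
  ldata  STRING    -- everything after the fifth field, spaces included
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- One capture group per column; the last group takes the rest of the line.
  "input.regex" = "([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.*)"
)
STORED AS TEXTFILE;

-- gid can be cast back to a number at query time.
SELECT CAST(gid AS INT) AS gid, ldata FROM handresult;
```

Lines that do not match the pattern come back as NULL in every column, so it
is worth checking a few rows before relying on the table. The ldata string
can then be split per gid in later queries.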


Zhiwen Sun



On Mon, Mar 11, 2013 at 12:04 PM, 周梦想 <ablozhou@gmail.com> wrote:

> I have files like this:
> 03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11
> 0:2775
> 03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
> The fields are separated by blank spaces:
> date time threadid gid userid [variable-format data grouped into fields
> separated by spaces]
>
> I'd like to create a table like:
>
> hive> create external table handresult (hdate string,htime string, thid
> string, gid int, userid string,ldata string) row format delimited fields
> terminated by  " ";
> OK
>
> But the table above holds only part of the data:
> select * from handresult;
> 03/11/13 10:59:52 00000ec0 1009 180538126 92041
> 03/11/13 10:59:52 00000744 1010 178343610 92042
>
> I can't get the remaining data, such as "2300 0 0 7 21|47|20|33|11 0:2775".
>
> The ldata column may vary in length and format (fields separated by " ", or
> an array), and we will parse it differently for each gid.
>
> How can I do this?
>
> Thanks,
> Andy Zhou
>
