hadoop-hive-dev mailing list archives

From Zheng Shao <zsh...@gmail.com>
Subject Re: Lines terminated by
Date Tue, 24 Feb 2009 19:46:00 GMT
I don't think there is a Hadoop JIRA for a custom line separator in
TextInputFormat right now.

BTW, the current logic in TextInputFormat for line breaks is pretty
complicated because it needs to deal with "\n", "\r\n", "\n\r" etc.
Of course that does not preclude us from adding a custom line separator, but
it may make the solution a little less obvious.
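
To make the trade-off concrete, here is a minimal sketch in plain Python. It
is not TextInputFormat's code: split_records is a hypothetical helper, and the
default set of line breaks it recognizes ("\r\n", "\n", "\r") is an assumption
chosen for illustration. It only shows how a fixed custom separator could sit
next to the existing multi-variant logic.

```python
import re

# Hypothetical helper for illustration; NOT Hadoop's TextInputFormat logic.
# Assumed default: "\r\n", "\n" or "\r" each end a record.
_DEFAULT_BREAK = re.compile(rb"\r\n|\n|\r")


def split_records(data, delimiter=None):
    # delimiter=None  -> use the assumed default line-break variants.
    # delimiter=bytes -> split only on that exact byte sequence.
    if delimiter is None:
        parts = _DEFAULT_BREAK.split(data)
    else:
        if not delimiter:
            raise ValueError("delimiter must not be empty")
        parts = data.split(delimiter)
    # A trailing terminator leaves one empty piece at the end; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


# Two rows using \001 between fields and \002\n at the end of each row.
data = b"1\x012\x013\x02\n4\x015\x016\x02\n"

default_rows = split_records(data)            # \002 stays on each row
custom_rows = split_records(data, b"\x02\n")  # \002 is consumed
```

With the default breaks each record still ends in \002; with the custom
separator the records come out clean. The custom path is the simple one, the
default path is where the special cases live.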

Zheng

On Tue, Feb 24, 2009 at 8:24 AM, Johan Oskarsson <johan@oskarsson.nu> wrote:

> Ok, I'll create a Hive jira for it then. I can't find a Hadoop one for
> adding custom line separators in the TextInputFormat either, if there is
> none I'll create that too.
>
> /Johan
>
> Zheng Shao wrote:
> > Yes, this is a known issue. "lines terminated by" is not supported yet
> > because the text input format does not allow configurable line
> > separators yet.
> >
> >
> > Zheng
> >
> >
> > On 2/24/09, Johan Oskarsson <johan@oskarsson.nu> wrote:
> >> I've been trying to use a text file with the field separator \001 and
> >> line separator \002\n in Hive, similar to what's described here
> >> http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL.
> >>
> >> I've set the "lines terminated by" to \002 but when selecting data the
> >> last column still includes that character, if the last col is a string.
> >> If it's an int the value fails to parse and is left as null.
> >>
> >> Is this a known issue? I can't find a ticket for it.
> >> I assume the TextInputFormat takes care of the \n so that I only need to
> >> use \002 as the termination.
> >>
> >> Example create queries
> >>
> >> create table artistsong (export_date bigint, artist_id int, song_id int)
> >> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' lines terminated by
> >> '\002'
> >>
> >> and this one:
> >>
> >> create table artist (export_date bigint, artist_id int, name string,
> >> is_actual_artist int, view_url string) ROW FORMAT DELIMITED FIELDS
> >> TERMINATED BY '\001' lines terminated by '\002'
> >>
> >> /Johan
> >>
> >
>
>

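The symptom in the quoted message (a trailing \002 in a string column, null
for an int column) can be reproduced with a small sketch in plain Python. This
is not Hive's deserialization code: parse_int_or_none and split_fields are
hypothetical names, and the sample row is made up. It assumes the line reader
has already removed the "\n" but left the \002 in place.

```python
FIELD_SEP = "\x01"  # '\001' in the create statements
ROW_END = "\x02"    # '\002', followed by "\n" in the file


def parse_int_or_none(text):
    # Hypothetical stand-in for a lenient int column parser.
    try:
        return int(text)
    except ValueError:
        return None


def split_fields(line, strip_row_end=False):
    # `line` is one record with the "\n" already removed by the line reader.
    if strip_row_end and line.endswith(ROW_END):
        line = line[: -len(ROW_END)]
    return line.split(FIELD_SEP)


# Made-up row shaped like the artistsong table: three int-like columns.
raw = "1235433600\x0142\x017\x02"

as_read = [parse_int_or_none(f) for f in split_fields(raw)]
stripped = [parse_int_or_none(f) for f in split_fields(raw, strip_row_end=True)]

print(as_read)   # [1235433600, 42, None]
print(stripped)  # [1235433600, 42, 7]
```

The last field is "7\x02" until the row terminator is removed, so the int
parse fails; a string column would simply keep the extra character.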

-- 
Yours,
Zheng
