hive-user mailing list archives

From Michael Jiang <it.mjji...@gmail.com>
Subject Re: how to convert single line into multiple lines in a serde (txt in txt out)?
Date Wed, 30 Mar 2011 20:41:23 GMT
Thanks Edward. That'll work.

But that also means two tables will be created. What if we want only one
table, using a serde that reads the Apache web log and generates multiple
rows for each log line, loaded directly into the target table that I want?
Is that doable by customizing RegexSerDe? That is, "create external
table A (...) row format serde 'serdeclass' with serdeproperties (...)
stored as textfile location 'pathtoapachelog';" would give you a table that
has the right fields extracted and multiple rows generated, ready to query later.

If I cannot create such a serde without creating a second table for the task,
I think streaming is the better choice from a source code management
standpoint: a custom serde requires you to manage more libraries
(hadoop, hive ...) in the build.
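
For comparison, here is a minimal sketch of the streaming route in Python. The thread never says what the "multiple rows" per log line should be, so this assumes, purely for illustration, one output row per query-string parameter of the requested URL. The regex, the output columns, and the function names (expand_line, main) are all hypothetical.

```python
import re
import sys
from urllib.parse import urlsplit, parse_qsl

# Apache common log format (assumed):
#   host ident user [time] "method url protocol" status size
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

def expand_line(line):
    """Return tab-separated output rows for one log line.

    Illustrative assumption: one row per query-string parameter.
    Lines that do not match the assumed format produce no rows.
    """
    m = LOG_RE.match(line)
    if m is None:
        return []
    host, ts, method, url, status, size = m.groups()
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return ["\t".join([host, ts, status, key, value]) for key, value in params]

def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read log lines on stdin and write the expanded rows to stdout."""
    for line in stdin:
        for row in expand_line(line.rstrip("\n")):
            stdout.write(row + "\n")
```

To use this as a streaming script, call main() at the bottom of the file. Hive would then read each tab-separated output line as one row, so a single log line can produce zero, one, or many rows without any custom serde.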

Thanks!

On Wed, Mar 30, 2011 at 1:16 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> On Wed, Mar 30, 2011 at 3:46 PM, Michael Jiang <it.mjjiang@gmail.com> wrote:
> > Also, what if I want just one step that loads each log entry line from the
> > log file and generates multiple lines for each? That is, just one table
> > created. I don't want to have one table and then call explode() to get
> > multiple lines. Otherwise, the alternative is to use streaming on the
> > loaded table to turn it into another one, with no need to customize a
> > serde. So, yeah, the goal here is to see how a serde can do this.
> >
> > Thanks!
> >
> > On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >>
> >> On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang <it.mjjiang@gmail.com> wrote:
> >> > I want to extend RegexSerDe to parse an Apache web log: each log entry
> >> > needs to be converted into multiple entries. This is easy in streaming,
> >> > but I'm new to serdes and wondering whether it is doable, and how. Thanks!
> >> >
> >>
> >> You can have your serde produce list<struct> and then explode() them.
> >
> >
>
> The role of a SerDe is to take the output from the InputFormat and use
> the information inside the metastore to decode it. As a result, it is
> not a good place to turn a single row into multiple rows.
>
> What I am suggesting is to define a column like this:
>
> create table ... (id int, log_entries array<string>) row format serde....
>
> Make sure your serde decodes and populates log_entries.
>
> From there you can use lateral view and explode
> http://wiki.apache.org/hadoop/Hive/LanguageManual/LateralView to turn
> the array<string> into rows.
>
>
> Edward
>
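
To make the row semantics of the lateral view suggestion concrete, here is a small Python model of what a query such as "SELECT id, entry FROM A LATERAL VIEW explode(log_entries) t AS entry" returns. Hive performs this step itself; the function name and the sample data below are hypothetical and only illustrate the effect, not how Hive implements it.

```python
def lateral_view_explode(rows):
    """Yield one (id, entry) pair per element of each row's log_entries list.

    Models a plain LATERAL VIEW explode(): a row whose list is empty
    contributes no output rows.
    """
    for row_id, log_entries in rows:
        for entry in log_entries:
            yield (row_id, entry)

# Hypothetical table contents: (id, log_entries) as the serde would populate them.
rows = [(1, ["a", "b"]), (2, ["c"]), (3, [])]
exploded = list(lateral_view_explode(rows))
```

Here the row with id 1 becomes two rows and the row with id 3 disappears, so the single table keeps one row per log line and the query does the fan-out at read time.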
