hive-user mailing list archives

From Michael Jiang <it.mjji...@gmail.com>
Subject complex apache log extraction?
Date Wed, 30 Mar 2011 00:36:10 GMT
hey guys,

I want to extract some information from an Apache web log. The task goes
beyond extracting fixed fields that appear at known positions, such as host
and request. One part is to extract multiple key/value pairs from the request
string. For example, the request string contains parameters like "name.0",
"name.1", ..., "name.n", where "n" can be any non-negative integer, and they
may appear anywhere in the request. Extracting each key/value pair is not
enough, though :) If an entry line contains k "name.i" parameters, I want to
clone that line k times, so that each clone carries an extra field holding
the value of one "name.i".

I can load the log and extract the request string into a table first, then
write a streaming script that extracts the "name" key/value pairs and writes
the cloned entries to stdout. But is there a one-step solution that extracts
everything from the log file and generates the multiple entries as well? I
know "org.apache.hadoop.hive.contrib.serde2.RegexSerDe" can load and parse an
Apache web log. Is it possible to use it for this case? Thanks!
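
To make the two-step approach concrete, here is a rough sketch of the
streaming script I have in mind. It assumes each input line is tab-separated
with host in the first column and the raw request string (e.g.
"GET /path?name.0=a HTTP/1.1") in the second. That column layout and the
function names expand and stream are only illustrative assumptions, not
something taken from an existing table.

```python
import re
import sys
from urllib.parse import urlsplit, parse_qsl

# Matches parameter keys of the form name.<non-negative integer>.
NAME_KEY = re.compile(r"^name\.(\d+)$")


def expand(host, request):
    """Return one (host, request, index, value) row per name.i parameter."""
    # Assumed request layout: "<METHOD> <URL> <PROTOCOL>".
    # If there is no space-separated URL part, treat the whole string as the URL.
    parts = request.split(" ")
    url = parts[1] if len(parts) >= 2 else request
    rows = []
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        match = NAME_KEY.match(key)
        if match:
            rows.append((host, request, int(match.group(1)), value))
    return rows


def stream(lines, out):
    """Read tab-separated (host, request) lines and write the cloned rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not have both columns
        for row in expand(fields[0], fields[1]):
            out.write("\t".join(str(field) for field in row) + "\n")


# When used as a Hive streaming script, the entry point would be:
#     stream(sys.stdin, sys.stdout)
```

Such a script would be hooked in through Hive's TRANSFORM ... USING clause
over the table that holds the extracted request strings. Entries without any
"name.i" parameter produce no output rows in this sketch.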

--mj
