hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HIVE-662) Add a method to parse apache weblogs
Date Fri, 24 Jul 2009 08:44:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HIVE-662.
-----------------------------

    Resolution: Fixed

Fixed as a result of HIVE-167. HIVE-167 adds RegexSerDe, which allows us to do the following:

{code}
CREATE TABLE serde_regex(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "../data/files/apache.access.log" INTO TABLE serde_regex;
LOAD DATA LOCAL INPATH "../data/files/apache.access.2.log" INTO TABLE serde_regex;

SELECT * FROM serde_regex ORDER BY time;

{code}
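To see what each of the nine columns receives, the same pattern can be tried outside Hive with java.util.regex. This is a standalone sketch, not RegexSerDe's own code. It assumes the pattern must match the whole line (as `Matcher.matches()` requires). The class name `ApacheLogRegexSketch`, its `parse`/`format` helpers and the sample log line are illustrative only. The Java string literal happens to use the same escaping as the HiveQL literal above (`\\[` becomes `\[`, `\"` becomes `"`). In this sketch a non-matching line yields `null`; how RegexSerDe itself reports such rows is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: illustrates the regex, not RegexSerDe internals.
public class ApacheLogRegexSketch {
    // Same pattern as "input.regex" in the table definition above.
    static final Pattern INPUT = Pattern.compile(
        "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)"
        + "(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?");

    // Same template as "output.format.string": positional %N$s arguments.
    static final String OUTPUT = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s";

    /** Returns the nine captured columns, or null if the whole line does not match. */
    static List<String> parse(String line) {
        Matcher m = INPUT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> cols = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            // Groups 8 and 9 (referer, agent) are null when the optional tail is absent.
            cols.add(m.group(i));
        }
        return cols;
    }

    /** Rebuilds a line from the columns using the output template. */
    static String format(List<String> cols) {
        return String.format(OUTPUT, cols.toArray());
    }

    public static void main(String[] args) {
        // Illustrative line in Apache "combined" log format.
        String combined = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        List<String> cols = parse(combined);
        System.out.println(cols);
        // Fields are single-space separated, so the template reproduces the line.
        System.out.println(format(cols).equals(combined)); // prints true
    }
}
```

The trailing `(?: ...)?` group is what lets both the common format (no referer/agent) and the combined format load into the same table. For common-format lines, the last two columns come back empty.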


> Add a method to parse apache weblogs
> ------------------------------------
>
>                 Key: HIVE-662
>                 URL: https://issues.apache.org/jira/browse/HIVE-662
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Johan Oskarsson
>            Priority: Critical
>             Fix For: 0.4.0
>
>
> Apache weblogs are one of the more common formats people parse using Hadoop. Unfortunately,
> the method provided to process the logs in Hive has some issues and seems to be on its way
> out. See HIVE-519 and the comments on HIVE-520. We should replace that method with something
> that works better and can be supported in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

