hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1483) Update AWS S3 log format deserializer regex
Date Fri, 23 Jul 2010 18:31:50 GMT
Update AWS S3 log format deserializer regex
-------------------------------------------

                 Key: HIVE-1483
                 URL: https://issues.apache.org/jira/browse/HIVE-1483
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.5.0, 0.6.0, 0.7.0
            Reporter: Carl Steinbach


>From HIVE-693:

Amazon Changed the format of the logs at the beginning of February 2010, so now the new regex
is:
static Pattern regexpat = Pattern.compile( "(
S+) (
S+) \\[(.?)
] (
S+) (
S+) (
S+) (
S+) (
S+) \"(.)\" (
S) (
S+) (
S+) (
S+) (
S+) (
S+) \"(.)\" \"(.*)\"(?: -)?");

(the only difference is addition of (?: -)? at the end.

Since Amazon hasn't yet documented the last field, I don't know if it is ok to do a catch-all
regex for that field instead of the very specific one I've added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message