pig-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4639) Add better parser for Apache HTTPD access log.
Date Thu, 06 Aug 2015 10:33:05 GMT

    [ https://issues.apache.org/jira/browse/PIG-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659813#comment-14659813 ]

Niels Basjes commented on PIG-4639:

I understand your concern.
Of course you _can_ include it in PIG because it is Apache licensed...


I discussed with [~alangates] whether something like this should be added to PIG itself,
at a conference in Amsterdam (long ago). The design goal of this library is that it is usable
from a multitude of tools, and PIG just happens to be one of them. Apache Drill is currently
working on including this parser in the same way as I did in this patch: DRILL-3423

So I think PIG itself is the wrong place for it.

Viable options I see right now:
# I simply promise to accept bug fixes. As a safeguard, you retain a copy of the code
base on your own system.
# Convert this into an Apache project, either by making it part of an existing project or by creating
a new project (maybe "Apache Commons HttpLogParser"?). That would mean that other people
need to join in.
# Revert this patch and go for PIG-4417, where a user who needs this can simply download it
directly from Maven Central (i.e. the dependency is only there if a user chooses it) with
{{REGISTER ivy://nl.basjes.parse.httpdlog:httpdlog-pigloader:2.1.1}}.
In this case I think simply adding some documentation near the existing 'RegEx' log
parsers, pointing people towards the 'externally hosted alternative', would help.
# Revert this patch and simply close it as "Won't Fix".

What options do you see as valid for this case?

> Add better parser for Apache HTTPD access log.
> ----------------------------------------------
>                 Key: PIG-4639
>                 URL: https://issues.apache.org/jira/browse/PIG-4639
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.15.0
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>             Fix For: 0.16.0
>         Attachments: PIG-4639-20150723-classnotfound.patch, PIG-4639-20150725.patch,
> Currently there are two parsers for Apache HTTPD access log files in piggybank that only
> allow parsing the 'combined' and 'common' log formats. These two also only parse the 'basics'.
> This is a proposed patch to add the existing https://github.com/nielsbasjes/logparser (Apache
> 2.0 license) as an 'out of the box' parser to piggybank.
> This parser parses the log file using the LogFormat specification that was used to write it.
> Almost all LogFormat specifiers are supported, so it adds easy parsing capabilities for (almost)
> all custom log formats used in production scenarios.
> This parser also goes much deeper, in the sense that it allows extracting things like
> the value of a cookie or the value of a query string parameter.
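
To illustrate what "parsing using the LogFormat specification" means, here is a minimal Python sketch. It assumes only a handful of specifiers (%h, %l, %u, %t, %r, %>s, %b and %{Header}i). It is NOT the API of the logparser library or its Pig loader. The functions logformat_to_regex, parse_line, query_param and cookie_value are hypothetical names used only for this example. The format string is compiled into a regular expression with one named group per field. The cookie and query-string values are then pulled out of the matched fields with the Python standard library.

```python
import re
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical, deliberately tiny mapping from Apache LogFormat specifiers
# to regex fragments (one named group per field).
SPECIFIERS = {
    "%h": r"(?P<host>\S+)",
    "%l": r"(?P<logname>\S+)",
    "%u": r"(?P<user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": r'(?P<request>[^"]*)',
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<bytes>\d+|-)",
}

# Either a request-header specifier such as %{Cookie}i, or one of the above.
# Specifiers not listed here are NOT recognized and are treated as literal text.
TOKEN = re.compile(r"%\{([^}]+)\}i|%>s|%[hlutrb]")


def logformat_to_regex(logformat):
    """Compile a LogFormat string (supported subset only) into a regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(logformat):
        # Literal text between specifiers must match exactly.
        parts.append(re.escape(logformat[pos:m.start()]))
        if m.group(1):
            # e.g. %{User-Agent}i becomes the named group 'header_user_agent'
            name = "header_" + re.sub(r"\W", "_", m.group(1).lower())
            parts.append(rf'(?P<{name}>[^"]*)')
        else:
            parts.append(SPECIFIERS[m.group(0)])
        pos = m.end()
    parts.append(re.escape(logformat[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def parse_line(regex, line):
    """Return a dict of field name -> string value for one log line."""
    m = regex.match(line)
    if m is None:
        raise ValueError("line does not match the LogFormat")
    return m.groupdict()


def query_param(request_line, name):
    """First value of a query-string parameter in e.g. 'GET /p?x=1 HTTP/1.1'."""
    parts = request_line.split(" ")
    if len(parts) < 2:
        return None
    values = parse_qs(urlsplit(parts[1]).query).get(name)
    return values[0] if values else None


def cookie_value(cookie_header, name):
    """Value of one cookie from a raw Cookie request header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return jar[name].value if name in jar else None


if __name__ == "__main__":
    fmt = '%h %l %u %t "%r" %>s %b "%{Cookie}i"'
    line = ('127.0.0.1 - - [06/Aug/2015:10:33:05 +0000] '
            '"GET /search?q=pig&page=2 HTTP/1.1" 200 512 '
            '"session=abc123; theme=dark"')
    fields = parse_line(logformat_to_regex(fmt), line)
    print(fields["status"])                                # 200
    print(query_param(fields["request"], "q"))             # pig
    print(cookie_value(fields["header_cookie"], "theme"))  # dark
```

Handling a different custom format only requires passing a different LogFormat string, not writing a new regex by hand. This is why a LogFormat-driven parser is not limited to the fixed 'combined' and 'common' formats.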

This message was sent by Atlassian JIRA
