drill-issues mailing list archives

From "Jim Scott (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3423) Add New HTTPD format plugin
Date Thu, 05 Nov 2015 15:10:27 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991779#comment-14991779 ]

Jim Scott commented on DRILL-3423:


I'm not sure I follow this comment: "We should also avoid the use of dot delimiters being automatically
generated by Drill."

Am I correct that your concern is specifically with the configuration of the plugin and the mapping
of the field names?

Here is the problem I have with creating the mappings in the configuration:
1. There are WAY more ways the parser can parse a field than it is reasonable for us to create
mappings for (e.g. a time field yields both a timezone-based result and a UTC-based one).
2. By providing a mapping within the Drill plugin, we have to expose every default for anything
that may show up in the log parser (e.g. if a new feature shows up in the log parser, we wouldn't
be able to expose it until we changed the plugin).
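To illustrate point 2, here is a minimal sketch of deriving a flat column name for every path the parser reports, instead of maintaining a hand-written mapping. It assumes parser paths follow the `TYPE:dotted.name` form seen in the example further down (e.g. `IP:connection.client.ip`); the class and method names are hypothetical, and the `TIME.EPOCH:...` path is an illustrative assumption rather than a list taken from the parser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's actual code: derive a flat column
// name from every path the parser reports, so a path added in a newer
// parser release gets a column without a plugin change.
public class PathToColumn {

    // Drops the "TYPE:" prefix and turns dots into underscores, so no dot
    // delimiters end up in the column name.
    static String columnName(String parserPath) {
        int colon = parserPath.indexOf(':');
        String name = colon >= 0 ? parserPath.substring(colon + 1) : parserPath;
        return name.replace('.', '_');
    }

    // Builds the path -> column mapping for whatever paths are passed in,
    // preserving their order.
    static Map<String, String> buildMapping(List<String> parserPaths) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String path : parserPaths) {
            mapping.put(path, columnName(path));
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Illustrative paths only; a real plugin would obtain them from the parser.
        List<String> paths = Arrays.asList(
            "IP:connection.client.ip",
            "TIME.EPOCH:request.receive.time.epoch");
        for (Map.Entry<String, String> e : buildMapping(paths).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

One caveat of this naive scheme: if the parser exposes the same name under two different types, the column names would collide, so the type would have to be folded into the name.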

Regarding wildcard maps of data, I can just as easily remove the :map suffix from the field
name. I'm indifferent, really. I put it there to make it blatantly obvious that the field is a map.
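A minimal sketch of what the `:map` marker buys: the plugin can tell a wildcard map field apart from a scalar field by looking only at the configured name. The class, its methods, and the field name `request_query_params:map` are hypothetical; if the suffix were dropped, wildcard detection would have to rely on some other signal.

```java
// Hypothetical sketch of the ":map" suffix convention: a configured field
// name ending in ":map" marks a wildcard field whose values are collected
// into a single map column. All names here are illustrative.
public class WildcardField {

    static final String MAP_SUFFIX = ":map";

    // True when the configured name carries the explicit ":map" marker.
    static boolean isWildcardMap(String fieldName) {
        return fieldName.endsWith(MAP_SUFFIX);
    }

    // The field name with the marker removed (unchanged if there is none).
    static String baseName(String fieldName) {
        return isWildcardMap(fieldName)
            ? fieldName.substring(0, fieldName.length() - MAP_SUFFIX.length())
            : fieldName;
    }

    public static void main(String[] args) {
        String configured = "request_query_params:map"; // hypothetical field name
        System.out.println(isWildcardMap(configured) + " " + baseName(configured));
        // prints "true request_query_params"
    }
}
```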

As for creating nested maps, as in this example:
            case "IP:connection.client.ip":
              add(parser, path, writer.rootAsMap().map("client").varChar("ip"));
              break;
            case "IP:connection.client.peerip":
              add(parser, path, writer.rootAsMap().map("client").varChar("peer_ip"));
              break;
            case "IP:connection.server.ip":
              add(parser, path, writer.rootAsMap().map("server").varChar("ip"));
              break;
This model makes it extremely difficult to support mapping of data types, because it assumes
that those fields are varChar and nothing else. Also, given the life cycle of creating maps
within Drill, I don't think this is the most logical approach to take. Technical details
aside, as a user I'm not convinced I benefit from nesting the data into maps. While I understand
why someone might want this from a data structure perspective, from a query perspective I think
it makes the data more difficult to query.
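The varChar assumption can be avoided by deriving each column's type from the parser path itself. This is a minimal sketch, assuming the `TYPE:` prefix of a path (such as `IP:` in the quoted example) is a usable type hint. The `ColumnType` enum and the prefix table, including the `TIME.EPOCH` and `BYTES` entries, are illustrative assumptions, not Drill or logparser APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's writer API: choose each column's type
// from the parser path's "TYPE:" prefix instead of assuming varChar.
public class TypedColumns {

    enum ColumnType { VARCHAR, BIGINT }

    // Illustrative prefix table; any unlisted prefix falls back to VARCHAR.
    private static final Map<String, ColumnType> TYPE_BY_PREFIX = new HashMap<>();
    static {
        TYPE_BY_PREFIX.put("TIME.EPOCH", ColumnType.BIGINT);
        TYPE_BY_PREFIX.put("BYTES", ColumnType.BIGINT);
    }

    // Looks up the text before the first ':' in the prefix table.
    static ColumnType typeOf(String parserPath) {
        int colon = parserPath.indexOf(':');
        String prefix = colon >= 0 ? parserPath.substring(0, colon) : "";
        ColumnType type = TYPE_BY_PREFIX.get(prefix);
        return type != null ? type : ColumnType.VARCHAR;
    }

    public static void main(String[] args) {
        String[] paths = {
            "IP:connection.client.ip",
            "TIME.EPOCH:request.receive.time.epoch"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + typeOf(p));
        }
    }
}
```

Keeping the type decision in one table means a new parser type needs one entry rather than edits to every hard-coded case.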

> Add New HTTPD format plugin
> ---------------------------
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
> Add an HTTPD logparser based format plugin. The author has been kind enough to release
> the logparser project under the Apache License. It can be found here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>

This message was sent by Atlassian JIRA
