drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-3423) Add New HTTPD format plugin
Date Tue, 17 Nov 2015 21:39:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009557#comment-15009557 ]

Jacques Nadeau edited comment on DRILL-3423 at 11/17/15 9:38 PM:
-----------------------------------------------------------------

Here is my alternative proposal: 

With the log format above: 
{code}
"%h %t \"%r\" %>s %b \"%{Referer}i\""
{code}

I propose that a user gets the following fields (in order):

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer
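
As a rough illustration of the field list above, here is a minimal Python sketch that turns one line in that log format into the proposed fields, in order. The regular expression, the `parse_line` name, and the sample line are assumptions made for illustration only; this is not the logparser library and not how the plugin would be implemented.

```python
import re
from datetime import datetime

# Illustrative pattern for "%h %t \"%r\" %>s %b \"%{Referer}i\"".
# Group names follow the proposed field names; the pattern is an assumption.
LINE_RE = re.compile(
    r'(?P<remote_host>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<request_uri>\S+)(?: \S+)?" '
    r'(?P<response_status>\d{3}) '
    r'(?P<response_bytes>\d+|-) '
    r'"(?P<header_referer>[^"]*)"'
)

def parse_line(line):
    """Return the proposed fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    bytes_raw = m.group("response_bytes")
    return {
        "remote_host": m.group("remote_host"),
        # A single timestamp value rather than one field per date part.
        "request_receive_time": datetime.strptime(
            m.group("time"), "%d/%b/%Y:%H:%M:%S %z"
        ),
        "request_method": m.group("request_method"),
        "request_uri": m.group("request_uri"),
        "response_status": int(m.group("response_status")),
        # "%b" logs "-" when no bytes were sent; map that to None here.
        "response_bytes": None if bytes_raw == "-" else int(bytes_raw),
        "header_referer": m.group("header_referer"),
    }
```

The request line is split into method and URI, and the protocol token is dropped, which matches the seven fields listed above.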

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: ...,
  path: ...,
  query: ...,
  fragment: ...
}
{code}
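
To make the shape of that map concrete, here is a small Python sketch that builds the same keys with the standard library's `urllib.parse.urlsplit`. It only illustrates the proposed output; the Python function below is a stand-in, and returning `None` for missing components is an assumption, not something the proposal specifies.

```python
from urllib.parse import urlsplit

def parse_url(url):
    """Split a URL into the components named in the proposed map."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme or None,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }
```

For example, `parse_url("http://bob:secret@example.com:8080/a/b?x=1#top")` yields a map whose `host` is `example.com` and whose `query` is `x=1`.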

parse_url_query(...) would return an array of key/value pairs:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
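
A minimal Python sketch of that behavior, under a few assumptions: empty pairs are skipped, a key without a delimiter gets an empty value, and no percent-decoding is performed. None of these details are specified in the proposal.

```python
def parse_url_query(querystring, pair_delimiter, key_value_delimiter):
    """Split a query string into a list of {key, value} maps.

    Both delimiters must be non-empty strings.
    """
    result = []
    for pair in querystring.split(pair_delimiter):
        if not pair:
            # Skip empty segments such as a trailing "&" (an assumption).
            continue
        # Split on the first key/value delimiter only.
        key, _, value = pair.partition(key_value_delimiter)
        result.append({"key": key, "value": value})
    return result
```

Passing the delimiters explicitly lets the same function handle `a=1&b=2` as well as variants such as `a:1;b:2`.
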
In response to your proposal: I don't think it makes sense to return many fields for a date
field. Drill already provides functionality to get parts of a date. I also don't think it
makes sense to prefix a field with its datatype; we don't do that anywhere else in Drill.
We should also expose parsing as an optional behavior in Drill. Note also that my proposal substantially
reduces the number of fields exposed to the user. I think this proposal has much better usability
in the context of SQL.
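
To illustrate the point about date fields, the Python sketch below shows how a single timestamp value is enough to derive any part on demand, so separate per-part fields are not needed. The `date_part` helper here is a Python stand-in written for this example, not Drill's API.

```python
from datetime import datetime, timezone

_UNITS = ("year", "month", "day", "hour", "minute", "second")

def date_part(unit, ts):
    """Return one component of a timestamp (illustrative helper)."""
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    return getattr(ts, unit)

# One stored value; parts are computed only when a query asks for them.
request_receive_time = datetime(2015, 11, 17, 21, 39, 11, tzinfo=timezone.utc)
year = date_part("year", request_receive_time)
hour = date_part("hour", request_receive_time)
```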

If you want to take advantage of the underlying format's capabilities, you can treat that as
a pushdown of a particular function (date part or the URL parsing functions above).






> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough to move
> the logparser project to be released under the Apache License.  It can be found here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
