drill-issues mailing list archives

From "Charles Givre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4955) Log Parser for Drill
Date Fri, 21 Oct 2016 04:53:59 GMT
Charles Givre created DRILL-4955:

             Summary: Log Parser for Drill
                 Key: DRILL-4955
                 URL: https://issues.apache.org/jira/browse/DRILL-4955
             Project: Apache Drill
          Issue Type: New Feature
          Components: Storage - Text & CSV
    Affects Versions: 1.9.0
            Reporter: Charles Givre
             Fix For: Future

I've been experimenting with a generic log parser for Drill.  The basic idea is this: suppose
you want Drill to ingest log files such as this MySQL log:
070823 21:00:32       1 Connect     root@localhost on test1
070823 21:00:48       1 Query       show tables
070823 21:00:56       1 Query       select * from category
070917 16:29:01      21 Query       select * from location
070917 16:29:12      21 Query       select * from location where id = 1 LIMIT 1

You could probably do this with string manipulation functions such as split and substring,
but you'd end up with ugly and very complex queries.

The extension I've built allows you to supply Drill with a regex describing the line format
and a list of field names, as shown below.

"log": {
      "type": "log",
      "extensions": [
      "fieldNames": [
      "pattern": "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)"
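
To illustrate how the pattern above splits a log line into fields, here is a minimal sketch in Python, run outside of Drill. It is not the plugin's implementation. The field names are hypothetical, since the fieldNames list is cut off in this message, and parse_line is an illustrative helper only.

```python
import re

# Pattern copied from the "pattern" entry above, with the JSON
# double-backslash escaping removed.
PATTERN = re.compile(r"(\d{6})\s(\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)")

# Hypothetical field names (assumption): one name per capture group.
FIELD_NAMES = ["date", "time", "pid", "action", "query"]


def parse_line(line):
    """Return a dict of field name -> captured text, or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


# Sample lines taken from the MySQL log shown earlier in this message.
SAMPLE = [
    "070823 21:00:32       1 Connect     root@localhost on test1",
    "070823 21:00:48       1 Query       show tables",
    "070823 21:00:56       1 Query       select * from category",
    "070917 16:29:01      21 Query       select * from location",
]

rows = [parse_line(line) for line in SAMPLE]

for row in rows:
    print(row)
```

Each capture group becomes one column, so a line that does not match the pattern yields no fields at all (None in this sketch). How the actual plugin treats non-matching lines is not stated here.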

You can then query log files in this format in Drill.  I'd like to submit this for inclusion
in Drill if there is interest.

This message was sent by Atlassian JIRA