[ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649404#comment-14649404
]
Jacques Nadeau commented on DRILL-3423:
---------------------------------------
Q1: I should provide better comments in the code. Vector memory allocations are rounded up
to powers of 2, and VarChar uses n+1 slots when allocating data. As such, if we make batches
4095 in size, then varchar allocations will be 4096 in size and we will have minimal wastage
due to power-of-2 rounding. If we chose 4096, then varchar allocations would be 4097 and thus
the underlying memory allocation would be 8192, with virtually half of that wasted.
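The arithmetic behind the 4095 choice can be sketched in a few lines of Java. This is only an illustration of the rounding described above, not Drill's allocator code: the class and method names (PowerOfTwoSizing, roundUpToPowerOfTwo, offsetSlotsAllocated, wastedSlots) are hypothetical, and it assumes the allocation is counted in slots and rounded up to the next power of 2.

```java
// Hypothetical helper, not part of Drill: illustrates power-of-2 rounding only.
final class PowerOfTwoSizing {

    // Smallest power of two that is >= n, for 1 <= n <= 2^30.
    static int roundUpToPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Assumption taken from the comment above: a VarChar vector holding
    // batchSize values needs batchSize + 1 slots, rounded up to a power of 2.
    static int offsetSlotsAllocated(int batchSize) {
        return roundUpToPowerOfTwo(batchSize + 1);
    }

    // Slots allocated but never used for a given batch size.
    static int wastedSlots(int batchSize) {
        return offsetSlotsAllocated(batchSize) - (batchSize + 1);
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {4095, 4096}) {
            System.out.printf("batch=%d needed=%d allocated=%d wasted=%d%n",
                    batchSize, batchSize + 1,
                    offsetSlotsAllocated(batchSize), wastedSlots(batchSize));
        }
    }
}
```

Under these assumptions a batch of 4095 needs 4096 slots and wastes none, while a batch of 4096 needs 4097 slots, is rounded up to 8192, and leaves 4095 slots unused.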
Q2: My plan was actually to write a blog post around this plugin so people could use it as
a model. (That is one of the reasons I kept it in a single file.) I wanted to get something
up for feedback, but I will be working on adding javadocs to clarify things.
Q3: Good point. We should implement a new FormatMatcher for access logs that recognizes this
pattern. Can you provide a couple of examples and maybe propose a format matching algorithm?
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>
> Add an HTTPD logparser-based format plugin. The author has been kind enough to release
> the logparser project under the Apache License. It can be found here:
> <dependency>
> <groupId>nl.basjes.parse.httpdlog</groupId>
> <artifactId>httpdlog-parser</artifactId>
> <version>2.0</version>
> </dependency>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)