drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5432) Want a memory format for PCAP files
Date Fri, 30 Jun 2017 16:56:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070396#comment-16070396 ]

ASF GitHub Bot commented on DRILL-5432:

Github user paul-rogers commented on the issue:

    Looking at the big picture, Drill should allow specialized plugins such as this one to
exist as independent projects. Users should be able to download the plugin jar, add it to
Drill, and go.
    As we've discussed, Drill has a bit of work before we get there. We can't hold up this
work waiting for a better solution.
    So, please fix the two minor issues you identified. The code will then be ready for a
final quick review and approval.
    Later, once Drill provides the correct framework, I'd suggest that this code move into
a separate GitHub repo to be maintained by experts in pcap. Frankly, most Drill developers
are familiar with query engines, not pcap (or other specialized formats).
    The same is true, for example, of the "indexr" and TSDB plugins which are (slowly) working
their way through the review process.
    Summary: please add the package-info file and the comments in utils. We can then give
    Can we do this by, say, July 10? If so, we can likely get this PR into 1.11, if the Release
Manager agrees.

> Want a memory format for PCAP files
> -----------------------------------
>                 Key: DRILL-5432
>                 URL: https://issues.apache.org/jira/browse/DRILL-5432
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Ted Dunning
> PCAP files [1] are the de facto standard for storing network capture data. In security
and protocol applications, it is very common to want to extract particular packets from a
capture for further analysis.
> At a first level, it is desirable to query and filter by source and destination IP and
port or by protocol. Beyond that, however, it would be very useful to be able to group packets
by TCP session and eventually to look at packet contents. For now, however, the most critical
requirement is that we should be able to scan captures at very high speed.
> I previously wrote a (kind of working) proof of concept for a PCAP decoder that did lazy
deserialization and could traverse hundreds of MB of PCAP data per second per core. This compares
to roughly 2-3 MB/s for widely available Apache-compatible open source PCAP decoders.
> This JIRA covers the integration and extension of that proof of concept as a Drill file
> Initial work is available at https://github.com/mapr-demos/drill-pcap-format
> [1] https://en.wikipedia.org/wiki/Pcap
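
As a rough illustration of the "lazy deserialization" idea mentioned in the issue description, the sketch below walks a classic pcap byte array by reading only the length field of each record header, and decodes other fields only when asked. It is not the code from drill-pcap-format: the class name `LazyPcapSketch` and all of its methods are hypothetical, and it assumes the classic pcap layout (24-byte file header with magic number 0xa1b2c3d4, then 16-byte record headers holding ts_sec, ts_usec, incl_len, orig_len).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of lazy pcap traversal; not the drill-pcap-format decoder. */
final class LazyPcapSketch {
    static final int GLOBAL_HEADER_LEN = 24; // fixed-size classic pcap file header
    static final int RECORD_HEADER_LEN = 16; // ts_sec, ts_usec, incl_len, orig_len

    private final ByteBuffer buf;

    LazyPcapSketch(byte[] data) {
        if (data.length < GLOBAL_HEADER_LEN) {
            throw new IllegalArgumentException("too short for a pcap file header");
        }
        ByteBuffer b = ByteBuffer.wrap(data); // big-endian by default
        int magic = b.getInt(0);
        if (magic == 0xD4C3B2A1) {
            b.order(ByteOrder.LITTLE_ENDIAN); // file was written little-endian
        } else if (magic != 0xA1B2C3D4) {
            throw new IllegalArgumentException("not a classic pcap file");
        }
        this.buf = b;
    }

    /** Offset of the first record header. */
    int firstRecord() {
        return GLOBAL_HEADER_LEN;
    }

    /** True if a complete record header fits at the given offset. */
    boolean hasRecordAt(int offset) {
        return (long) offset + RECORD_HEADER_LEN <= buf.limit();
    }

    /** Captured length (incl_len), decoded only when requested. */
    int capturedLength(int offset) {
        return buf.getInt(offset + 8);
    }

    /** Timestamp seconds (ts_sec) as an unsigned value, decoded only when requested. */
    long timestampSeconds(int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    /** Offset of the next record header; the packet body itself is never touched. */
    int nextRecord(int offset) {
        int len = capturedLength(offset);
        long next = (long) offset + RECORD_HEADER_LEN + len;
        if (len < 0 || next > buf.limit()) {
            throw new IllegalStateException("truncated or corrupt record at " + offset);
        }
        return (int) next;
    }

    /** Counts packets by hopping from header to header. */
    int countPackets() {
        int n = 0;
        for (int off = firstRecord(); hasRecordAt(off); off = nextRecord(off)) {
            n++;
        }
        return n;
    }
}
```

A filter on source or destination address would follow the same pattern: compute the offset of the field inside the packet body and read it directly, rather than materializing a full packet object for every record.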

This message was sent by Atlassian JIRA
