drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4779) Kafka storage plugin support
Date Wed, 08 Nov 2017 19:24:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244592#comment-16244592

ASF GitHub Bot commented on DRILL-4779:

Github user paul-rogers commented on a diff in the pull request:

    --- Diff: contrib/storage-kafka/README.md ---
    @@ -0,0 +1,230 @@
    +# Drill Kafka Plugin
    +The Drill Kafka storage plugin allows you to perform interactive analysis against Apache Kafka using SQL.
    +<h4 id="Supported kafka versions">Supported Kafka Versions</h4>
    +<p>Kafka-0.10 and above</p>
    +<h4 id="Supported Message Formats">Message Formats</h4>
    +Currently this plugin supports reading only Kafka messages of type <strong>JSON</strong>.
    +<h4>Message Readers</h4>
    +<p>Message Readers are used for reading messages from Kafka. The message reader types supported as of now are:</p>
    +<table style="width:100%">
    +  <tr>
    +    <th>MessageReader</th>
    +    <th>Description</th>
    +    <th>Key DeSerializer</th> 
    +    <th>Value DeSerializer</th>
    +  </tr>
    +  <tr>
    +    <td>JsonMessageReader</td>
    +    <td>To read Json messages</td>
    +    <td>org.apache.kafka.common.serialization.ByteArrayDeserializer</td>
    +    <td>org.apache.kafka.common.serialization.ByteArrayDeserializer</td>
    +  </tr>
    +</table>
    +<h4 id="Plugin Configurations">Plugin Configurations</h4>
    +The Drill Kafka plugin supports the following properties:
    +<ul>
    +  <li><strong>kafkaConsumerProps</strong>: These are typical <a href="https://kafka.apache.org/documentation/#consumerconfigs">Kafka consumer properties</a>.</li>
    +  <li><strong>drillKafkaProps</strong>: These are Drill Kafka plugin properties. As of now, the following properties are supported:
    +    <ul>
    +      <li><strong>drill.kafka.message.reader</strong>: The message reader implementation to use while reading messages from Kafka. The message reader implementation should be configured based on the message format. Supported message readers:
    +        <ul>
    +          <li>org.apache.drill.exec.store.kafka.decoders.JsonMessageReader</li>
    +        </ul>
    +      </li>
    +      <li><strong>drill.kafka.poll.timeout</strong>: Polling timeout used by the Kafka client while fetching messages from the Kafka cluster.</li>
    +    </ul>
    +  </li>
    +</ul>
    +<h4 id="Plugin Registration">Plugin Registration</h4>
    +To register the Kafka plugin, open the Drill web interface by entering <strong>http://drillbit:8047/storage</strong> in your browser.
    +<p>The following is an example plugin registration configuration:</p>
    +<pre>
    +{
    +  "type": "kafka",
    +  "kafkaConsumerProps": {
    +    "key.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
    +    "auto.offset.reset": "earliest",
    +    "bootstrap.servers": "localhost:9092",
    +    "enable.auto.commit": "true",
    +    "group.id": "drill-query-consumer-1",
    +    "value.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
    +    "session.timeout.ms": "30000"
    +  },
    +  "drillKafkaProps": {
    +    "drill.kafka.message.reader": "org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
    +    "drill.kafka.poll.timeout": "2000"
    +  },
    +  "enabled": true
    +}
    +</pre>
    +<h4 id="Abstraction"> Abstraction </h4>
    +<p>In Drill, each Kafka topic is mapped to a SQL table. When a query is issued on a table, Drill scans all messages from the earliest offset to the latest offset of that topic at that point in time. The plugin automatically discovers all topics (tables), allowing you to perform analysis without executing DDL statements.</p>
    --- End diff --
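    The JsonMessageReader described in the diff pairs the ByteArrayDeserializer with JSON parsing: each raw message value is parsed as a JSON document whose fields become SQL columns. A minimal sketch of that decoding step in Python (illustrative only; the actual reader is implemented in Java, and the message payloads here are made up):

    ```python
    import json

    def decode_json_messages(raw_values):
        """Parse each raw Kafka message value (bytes) as a JSON object.

        Mirrors, in spirit, what a JSON message reader does: the value
        deserializer hands back raw bytes, and the reader parses them
        into field/value pairs that surface as columns.
        """
        rows = []
        for value in raw_values:
            # ByteArrayDeserializer yields bytes; decode, then parse as JSON.
            rows.append(json.loads(value.decode("utf-8")))
        return rows

    # Example: two messages as they might arrive from a topic.
    messages = [b'{"user": "alice", "clicks": 3}', b'{"user": "bob", "clicks": 7}']
    rows = decode_json_messages(messages)
    print(rows[0]["user"])  # alice
    ```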
    Does it make sense to provide a way to select a range of messages: a starting point or a count? Perhaps I want to run my query every five minutes, scanning only those messages since the previous scan. Or, I want to limit my take to, say, the next 1000 messages. Could we use a pseudo-column such as "kafkaMsgOffset" for that purpose? Maybe
    SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
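    The semantics of such a pseudo-column can be sketched over an in-memory message list (a sketch of the idea only: "kafkaMsgOffset" is the hypothetical column proposed in the comment above, not an existing plugin feature, and the message contents are made up):

    ```python
    # Simulate incremental scans keyed on a per-message offset, as the
    # hypothetical kafkaMsgOffset pseudo-column would expose it.
    messages = [{"kafkaMsgOffset": i, "payload": f"event-{i}"} for i in range(20000)]

    def scan_since(msgs, last_offset):
        """Return only messages after the previously scanned offset,
        i.e. WHERE kafkaMsgOffset > last_offset."""
        return [m for m in msgs if m["kafkaMsgOffset"] > last_offset]

    def scan_next_n(msgs, last_offset, n):
        """Return at most the next n messages after last_offset,
        i.e. the bounded-take variant."""
        return scan_since(msgs, last_offset)[:n]

    # WHERE kafkaMsgOffset > 12345: offsets 12346..19999 remain.
    newer = scan_since(messages, 12345)
    print(len(newer))  # 7654

    # Limit the take to the next 1000 messages after the same offset.
    batch = scan_next_n(messages, 12345, 1000)
    print(batch[0]["kafkaMsgOffset"])  # 12346
    ```

    Either form would let a periodic query avoid rescanning the whole topic from the earliest offset on every run.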

> Kafka storage plugin support
> ----------------------------
>                 Key: DRILL-4779
>                 URL: https://issues.apache.org/jira/browse/DRILL-4779
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 1.11.0
>            Reporter: B Anil Kumar
>            Assignee: B Anil Kumar
>              Labels: doc-impacting
>             Fix For: 1.12.0
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target supporting JSON and Avro message types.

This message was sent by Atlassian JIRA
