drill-issues mailing list archives

From "B Anil Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5977) predicate pushdown support kafkaMsgOffset
Date Thu, 29 Mar 2018 03:39:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418387#comment-16418387
] 

B Anil Kumar commented on DRILL-5977:
-------------------------------------

[~aravi5] Thanks for looking into this feature and providing the documentation.

 

Your approach looks good to me. One note, though: in other storage plugins, such as the Mongo plugin,
we convert the entire filter condition expression (the combination of all predicates) into
a Mongo filter. In the case of Kafka, that is not possible.

 

So we will most likely need to apply predicate pushdown only in a few cases:
 * If the predicates are on *kafkaMsgOffset* and/or *kafkaMsgTimestamp*.
 * If other predicates are combined with case 1 through an AND condition. Example: select * from topic1 where kafkaMsgTimestamp > x AND (v1 = '' OR v2 = '')

Queries like select * from topic1 where kafkaMsgTimestamp > x OR eventTimeStamp < y, on the other hand,
can result in a full scan.
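
To make the cases above concrete, here is a minimal sketch in Python of how the pushable part of a filter could be extracted. It is not Drill code: the Cmp/And/Or classes and the pushable_part function are hypothetical names used only for illustration, and the sketch assumes that Drill still evaluates the full original filter on the rows returned by the narrowed scan.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Pseudo-columns that can bound a Kafka scan (taken from the cases above).
PUSHABLE_COLUMNS = {"kafkaMsgOffset", "kafkaMsgTimestamp"}


@dataclass(frozen=True)
class Cmp:
    """A single comparison such as kafkaMsgTimestamp > 100 (hypothetical type)."""
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, And, Or]


def pushable_part(expr: Expr) -> Optional[Expr]:
    """Return a sub-filter that can narrow the scan, or None for a full scan.

    The result is never stricter than the input filter, so applying the
    original filter afterwards still gives correct results.
    """
    if isinstance(expr, Cmp):
        return expr if expr.column in PUSHABLE_COLUMNS else None

    left = pushable_part(expr.left)
    right = pushable_part(expr.right)

    if isinstance(expr, And):
        # AND: either side on its own is a safe (possibly looser) bound.
        if left is not None and right is not None:
            return And(left, right)
        return left if left is not None else right

    # OR: rows matching a non-pushable side may sit at any offset,
    # so both sides must be pushable, otherwise fall back to a full scan.
    if left is not None and right is not None:
        return Or(left, right)
    return None


if __name__ == "__main__":
    ts = Cmp("kafkaMsgTimestamp", ">", 100)
    # kafkaMsgTimestamp > x AND (v1 = '' OR v2 = '')
    print(pushable_part(And(ts, Or(Cmp("v1", "=", ""), Cmp("v2", "=", "")))))
    # kafkaMsgTimestamp > x OR eventTimeStamp < y
    print(pushable_part(Or(ts, Cmp("eventTimeStamp", "<", 200))))
```

In this sketch the AND example keeps only the kafkaMsgTimestamp comparison for pushdown, while the OR example yields nothing to push down, which corresponds to the full-scan case.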

 

 

> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Bhallamudi Venkata Siva Kamesh
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting point or
> a count? Perhaps I want to run my query every five minutes, scanning only those messages since
> the previous scan. Or, I want to limit my take to, say, the next 1000 messages. Could we use
> a pseudo-column such as "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
