drill-issues mailing list archives

From "Abhishek Ravi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5977) predicate pushdown support kafkaMsgOffset
Date Mon, 02 Apr 2018 05:43:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421951#comment-16421951 ]

Abhishek Ravi commented on DRILL-5977:
--------------------------------------

Thank you for the review [~akumarb2010]. Yes, you are absolutely right. As an initial approach
to this problem, I plan to do the following after obtaining the *top-level predicates* of
an expression:
 # Check whether a condition on {{kafkaMsgTimestamp}} / {{kafkaMsgOffset}} exists.
 # Check that no {{OR}} joins the top-level predicates.

Do the filter pushdown only when both checks succeed. Does this sound good?
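The two checks above can be sketched roughly as follows. This is a minimal illustration, not Drill's implementation: the expression classes ({{Pred}}, {{And}}, {{Or}}) and the helper names are hypothetical stand-ins for the planner's real expression objects, and "no OR joining top-level predicates" is read here as "the root of the filter is not an OR". Nested ANDs are flattened into conjuncts; an OR nested inside one conjunct does not block pushdown of a separate offset/timestamp conjunct.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical, simplified filter expression tree (not Drill's real classes).
@dataclass(frozen=True)
class Pred:
    column: str
    op: str      # e.g. "=", ">", ">=", "<", "<="
    value: int

@dataclass(frozen=True)
class And:
    children: tuple

@dataclass(frozen=True)
class Or:
    children: tuple

Expr = Union[Pred, And, Or]

# Kafka metadata pseudo-columns whose conditions could be pushed to the scan.
PUSHABLE_COLUMNS = {"kafkaMsgTimestamp", "kafkaMsgOffset"}

def top_level_predicates(expr: Expr) -> List[Expr]:
    """Flatten nested ANDs at the root into a flat list of conjuncts."""
    if isinstance(expr, And):
        out: List[Expr] = []
        for child in expr.children:
            out.extend(top_level_predicates(child))
        return out
    return [expr]

def can_push_down(expr: Expr) -> bool:
    # Check 2: top-level predicates must not be joined by OR.
    if isinstance(expr, Or):
        return False
    conjuncts = top_level_predicates(expr)
    # Check 1: at least one conjunct is a condition on a Kafka metadata column.
    return any(isinstance(p, Pred) and p.column in PUSHABLE_COLUMNS
               for p in conjuncts)
```

For example, {{kafkaMsgOffset > 12345 AND x = 1}} qualifies, while {{kafkaMsgOffset > 12345 OR x = 1}} does not, because the OR could match rows outside any offset range that the scan would skip.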

> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Bhallamudi Venkata Siva Kamesh
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting point or
> a count? Perhaps I want to run my query every five minutes, scanning only those messages since
> the previous scan. Or, I want to limit my take to, say, the next 1000 messages. Could we use
> a pseudo-column such as "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}
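To illustrate what pushing {{kafkaMsgOffset > 12345}} down could mean for the scan, here is a hedged sketch that narrows one partition's half-open offset range [earliest, latest) using offset predicates, with an optional message-count limit for the "next 1000 messages" case. The function name, the (op, value) predicate encoding, and the limit parameter are all hypothetical; in a real plugin the earliest/latest offsets would come from Kafka (for example the Java consumer's beginningOffsets/endOffsets), and this is not how Drill necessarily implements it.

```python
from typing import List, Optional, Tuple

def offset_scan_range(preds: List[Tuple[str, int]], earliest: int, latest: int,
                      limit: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Narrow the half-open range [earliest, latest) of one partition.

    preds: hypothetical (op, value) pairs for conditions on kafkaMsgOffset.
    limit: optional cap on the number of messages to read from the start.
    Returns (start, end) or None if nothing in the partition can match.
    """
    start, end = earliest, latest
    for op, value in preds:
        if op == ">":
            start = max(start, value + 1)
        elif op == ">=":
            start = max(start, value)
        elif op == "<":
            end = min(end, value)
        elif op == "<=":
            end = min(end, value + 1)
        elif op == "=":
            start = max(start, value)
            end = min(end, value + 1)
        else:
            raise ValueError(f"unsupported operator: {op}")
    if limit is not None:
        # Read at most `limit` messages beginning at the computed start offset.
        end = min(end, start + limit)
    if start >= end:
        return None  # empty range: the partition can be skipped entirely
    return start, end
```

With a partition holding offsets 0 to 19999, {{kafkaMsgOffset > 12345}} would yield the range (12346, 20000), and adding a limit of 1000 would yield (12346, 13346).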



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
