drill-dev mailing list archives

From Gautam Parai <gpa...@mapr.com>
Subject Re: Question about Kudu Storage and storages in general
Date Fri, 21 Jul 2017 18:36:31 GMT
Hi Cliff,

Thanks, and let us know how it goes on the colocated cluster. It would be great if you could
share your use case with the community.

We always look forward to contributions from the community.


From: Clifford Resnick <cresnick@mediamath.com>
Sent: Thursday, July 20, 2017 5:27:42 PM
To: dev@drill.apache.org
Subject: Re: Question about Kudu Storage and storages in general

OK, I discovered RelOptRules. Kudu storage now works for me using Kudu 1.4. With the ScanToken
API it looks like it’s actually doing full predicate pushdown. I’m just running Drill
embedded for now, but tomorrow I’ll try it on a colocated cluster.


From: Clifford Resnick <cresnick@mediamath.com>
Date: Thursday, July 20, 2017 at 12:49 PM
To: dev@drill.apache.org
Subject: Question about Kudu Storage and storages in general

I’m trying out the Kudu Storage with Kudu 1.4 without success, always getting a Kudu error
-> “Invalid scan start key: Error decoding composite key component 'my-key': key too
short: <redacted>”

I know this storage is experimental, but I’m hoping to get it to work. Looking at the code,
I noticed it’s based on a deprecated way of locating Kudu tablets for scans: it builds a general
location mapping of tablet servers, I suppose to better colocate Drillbit scans. Instead of
this, Kudu now recommends using the KuduScanToken API:

" A scan token describes a partial scan of a Kudu table limited to a single
 contiguous physical location. Using the {@link KuduScanTokenBuilder}, clients can
 describe the desired scan, including predicates, bounds, timestamps, and
 caching, and receive back a collection of scan tokens.

 Each scan token may be separately turned into a scanner using
 {@link #intoScanner}, with each scanner responsible for a disjoint section
 of the table.

 Scan tokens may be serialized using the {@link #serialize} method and
 deserialized back into a scanner using the {@link #deserializeIntoScanner}
 method. This allows use cases such as generating scan tokens in the planner
 component of a query engine, then sending the tokens to execution nodes based
 on locality, and then instantiating the scanners on those nodes.

 Scan token locality information can be inspected using the {@link #getTablet}
 method."

I’m new to Drill, but it seemed to me that this API could be retrofitted to good effect
into KuduGroupScan. I didn’t get very far, though, given that I couldn’t even suss out where
the predicates were in the Drill code. Unless I completely misunderstand the concept, it seems
the Kudu Storage must be pushing the predicate to lower code levels and is therefore not exposed
to them. If you read the above, my hope is that there is a way to serialize the Kudu ScanTokens
to Drillbits to be used as scanners. Does anyone know if this is possible using the Drill
execution path? If so, can someone please point me to documentation/examples/tests I can consult
to help clarify my muddled understanding of Drill?
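
The planner-to-executor hand-off described in the quoted Javadoc can be sketched without
any Kudu or Drill dependency. What follows is a toy model only, not the Kudu client API and
not Drill's execution path: the names Token, planTokens, assignByLocality, and describeScans
are hypothetical, the tablet and host names are made up, and the string-based encoding merely
stands in for whatever a real scan token serializes to.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScanTokenSketch {

    /** Hypothetical stand-in for a scan token: one tablet, its host, and a predicate. */
    public static final class Token {
        final String tabletId;
        final String host;
        final String predicate;

        Token(String tabletId, String host, String predicate) {
            this.tabletId = tabletId;
            this.host = host;
            this.predicate = predicate;
        }

        /** Encode as bytes so the token could travel inside a plan fragment. */
        byte[] serialize() {
            return String.join("\n", tabletId, host, predicate)
                    .getBytes(StandardCharsets.UTF_8);
        }

        /** Inverse of serialize(); assumes tabletId and host contain no newline. */
        static Token deserialize(byte[] bytes) {
            String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\n", 3);
            return new Token(parts[0], parts[1], parts[2]);
        }
    }

    /** Planner side: one token per tablet, each carrying the same predicate. */
    public static List<Token> planTokens(Map<String, String> tabletToHost, String predicate) {
        List<Token> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : tabletToHost.entrySet()) {
            tokens.add(new Token(e.getKey(), e.getValue(), predicate));
        }
        return tokens;
    }

    /** Planner side: group serialized tokens by the host that holds each tablet. */
    public static Map<String, List<byte[]>> assignByLocality(List<Token> tokens) {
        Map<String, List<byte[]>> work = new LinkedHashMap<>();
        for (Token t : tokens) {
            work.computeIfAbsent(t.host, h -> new ArrayList<>()).add(t.serialize());
        }
        return work;
    }

    /** Executor side: rebuild each token; a real executor would open a scanner here. */
    public static List<String> describeScans(List<byte[]> serialized) {
        List<String> scans = new ArrayList<>();
        for (byte[] bytes : serialized) {
            Token t = Token.deserialize(bytes);
            scans.add("scan " + t.tabletId + " where " + t.predicate);
        }
        return scans;
    }

    public static void main(String[] args) {
        Map<String, String> tablets = new LinkedHashMap<>();
        tablets.put("tablet-1", "node-a");
        tablets.put("tablet-2", "node-b");
        tablets.put("tablet-3", "node-a");

        Map<String, List<byte[]>> work = assignByLocality(planTokens(tablets, "id >= 100"));
        for (Map.Entry<String, List<byte[]>> e : work.entrySet()) {
            System.out.println(e.getKey() + " -> " + describeScans(e.getValue()));
        }
    }
}
```

The point of the sketch is that the predicate rides along inside each serialized token, so
the executing node needs nothing from the planner except the bytes it was handed.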

