drill-dev mailing list archives

From Cliff Resnick <cre...@gmail.com>
Subject Re: Question about Kudu Storage and storages in general
Date Sat, 22 Jul 2017 03:38:23 GMT
Hi Gautam,

Though I did get the filter pushdown to Kudu working, I unfortunately
encountered sporadic Drill errors when running aggregate queries. The
most common error was:

java.lang.IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.BigIntVector, field= campaign_id(BIGINT:REQUIRED)
    at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
    at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)

This seems to match this long-standing Jira issue:
https://issues.apache.org/jira/browse/DRILL-2926

The other error I would sometimes get was:

Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas.

It seems to be related to the order in which fragments arrive, because the
exact same queries will sometimes work and sometimes fail with one of the
above errors. The queries are simple sum/group-bys on two fields. The
unaggregated versions of the same queries always work and return the full,
normal data set, so what's up with the aggregates? I'm hoping someone can
suggest a possible reason for this weirdness, because I think Kudu + Drill
has great potential, especially with the predicate pushdown.

-Cliff




On Fri, Jul 21, 2017 at 2:36 PM, Gautam Parai <gparai@mapr.com> wrote:

> Hi Cliff,
>
>
> Thanks, and let us know how it goes on the colocated cluster. It would be
> great if you could share your use case with the community.
>
>
> We always look forward to contributions from the community.
>
>
> Gautam
>
> ________________________________
> From: Clifford Resnick <cresnick@mediamath.com>
> Sent: Thursday, July 20, 2017 5:27:42 PM
> To: dev@drill.apache.org
> Subject: Re: Question about Kudu Storage and storages in general
>
> OK, I discovered RelOptRules. Kudu storage now works for me using Kudu 1.4.
> With the ScanToken API it looks like it’s actually doing full predicate
> pushdown. I’m just running Drill embedded for now, but tomorrow I’ll try it
> on a colocated cluster.
>
> -Cliff
>
> From: Clifford Resnick <cresnick@mediamath.com>
> Date: Thursday, July 20, 2017 at 12:49 PM
> To: dev@drill.apache.org
> Subject: Question about Kudu Storage and storages in general
>
> I’m trying out the Kudu storage plugin with Kudu 1.4 without success; I
> always get a Kudu error: “Invalid scan start key: Error decoding composite
> key component ‘my-key': key too short: <redacted>”
>
> I know this storage plugin is experimental, but I’m hoping to get it to
> work. Looking at the code, I noticed it’s based on a deprecated way of
> locating Kudu tablets for scans: it does a general location mapping of
> tablet servers, I suppose to better colocate Drillbit scans. Instead of
> this, Kudu now recommends using the KuduScanToken API:
>
> “A scan token describes a partial scan of a Kudu table limited to a single
> contiguous physical location. Using the {@link KuduScanTokenBuilder},
> clients can describe the desired scan, including predicates, bounds,
> timestamps, and caching, and receive back a collection of scan tokens.
>
> Each scan token may be separately turned into a scanner using
> {@link #intoScanner}, with each scanner responsible for a disjoint section
> of the table.
>
> Scan tokens may be serialized using the {@link #serialize} method and
> deserialized back into a scanner using the {@link #deserializeIntoScanner}
> method. This allows use cases such as generating scan tokens in the planner
> component of a query engine, then sending the tokens to execution nodes
> based on locality, and then instantiating the scanners on those nodes.
>
> Scan token locality information can be inspected using the
> {@link #getTablet} method.”
>
> I’m new to Drill, but it seemed to me that this API could be retrofitted
> to good effect into KuduGroupScan. I didn’t get very far, though, because I
> couldn’t even suss out where the predicates were in the Drill code. Unless
> I completely misunderstand the concept, it seems the predicates must be
> pushed down to lower code levels, so the Kudu storage plugin is never
> exposed to them. As the excerpt above describes, my hope is that there is a
> way to serialize the Kudu scan tokens and send them to Drillbits to be used
> as scanners. Does anyone know whether this is possible using the Drill
> execution path? If so, can someone please point me to
> documentation/examples/tests I can consult to help clarify my muddled
> understanding of Drill?
>
> -Cliff
>
>
>
>
>
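The scan-token workflow quoted in the thread (generate tokens in the planner, send them to execution nodes based on locality, then turn each one back into a scanner on that node) hinges on a locality-aware assignment step. The sketch below models only that step in plain Java. It is a minimal, hypothetical illustration, not the Kudu or Drill API: `ScanTokenStub`, `assignByLocality`, and the `node*` host names are made up for this example, and the least-loaded tie-breaking is a choice of this sketch. In a real plugin the payload would presumably be the bytes from `KuduScanToken#serialize`, and the replica hosts would come from the token's `getTablet()` locality information.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScanTokenAssignment {

    // Hypothetical stand-in for a Kudu scan token: an opaque serialized payload
    // (in a real plugin, possibly the output of KuduScanToken#serialize) plus the
    // hosts that hold replicas of the tablet the token covers.
    public record ScanTokenStub(byte[] serialized, List<String> replicaHosts) {}

    // Assigns each token to a Drillbit host. Prefers the least-loaded Drillbit
    // running on one of the token's replica hosts; if none does, falls back to the
    // least-loaded Drillbit overall. Ties go to the earlier host in the list scanned.
    public static Map<String, List<ScanTokenStub>> assignByLocality(
            List<ScanTokenStub> tokens, List<String> drillbitHosts) {
        if (drillbitHosts.isEmpty()) {
            throw new IllegalArgumentException("no Drillbits to assign tokens to");
        }
        Map<String, List<ScanTokenStub>> assignment = new LinkedHashMap<>();
        for (String host : drillbitHosts) {
            assignment.put(host, new ArrayList<>());
        }
        for (ScanTokenStub token : tokens) {
            String best = null;
            // First pass: only Drillbits colocated with a replica of this tablet.
            for (String host : token.replicaHosts()) {
                if (assignment.containsKey(host) && (best == null
                        || assignment.get(host).size() < assignment.get(best).size())) {
                    best = host;
                }
            }
            // Fallback: no colocated Drillbit, so pick the least-loaded one.
            if (best == null) {
                for (String host : drillbitHosts) {
                    if (best == null
                            || assignment.get(host).size() < assignment.get(best).size()) {
                        best = host;
                    }
                }
            }
            assignment.get(best).add(token);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<ScanTokenStub> tokens = List.of(
                new ScanTokenStub(new byte[] {1}, List.of("node1", "node2")),
                new ScanTokenStub(new byte[] {2}, List.of("node1", "node2")),
                new ScanTokenStub(new byte[] {3}, List.of("node9")), // no Drillbit on node9
                new ScanTokenStub(new byte[] {4}, List.of("node1")));
        Map<String, List<ScanTokenStub>> plan =
                assignByLocality(tokens, List.of("node1", "node2", "node3"));
        // Each Drillbit would then turn its assigned tokens into scanners locally.
        plan.forEach((host, assigned) ->
                System.out.println(host + " -> " + assigned.size() + " token(s)"));
    }
}
```

Running `main` prints `node1 -> 2 token(s)`, `node2 -> 1 token(s)`, and `node3 -> 1 token(s)`: the third token has no colocated Drillbit, so it lands on the idle `node3`. A real assignment would likely also want to weigh how much data each token scans, not just token counts.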
