kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Any plans for "Aggregation Push down" or integrating Impala + Kudu more tightly?
Date Thu, 29 Jun 2017 18:53:12 GMT
Hey Jason,

Answers inline below

On Thu, Jun 29, 2017 at 2:52 AM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> Hi,
> Q1.
> After reading Druid vs Kudu
> <http://druid.io/docs/latest/comparisons/druid-vs-kudu.html>, I learned
> that Druid has aggregation push down.
> *Druid includes its own query layer that allows it to push down
>> aggregations and computations directly to data nodes for faster query
>> processing. *
> If I understand "Aggregation Push down" correctly, it seems that partial
> aggregation is done on the data node side, so that only a small amount of
> the result set needs to be transferred to the client, which could lead to
> a great performance gain. (Am I right?)

That's right.
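
To make the idea concrete, here is a minimal sketch in Python (illustrative
only, not Druid or Kudu code) of partial aggregation on the data nodes
followed by a final merge on the coordinator:

# Each "data node" computes partial aggregates over its local rows, so only
# a handful of (key, sum, count) tuples cross the network instead of every row.
from collections import defaultdict

def partial_aggregate(rows):
    """Runs on each data node: partial SUM/COUNT per group key."""
    acc = defaultdict(lambda: [0, 0])          # key -> [sum, count]
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)

def merge_partials(partials):
    """Runs on the coordinator: merge the small per-node partial results."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: s / c for key, (s, c) in merged.items()}   # final AVG

# Two data nodes, each holding a shard of the table.
node_a = [("us", 10), ("us", 20), ("eu", 5)]
node_b = [("us", 30), ("eu", 15)]
print(merge_partials([partial_aggregate(node_a), partial_aggregate(node_b)]))
# -> {'us': 20.0, 'eu': 10.0}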

> So I wanted to know whether Apache Kudu has any plans for an aggregation
> push down scan feature (or already has it).

Kudu doesn't currently have it, and there are no plans to add it at the moment.

Usually, we assume that Kudu tablet servers are co-located with either
Impala daemons or Spark executors. So, it's less important to provide
pushdown into the tablet server itself: even without it, we typically avoid
any network transfer from the tablet server into the execution environment
that does the aggregation.

The above would be less true if there were some way in which Kudu itself
could perform the aggregation more efficiently based on knowledge of its
underlying storage. For example, one could imagine Kudu performing a GROUP BY
on 'foo_col' more efficiently when that column is dictionary-encoded, by
aggregating on the code words rather than the resulting decoded strings,
since the integer code words are fixed-length and faster to hash and compare.
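
As an illustration only (plain Python, not Kudu internals), grouping on the
dictionary code words and decoding each group key once at the end looks like
this:

# Illustration only, not Kudu code: GROUP BY over a dictionary-encoded column.
# The column stores small integer code words plus a dictionary of strings;
# we aggregate on the integers and decode each group key exactly once.
from collections import defaultdict

dictionary = ["us", "eu", "apac"]        # code word -> string
codes      = [0, 0, 1, 2, 1, 0]          # encoded 'foo_col' values
values     = [10, 20, 5, 7, 15, 30]      # column being summed

sums = defaultdict(int)
for code, value in zip(codes, values):
    sums[code] += value                  # hash/compare small ints, not strings

result = {dictionary[code]: total for code, total in sums.items()}
print(result)   # {'us': 60, 'eu': 20, 'apac': 7}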

That said, it hasn't been a high priority relative to other performance
areas we're exploring.

> Q2.
> One thing that concerns me when using Impala+Kudu is that all matching rows
> must be transferred from the Kudu tserver to the Impala process. Usually,
> Impala and the Kudu tserver run on the same node, so it would be nice if
> Impala could read Kudu tablets directly. Are there any plans for this kind
> of feature?
> How-to: Use Impala and Kudu Together for Analytic Workloads
> <https://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/>
> says that:
> *we intend to implement the Apache Arrow in-memory data format and to
>> share memory between Kudu and Impala, which we expect will help with
>> performance and resource usage.*
> What does "share memory between Kudu and Impala" mean? Has this already
> been implemented?

Yes, currently all matching rows are transferred from the Kudu TS to the
Impala daemon. Impala schedules scanners for locality, though, so this is a
localhost-scoped connection which is quite fast. To give you a sense of the
speed, I just tested a localhost TCP connection using 'iperf' and measured
~6GB/sec on a single core. Although this is significantly slower than a
within-process memcpy, it's still fast enough that it usually represents a
small fraction of the overall CPU consumption of a query.
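
If you want to sanity-check that kind of number on your own machine, a rough
localhost TCP throughput probe in Python looks like the following (this is
not iperf, so the absolute number will differ; it only shows the shape of the
measurement):

# Rough localhost TCP throughput probe (not iperf; absolute numbers will differ).
import socket, threading, time

PORT, CHUNK, SECONDS = 54321, 1 << 20, 3         # 1 MiB sends for ~3 seconds

def receiver(listener):
    conn, _ = listener.accept()
    total, buf = 0, bytearray(CHUNK)
    while True:
        n = conn.recv_into(buf)
        if n == 0:
            break
        total += n
    print("approx %.2f GB/sec" % (total / SECONDS / 1e9))

listener = socket.socket()
listener.bind(("127.0.0.1", PORT))
listener.listen(1)
t = threading.Thread(target=receiver, args=(listener,))
t.start()

sender = socket.socket()
sender.connect(("127.0.0.1", PORT))
payload = b"x" * CHUNK
deadline = time.time() + SECONDS
while time.time() < deadline:
    sender.sendall(payload)
sender.close()
t.join()
listener.close()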

Regarding sharing memory, the first step, which is already implemented, is to
share a common in-memory layout. That is to say, the format in which Kudu
returns rows over the wire to the client matches the format in which Impala
expects its rows in memory. So, when Impala receives a block of rows in the
scanner, it doesn't have to "parse" or "convert" them into some other format;
it can simply interpret the data in place. This saves a lot of CPU.
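
Purely for intuition (the real row format is more involved than this), the
difference between converting rows up front and interpreting a matching
buffer in place looks roughly like:

# Intuition only; the real Kudu/Impala row format is more involved than this.
# When the wire format already matches the expected in-memory layout, the
# receiver can read fields straight out of the buffer on demand instead of
# parsing every row into new objects first.
import struct

ROW = struct.Struct("<qd")              # pretend rows are one int64 + one double
wire_buffer = b"".join(ROW.pack(i, i * 1.5) for i in range(4))

# "Parse/convert" approach: materialize a new object for every row up front.
converted = [ROW.unpack_from(wire_buffer, i * ROW.size) for i in range(4)]

# "Interpret in place" approach: compute the offset and read a field out of
# the original buffer only when it is actually needed.
view = memoryview(wire_buffer)
def key_of(row_idx):
    return ROW.unpack_from(view, row_idx * ROW.size)[0]

print(converted[2], key_of(2))          # (2, 3.0) 2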

Using something like Arrow would be even more efficient than the current
format since it is columnar rather than row-oriented. However, Impala
currently does not use a columnar format for its operator pipeline, so we
can't currently make use of Arrow to optimize the interchange.
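
For intuition only, here are the same three rows laid out row-wise versus
column-wise (the latter being the Arrow-style layout):

# Row-oriented vs. columnar layout of the same data (intuition only).
rows = [(1, "us", 10.0), (2, "eu", 5.0), (3, "us", 7.5)]

# Row-oriented: the values of one row sit together; natural for handing back
# whole rows, which is what the current interchange format does.
row_layout = rows

# Columnar: the values of one column sit together; natural for scanning or
# aggregating a single column, and for vectorized execution, which Arrow targets.
column_layout = {
    "id":     [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "value":  [r[2] for r in rows],
}
print(sum(column_layout["value"]))   # touch only the 'value' column -> 22.5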

Currently, as mentioned above, the data (in the common format) is
transferred from Kudu to Impala via a localhost TCP socket. A few years ago
we had an intern who experimented with using a Unix domain socket and found
some small speedup. He also experimented with setting up a shared memory
region and also found another small speedup over the domain socket.
However, there was a lot of complexity involved in this code (particularly
the shared memory approach) relative to the gain that we saw, so we didn't
end up merging it before his internship ended :)

Todd Lipcon
Software Engineer, Cloudera
