impala-dev mailing list archives

From Alexander Behm <alex.b...@cloudera.com>
Subject Re: Is there a plan to implement histogram support for computing column stats?
Date Wed, 21 Jun 2017 04:28:02 GMT
Histograms can definitely be useful in getting more accurate cardinality
estimates to improve plan choices.

However, adding support for histograms poses several challenges, which is
why we have no concrete plans to support them yet:
- computing stats is already a huge pain for most users due to its cost;
adding histograms might make this problem worse
- perhaps we should support stats on a subset of columns to make it
cheaper; but most users would have difficulties deciding the subset of
columns to pick, so we'd need to provide an automated solution for
suggesting the subset
- there is no support in the Hive Metastore so we'd need to store them
inside the generic TBLPROPERTIES map or similar

Just trying to explain that the user experience needs to be considered when
adding such a new feature, and that dealing with all caveats could be a
substantial amount of design and implementation work.
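
To illustrate the accuracy point above, here is a minimal sketch comparing a
row-count estimate for the predicate "value <= upper" under a uniform
distribution assumption (using only min/max and row count) against an
estimate from an equi-height histogram. This is not how Impala computes or
stores stats; the function names, the bucket count, and the sample data are
all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def uniform_estimate(num_rows, min_val, max_val, upper):
    """Estimate rows with value <= upper, assuming values are spread
    evenly over [min_val, max_val]."""
    if upper <= min_val:
        return 0.0
    if upper >= max_val:
        return float(num_rows)
    return num_rows * (upper - min_val) / (max_val - min_val)


def build_equi_height(values, num_buckets):
    """Return one upper bound per bucket so that each bucket covers
    roughly the same number of rows (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i + 1) * n // num_buckets - 1)]
            for i in range(num_buckets)]


def histogram_estimate(bounds, num_rows, upper):
    """Estimate rows with value <= upper by counting the buckets whose
    upper bound is <= upper; partially covered buckets are ignored."""
    full_buckets = bisect_right(bounds, upper)
    return num_rows * full_buckets / len(bounds)


# Made-up skewed column: 900 rows hold the value 1, the rest hold 2..101.
values = [1] * 900 + list(range(2, 102))
bounds = build_equi_height(values, num_buckets=10)

actual = sum(1 for v in values if v <= 10)
uniform = uniform_estimate(len(values), min(values), max(values), 10)
hist = histogram_estimate(bounds, len(values), 10)
print(actual, uniform, hist)
```

On this made-up data the true count is 909 rows; the uniform assumption
estimates 90, while the 10-bucket histogram estimates 900. An error of that
size on a join input could plausibly flip a broadcast vs. shuffle decision.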


On Tue, Jun 20, 2017 at 8:53 PM, 吴朱华 <ikewu83@gmail.com> wrote:

> let me check it out^_^
>
> 2017-06-21 0:25 GMT+08:00 Jim Apple <jbapple@cloudera.com>:
>
> > This looks like the closest ticket to the question:
> >
> > https://issues.apache.org/jira/browse/IMPALA-2416
> >
> > Feel free to file another, more ambitious, ticket if you'd like.
> >
> > On Tue, Jun 20, 2017 at 4:10 AM, 吴朱华 <ikewu83@gmail.com> wrote:
> > > Hi guys:
> > >
> > > Is there a plan to implement histogram support for computing column
> > > stats? My assumption is that with histogram support it would be easier
> > > to predict the number of rows involved in a join more accurately, which
> > > would lead to a better choice between shuffle and broadcast joins.
> > > These are just my amateur thoughts; I would love to hear your feedback ^_^
> >
>
