drill-dev mailing list archives

From Gautam Parai <gpa...@maprtech.com>
Subject Re: median, quantile
Date Tue, 07 Jun 2016 20:39:48 GMT
As Julian mentioned, an optional APPROXIMATE clause along with a
session/system setting looks like the best option to me. Exposing the
algorithm in the name does not make sense, since we might want to replace
it with a new one in the future. That said, other approaches exist: Oracle,
for example, puts the approximation into the function name itself, as in
APPROX_COUNT_DISTINCT() [1].

[1] https://docs.oracle.com/database/121/SQLRF/functions013.htm#SQLRF56900
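
To make the point about keeping the algorithm out of the name concrete, here
is a minimal Python sketch (not Drill code). Everything in it is hypothetical:
the `session` dict stands in for a session/system setting, `approximate` for
the optional per-query clause, and `_sampled_median` uses plain random
sampling as a placeholder, not the t-digest approach from the pull request.

```python
import random
import statistics

# Hypothetical stand-in for a session/system option; False means "exact".
session = {"approximate": False}


def _sampled_median(values, sample_size=101, seed=0):
    """Placeholder approximation: median of a uniform random sample.

    This is NOT t-digest. It only marks the spot where an approximate
    algorithm plugs in and could later be swapped for a better one.
    """
    values = list(values)
    if len(values) <= sample_size:
        return statistics.median(values)
    rng = random.Random(seed)
    return statistics.median(rng.sample(values, sample_size))


def median(values, approximate=None):
    """Return the median without naming the algorithm in the function name.

    `approximate` plays the role of a per-query clause: when given, it
    overrides the session setting for this call only.
    """
    use_approx = session["approximate"] if approximate is None else approximate
    if use_approx:
        return _sampled_median(values)
    return statistics.median(values)
```

A caller writes `median(data)` either way; `median(data, approximate=True)`
opts in for a single call and leaves `session` untouched, which is the
behaviour John describes below.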

On Tue, Jun 7, 2016 at 6:50 AM, John Omernik <john@omernik.com> wrote:

> Julian, great point.
>
> With a proper design, users could set session variables or use the select
> with options so that one query wouldn't change the session-wide settings.
> That seems promising as an idea.
>
> John
>
> On Mon, Jun 6, 2016 at 8:12 PM, Julian Hyde <jhyde@apache.org> wrote:
>
> > I’ve thought for some time that SQL aggregate functions should have an
> > “APPROXIMATE ( … )” clause. Users don’t WANT to call a TD_MEDIAN
> > function, they want the MEDIAN that gives them an answer to their
> > desired accuracy (within X, within Y%, or within a given confidence
> > interval), and TD_MEDIAN may be the way to achieve that.
> >
> > In fact the user might just set “SET APPROXIMATE = '95%'” in their
> > session and the APPROXIMATE clause is implicit on every query they write.
> >
> > Approximate aggregate functions are all the rage right now but I’m not
> > aware of any effort to standardize them across databases.
> >
> > Julian
> >
> >
> > > On Jun 6, 2016, at 5:58 PM, Parth Chandra <parthc@apache.org> wrote:
> > >
> > > Hey Steven,
> > > Somehow I missed this one when you posted it.
> > > Since you asked, I would suggest names other than median and quantile,
> > > since those might mislead. How about td_median, td_quantile?
> > >
> > > On Wed, Apr 13, 2016 at 11:51 AM, Steven Phillips <steven@dremio.com> wrote:
> > >
> > >> I submitted a pull request a little while ago that introduces
> > >> (approximate) median and quantile functions using the tdigest library.
> > >>
> > >> https://github.com/apache/drill/pull/456
> > >>
> > >> It would be great if I could get some feedback on this. Specifically,
> > >> is it ok to call these functions median and quantile, given that they
> > >> are not exact?
> > >>
> >
> >
>
