kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: INT128 Column Support Interest
Date Thu, 16 Nov 2017 23:30:47 GMT
Aren't we going to need efficient encodings in order to make decimal work
well, anyway?

- Dan

On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon <todd@cloudera.com> wrote:

> On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert <danburkert@apache.org> wrote:
>
> > I think it would be useful. As far as I've seen, the main costs in
> > carrying data types are in writing performant encoders and updating
> > integrations to work with them. I'm guessing that with 128-bit integers
> > there would be some integrations that can't or won't support them, which
> > might be a cause for confusion. Overall, though, I think the upsides of
> > efficiency and decreased storage space are compelling. Do you have a
> > sense yet of what encodings are going to be supported down the road (will
> > we get to full parity with 32- and 64-bit integers)?
> >
>
> Yeah, my concerns are:
>
> 1) Integrations: do we have a compatible SQL type to map this to in Spark
> SQL, Impala, Presto, etc.? What type would we map to in Java? It seems like
> the most natural mapping would be DECIMAL(39) or some such in SQL. So, if
> we're going to map it the same way as decimal anyway, why not just _not_
> expose it and only expose decimal? If someone wants to store a 128-bit hash
> as a DECIMAL(39), they are free to, of course. Postgres's built-in integer
> types only go up to 64-bit (bigint).
>
> Besides DECIMAL, for things like fixed-length binary data, maybe we'd be
> better off later adding a fixed-length BINARY type, like BINARY(16), which
> could be used for storing large hashes? There is precedent for fixed-length
> CHAR(n) in SQL, but no such precedent for int128.
>
>
> 2) Encoders: like Dan mentioned, it seems like we might not be able to do a
> very efficient job of encoding these very large integers. Stuff like
> bitshuffle, SIMD bitpacking, etc., isn't really designed for such large
> values. So, I'm a little afraid that we'll end up with only PLAIN and
> people will be upset with the storage overhead and performance.
>
> -Todd
>
> >
> > On Thu, Nov 16, 2017 at 2:19 PM, Grant Henke <ghenke@cloudera.com> wrote:
> >
> >> Hi all,
> >>
> >> As part of adding DECIMAL support to Kudu, it was necessary to add
> >> internal support for 128-bit integers. Taking that one step further and
> >> supporting public columns and APIs for 128-bit integers would not be too
> >> much additional work. However, I wanted to gauge the interest from the
> >> community.
> >>
> >> My initial thought is that an INT128 column type could be useful for
> >> things like UUIDs, IPv6 addresses, MD5 hashes, and other similar types
> >> of data.
> >>
> >> Is there any interest in, or are there uses for, an INT128 column type?
> >> Is anyone currently using a STRING or BINARY column for 128-bit data?
> >>
> >> Thank you,
> >> Grant
> >> --
> >> Grant Henke
> >> Software Engineer | Cloudera
> >> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>
> >
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
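
Todd's DECIMAL(39) figure follows from the digit count of the largest 128-bit values, and Grant's examples (UUIDs, IPv6 addresses, MD5 hashes) each reduce to either a 128-bit integer or a 16-byte string. The minimal Python sketch below checks both points. The helper names `decimal_digits`, `to_uint128`, and `to_binary16` are hypothetical illustrations, not part of any Kudu client API.

```python
import hashlib
import ipaddress
import uuid

# Extremes of the 128-bit integer range.
INT128_MAX = 2**127 - 1    # largest signed 128-bit value
UINT128_MAX = 2**128 - 1   # largest unsigned 128-bit value


def decimal_digits(n: int) -> int:
    """Count the base-10 digits needed to write abs(n)."""
    return len(str(abs(n)))


def to_uint128(value) -> int:
    """Convert a UUID, IPv6 address, or 16-byte digest to an unsigned 128-bit int (hypothetical helper)."""
    if isinstance(value, uuid.UUID):
        return value.int
    if isinstance(value, ipaddress.IPv6Address):
        return int(value)
    if isinstance(value, (bytes, bytearray)) and len(value) == 16:
        return int.from_bytes(value, "big")
    raise TypeError(f"not a 128-bit value: {value!r}")


def to_binary16(n: int) -> bytes:
    """Encode an unsigned 128-bit int as 16 big-endian bytes, i.e. a BINARY(16)-style value."""
    return n.to_bytes(16, "big")


if __name__ == "__main__":
    # Both extremes need 39 digits, while DECIMAL(38) tops out at 10**38 - 1.
    print(decimal_digits(INT128_MAX), decimal_digits(UINT128_MAX))
    print(10**38 - 1 < INT128_MAX)
    samples = [
        uuid.uuid4(),
        ipaddress.IPv6Address("2001:db8::1"),
        hashlib.md5(b"example").digest(),
    ]
    for s in samples:
        n = to_uint128(s)
        print(type(s).__name__, n, decimal_digits(n), len(to_binary16(n)))
```

Mapping to DECIMAL(39) keeps such values numeric, while the 16-byte form is what a fixed-length BINARY(16) type would hold. Big-endian byte order is used here because, at a fixed length, byte-wise comparison then matches unsigned numeric order.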

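Todd's encoder concern and Dan's point about decimal can both be illustrated with a generic frame-of-reference bit-width calculation. This is a simplified stand-in chosen for illustration, not bitshuffle or any actual Kudu encoder: a bit-packer stores each value as its offset from the block minimum, so the bits needed per value depend on the block's range rather than on the 128-bit type width. Random 128-bit data such as hashes spans nearly the whole range and needs close to 128 bits per value (no better than PLAIN's 16 bytes), whereas decimal-like values whose unscaled integers are small pack into far fewer bits. The function names are hypothetical.

```python
import random


def packed_bit_width(values: list[int]) -> int:
    """Bits per value for a frame-of-reference bit-packer (illustrative only).

    Each value is stored as (value - min(values)), so the width depends only
    on the range of the block, not on the 128-bit type width.
    """
    span = max(values) - min(values)
    return span.bit_length()


def packed_size_bytes(values: list[int]) -> int:
    """Packed payload size in bytes, ignoring the header that holds min/width."""
    return (packed_bit_width(values) * len(values) + 7) // 8


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1024
    plain_bytes = 16 * n  # PLAIN: every int128 value takes 16 bytes

    # Hash- or UUID-like data: uniformly random over the whole 128-bit range.
    hashes = [rng.getrandbits(128) for _ in range(n)]
    # Decimal-like data: unscaled values far smaller than the 128-bit range.
    amounts = [rng.randrange(0, 100_000) for _ in range(n)]

    for name, vals in (("hashes", hashes), ("amounts", amounts)):
        print(name, packed_bit_width(vals), packed_size_bytes(vals), plain_bytes)
```

This only models bit width; real encoders add per-block headers, and whether existing SIMD kernels can handle 128-bit lanes is a separate question this sketch does not address.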