kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: INT128 Column Support Interest
Date Thu, 16 Nov 2017 22:54:16 GMT
On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert <danburkert@apache.org> wrote:

> I think it would be useful.  As far as I've seen the main costs in
> carrying data types are in writing performant encoders, and updating
> integrations to work with them.  I'm guessing with 128 bit integers there
> would be some integrations that can't or won't support it, which might be a
> cause for confusion.  Overall, though, I think the upsides of efficiency
> and decreased storage space are compelling.   Do you have a sense yet of
> what encodings are going to be supported down the road (will we get to full
> parity with 32/64)?
>

Yeah, my concerns are:

1) Integrations: do we have a compatible SQL type to map this to in Spark
SQL, Impala, Presto, etc.? What type would we map to in Java? The most
natural mapping in SQL seems to be DECIMAL(39) or something similar. So, if
we're going to map it the same way as decimal anyway, why not just _not_
expose it and only expose decimal? If someone wants to store a 128-bit hash
as a DECIMAL(39) they are free to, of course. Postgres's built-in int types
only go up to 64-bit (bigint).
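
To make the DECIMAL(39) figure concrete, here is a rough Python sketch (not
Kudu code) that counts the decimal digits of the largest 128-bit values and
reads an MD5 digest as an integer. The helper names `decimal_digits` and
`md5_as_int` are made up for illustration.

```python
import hashlib

# Largest signed and unsigned 128-bit integers.
INT128_MAX = 2**127 - 1
UINT128_MAX = 2**128 - 1


def decimal_digits(n: int) -> int:
    """Number of base-10 digits needed to write abs(n)."""
    return len(str(abs(n)))


def md5_as_int(data: bytes) -> int:
    """Interpret the 16-byte MD5 digest of `data` as an unsigned integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


if __name__ == "__main__":
    print(decimal_digits(INT128_MAX))   # digits for the signed maximum
    print(decimal_digits(UINT128_MAX))  # digits for the unsigned maximum
    print(md5_as_int(b"kudu") <= UINT128_MAX)
```

Both maxima need 39 digits, so a 38-digit decimal could not hold every
128-bit value, which is why the mapping lands on a precision of 39.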

Beyond DECIMAL, for fixed-length binary data such as large hashes, maybe we
are better off later adding a fixed-length BINARY type, like BINARY(16)?
There is precedent for fixed-length CHAR(n) in SQL, but no such precedent
for int128.
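
As a rough illustration of what a client would put into a 16-byte
fixed-length binary column, the Python sketch below packs a UUID and an IPv6
address into exactly 16 bytes using only the standard library. BINARY(16) is
only a proposal in this thread, and the function names are hypothetical.

```python
import ipaddress
import uuid


def uuid_to_binary16(u: uuid.UUID) -> bytes:
    """Return the 16 raw bytes of a UUID (big-endian field order)."""
    return u.bytes


def binary16_to_uuid(raw: bytes) -> uuid.UUID:
    """Rebuild a UUID from its 16 raw bytes."""
    return uuid.UUID(bytes=raw)


def ipv6_to_binary16(addr: str) -> bytes:
    """Return the 16-byte network-order form of an IPv6 address."""
    return ipaddress.IPv6Address(addr).packed


if __name__ == "__main__":
    u = uuid.UUID("12345678-1234-5678-1234-567812345678")
    raw = uuid_to_binary16(u)
    print(len(raw), len(str(u)))  # raw bytes vs. textual form
    print(binary16_to_uuid(raw) == u)
    print(len(ipv6_to_binary16("2001:db8::1")))
```

The raw form is 16 bytes, versus 36 characters for the usual textual UUID,
which is the saving over storing such values in a STRING column.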


2) Encoders: like Dan mentioned, it seems like we might not be able to do a
very efficient job of encoding these very large integers. Techniques like
bitshuffle, SIMD bitpacking, etc., aren't really designed for such large
values. So, I'm a little afraid that we'll end up with only PLAIN and
people will be upset with the storage overhead and performance.
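
To put a rough number on the storage concern, here is a toy Python
comparison of a PLAIN-style layout (16 bytes per value) against a naive
frame-of-reference bit-packing estimate. This is not how Kudu's encoders are
implemented; the function names and the size model are assumptions made
purely for illustration.

```python
INT128_BYTES = 16  # a 128-bit value stored without any encoding


def plain_size_bytes(values):
    """Size if every value is stored as a raw 16-byte integer."""
    return INT128_BYTES * len(values)


def frame_of_reference_width(values):
    """Bits per value after subtracting the block minimum."""
    base = min(values)
    return max((v - base).bit_length() for v in values)


def packed_size_bytes(values):
    """Estimated size: one 16-byte base plus bit-packed deltas."""
    width = frame_of_reference_width(values)
    payload_bits = width * len(values)
    return INT128_BYTES + (payload_bits + 7) // 8


if __name__ == "__main__":
    # Clustered values: large magnitude, but a small range within the block.
    sequential = [2**100 + i for i in range(1000)]
    print(plain_size_bytes(sequential))
    print(frame_of_reference_width(sequential))
    print(packed_size_bytes(sequential))
```

Clustered values shrink a lot under this model, but uniformly distributed
values such as hashes would still need close to 128 bits each, so for that
use case an encoder has little to work with beyond PLAIN.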

-Todd

>
> On Thu, Nov 16, 2017 at 2:19 PM, Grant Henke <ghenke@cloudera.com> wrote:
>
>> Hi all,
>>
>> As a part of adding DECIMAL support to Kudu it was necessary to add
>> internal support for 128 bit integers. Taking that one step further and
>> supporting public columns and APIs for 128 bit integers would not be too
>> much additional work. However, I wanted to gauge the interest from the
>> community.
>>
>> My initial thoughts are that having an INT128 column type could be useful
>> for things like UUIDs, IPv6 addresses, MD5 hashes and other similar types
>> of data.
>>
>> Is there any interest or uses for an INT128 column type? Is anyone
>> currently using a STRING or BINARY column for 128 bit data?
>>
>> Thank you,
>> Grant
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
