kudu-user mailing list archives

From Dmitry Degrave <dmee...@gmail.com>
Subject Re: Long text and complex data types support
Date Wed, 11 Sep 2019 20:49:09 GMT
Hi Grant,

An example from genomics. Our current schema is simple [1] (denormalized
for performance), but it requires N = S * V rows in the genotype table (S is
the number of samples, V is the average number of variants per sample;
a typical value for WGS is V = 5*10^6, and we deal with tens of thousands
of samples). A better schema would keep all variants of a sample in a
single row, which is impossible now.
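
To put rough numbers on this, here is a minimal sketch of the row-count
arithmetic. It assumes S = 20,000 as a stand-in for "tens of thousands of
samples"; that figure and the function names are illustrative assumptions,
not values from our actual tables.

```python
# Rough row-count arithmetic for the two layouts discussed above.
# ASSUMPTION: 20,000 samples stands in for "tens of thousands".


def rows_flat(samples: int, variants_per_sample: int) -> int:
    """One row per (sample, variant) pair: N = S * V."""
    return samples * variants_per_sample


def rows_nested(samples: int) -> int:
    """One row per sample, with all variants held in one nested column."""
    return samples


S = 20_000      # assumed number of samples (hypothetical)
V = 5 * 10**6   # typical number of variants per WGS sample

print(f"flat layout:   {rows_flat(S, V):,} rows")
print(f"nested layout: {rows_nested(S):,} rows")
```

The nested layout only moves the volume into a per-row column, so the total
amount of data stays the same; what changes is the number of rows the table
has to manage.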

Support for nested data structures, e.g. similar to those implemented in
ClickHouse [2], would be useful too.

Support for serialized objects (e.g. Java hashtables, with the
ability to select only rows whose hashtable contains some
specific keys) would make Kudu super-special ;)
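
To illustrate the kind of key-based selection meant here, below is a small
in-memory sketch in plain Python. It is not Kudu API: the rows, the column
name "attrs" and the helper rows_with_keys are all hypothetical, and a real
implementation would push this predicate down into the storage engine.

```python
# In-memory illustration of "select only rows whose map contains given keys".
# ASSUMPTION: the rows, the "attrs" column and this helper are hypothetical;
# nothing here is part of Kudu.
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def rows_with_keys(rows: Iterable[Row], required_keys: Iterable[str]) -> List[Row]:
    """Return the rows whose map-valued 'attrs' column has every required key."""
    required = set(required_keys)
    # Iterating a dict yields its keys, so issubset checks key membership.
    return [row for row in rows if required.issubset(row["attrs"])]


rows = [
    {"sample_id": "s1", "attrs": {"depth": 30, "quality": 99}},
    {"sample_id": "s2", "attrs": {"depth": 12}},
    {"sample_id": "s3", "attrs": {"quality": 45, "filter": "PASS"}},
]

selected = rows_with_keys(rows, ["quality"])
print([row["sample_id"] for row in selected])
```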


[1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
[2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html

On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghenke@cloudera.com> wrote:
> Hi Boris,
> Can you describe in more detail what exactly you are looking for in a long text type?
> Is there another database that has an equivalent type for reference?
> I have started looking at complex type support and plan to put up a design document soon.
> No estimates on when it would be complete or how much work is required exists yet. Do you
> have any sample schemas with complex types you could send me to help inform designs and trade
> Thank you,
> Grant
> On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com> wrote:
>> Hi guys,
>> Any plans to support long text type in Kudu? We would love to use Kudu with other
>> projects but unfortunately long text data are pretty common in healthcare industry and we
>> have to use hive/Impala/hdfs instead which is quite painful since we cannot do in place updates
>> and deletes.
>> Same question about complex types (arrays, maps etc.)
>> Thanks
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
