kudu-user mailing list archives

From Grant Henke <ghe...@cloudera.com>
Subject Re: Long text and complex data types support
Date Thu, 12 Sep 2019 01:27:17 GMT
Thanks for the information Dmitry and Mauricio!

> An example from genomics.
>

Dmitry, would you be interested in writing up more details about how you
are using Kudu, in a blog post or even a mailing list email? This sounds
super interesting.
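
For a sense of scale, the N = S * V arithmetic from the example can be
sketched roughly in Python. V = 5*10^6 is the figure given in the thread;
the sample counts are assumptions, since the thread only says "tens of
thousands", and the function name is purely illustrative.

```python
# Rough row-count arithmetic for the denormalized genotype table: N = S * V.
V = 5 * 10**6  # average variants per WGS sample (figure from the thread)


def genotype_rows(samples: int, variants_per_sample: int = V) -> int:
    """Rows needed when every (sample, variant) pair gets its own row."""
    return samples * variants_per_sample


# The sample counts below are assumptions, not numbers from the thread.
for s in (10_000, 50_000):
    print(f"S={s:,}: one row per variant = {genotype_rows(s):,} rows, "
          f"one row per sample = {s:,} rows")
```

Under these assumptions the one-row-per-variant layout lands in the tens to
hundreds of billions of rows, while a one-row-per-sample layout would need
only S rows.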

> Supporting serialized objects (e.g. java's hashtables with
> capabilities to select only rows with hashtables containing some
> specific keys) would make Kudu super-special ;)
>

I agree supporting something like this would be very cool.
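
To make the request concrete, here is a minimal Python sketch of the
semantics being asked for: keep only the rows whose map-valued column
contains a given set of keys. This is plain in-memory Python, not a Kudu
API; the column name "attrs", the sample rows, and the helper
rows_with_keys are all hypothetical.

```python
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def rows_with_keys(rows: Iterable[Row], column: str,
                   required_keys: Iterable[str]) -> List[Row]:
    """Return the rows whose map-valued `column` contains every required key."""
    required = set(required_keys)
    return [row for row in rows if required.issubset(row[column].keys())]


# Hypothetical rows; "attrs" stands in for a serialized hashtable column.
rows = [
    {"id": 1, "attrs": {"gene": "BRCA1", "impact": "HIGH"}},
    {"id": 2, "attrs": {"gene": "TP53"}},
    {"id": 3, "attrs": {"impact": "LOW", "source": "wgs"}},
]

matching_ids = [row["id"] for row in rows_with_keys(rows, "attrs", ["impact"])]
print(matching_ids)
```

A storage engine would want to evaluate such a predicate on the server side
rather than after fetching every row, which is what makes the feature
non-trivial.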

> Would be good if Kudu supported the way Impala can store and query nested
> data
>

Supporting Impala's syntax on Kudu tables with complex types is absolutely
a priority.

Thanks,
Grant

On Wed, Sep 11, 2019 at 7:04 PM Mauricio Aristizabal <mauricio@impact.com>
wrote:

> Would be good if Kudu supported the way Impala can store and query nested
> data in hdfs/parquet, so it would be (at least mostly) transparent to query
> nested data in either storage engine.  We recently had a use for this
> (basically storing N order item details along with each order record) but
> decided against it because we know we'll be moving that table from Parquet
> to Kudu soon.
>
> On Wed, Sep 11, 2019 at 1:49 PM Dmitry Degrave <dmeetry@gmail.com> wrote:
>
>> Hi Grant,
>>
>> An example from genomics. The current scheme is simple [1] (denormalized
>> for performance), but requires N = S * V rows in the genotype table (S is
>> the number of samples, V is the average number of variants in a sample;
>> a typical value for WGS is V = 5*10^6, and we deal with tens of thousands
>> of samples). A more optimal scheme would keep all variants of a sample in
>> a single row, which is impossible now.
>>
>> Supporting nested data structures, e.g. similar to those implemented in
>> ClickHouse [2], would be useful too.
>>
>> Supporting serialized objects (e.g. java's hashtables with
>> capabilities to select only rows with hashtables containing some
>> specific keys) would make Kudu super-special ;)
>>
>> ~dmitry
>>
>> [1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
>> [2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html
>>
>> On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghenke@cloudera.com> wrote:
>> >
>> > Hi Boris,
>> >
>> > Can you describe in more detail what exactly you are looking for in a
>> > long text type? Is there another database that has an equivalent type for
>> > reference?
>> >
>> > I have started looking at complex type support and plan to put up a
>> > design document soon. No estimates exist yet on when it would be complete
>> > or how much work is required. Do you have any sample schemas with complex
>> > types you could send me to help inform designs and trade-offs?
>> >
>> > Thank you,
>> > Grant
>> >
>> > On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com>
>> > wrote:
>> >>
>> >> Hi guys,
>> >>
>> >> Any plans to support a long text type in Kudu? We would love to use Kudu
>> >> with other projects, but unfortunately long text data is pretty common in
>> >> the healthcare industry, so we have to use Hive/Impala/HDFS instead, which
>> >> is quite painful since we cannot do in-place updates and deletes.
>> >>
>> >> Same question about complex types (arrays, maps etc.)
>> >>
>> >> Thanks
>> >
>> >
>> >
>> > --
>> > Grant Henke
>> > Software Engineer | Cloudera
>> > grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>
>
>
> --
> Mauricio Aristizabal
> Architect - Data Pipeline
> mauricio@impact.com | 323 309 4260
> https://impact.com
> <https://www.linkedin.com/company/impact-martech/>
> <https://www.facebook.com/ImpactParTech/>
> <https://twitter.com/impactpartech>
> <https://www.youtube.com/c/impactmartech>
>


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
