Thanks for the information, Dmitry and Mauricio!

> An example from genomics.

Dmitry, would you be interested in writing up more details about how you are using Kudu in a blog post or even a mailing list email? This sounds super interesting.

> Supporting serialized objects (e.g. Java's hashtables with
> capabilities to select only rows with hashtables containing some
> specific keys) would make Kudu super-special ;)

I agree supporting something like this would be very cool.

> Would be good if Kudu supported the way Impala can store and query nested data

Supporting Impala's syntax on Kudu tables with complex types is absolutely a priority. 

Thanks,
Grant 

On Wed, Sep 11, 2019 at 7:04 PM Mauricio Aristizabal <mauricio@impact.com> wrote:
Would be good if Kudu supported the way Impala can store and query nested data in HDFS/Parquet, so it would be (at least mostly) transparent to query nested data in either storage engine. We recently had a use for this (basically storing N order item details along with each order record) but decided against it because we know we'll be moving that table from Parquet to Kudu soon.

On Wed, Sep 11, 2019 at 1:49 PM Dmitry Degrave <dmeetry@gmail.com> wrote:
Hi Grant,

An example from genomics. The current schema is simple [1] (denormalized
for performance), but it requires N = S * V rows in the genotype table
(S is the number of samples, V is the average number of variants per
sample; a typical value for WGS is V = 5*10^6, and we deal with tens of
thousands of samples). A more optimal schema would keep all variants of
a sample in a single row, which is impossible now.
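
For a rough sense of scale, here is a back-of-the-envelope sketch of the row counts. V is the figure quoted above; S = 20,000 is only an assumed stand-in for "tens of thousands of samples", and the variable names are illustrative rather than taken from the linked schema.

```python
# Row counts for the two layouts discussed above.
# V is from the message; S is an assumed example value.
S = 20_000       # number of samples (assumption: "tens of thousands")
V = 5 * 10**6    # average variants per sample, typical for WGS

rows_denormalized = S * V  # one row per (sample, variant) pair
rows_nested = S            # one row per sample, variants in a nested column

print(f"denormalized: {rows_denormalized:.1e} rows")
print(f"nested:       {rows_nested:.1e} rows")
```

Under these assumptions the denormalized table is V times larger in row count than a one-row-per-sample layout would be, though each nested row would of course be correspondingly wider.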

Supporting nested data structures, e.g. similar to those implemented in
ClickHouse [2], would be useful too.

Supporting serialized objects (e.g. Java's hashtables with
capabilities to select only rows with hashtables containing some
specific keys) would make Kudu super-special ;)
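
To make the requested semantics concrete, here is a minimal plain-Python sketch of "select only rows whose map contains some specific keys". It is not Kudu API and says nothing about how Kudu would implement it; the function name `rows_with_keys`, the column name `variants`, and the sample data are all hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

def rows_with_keys(rows: Iterable[Row], column: str,
                   required: Iterable[str]) -> List[Row]:
    """Keep rows whose map-valued `column` contains every key in `required`."""
    needed = set(required)
    return [r for r in rows if needed <= set(r[column].keys())]

# Hypothetical data: one row per sample, variants stored as a map.
rows = [
    {"sample_id": "s1",
     "variants": {"chr1:12345:A>G": "0/1", "chr2:500:C>T": "1/1"}},
    {"sample_id": "s2",
     "variants": {"chr2:500:C>T": "0/1"}},
]

matches = rows_with_keys(rows, "variants", ["chr1:12345:A>G"])
print([r["sample_id"] for r in matches])
```

In a storage engine the interesting part would be evaluating such a predicate without deserializing every map, which this sketch does not attempt.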

~dmitry

[1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
[2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html

On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghenke@cloudera.com> wrote:
>
> Hi Boris,
>
> Can you describe in more detail what exactly you are looking for in a long text type? Is there another database that has an equivalent type for reference?
>
> I have started looking at complex type support and plan to put up a design document soon. No estimates exist yet on when it would be complete or how much work is required. Do you have any sample schemas with complex types you could send me to help inform designs and trade-offs?
>
> Thank you,
> Grant
>
> On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com> wrote:
>>
>> Hi guys,
>>
>> Any plans to support a long text type in Kudu? We would love to use Kudu with other projects, but unfortunately long text data is pretty common in the healthcare industry, and we have to use Hive/Impala/HDFS instead, which is quite painful since we cannot do in-place updates and deletes.
>>
>> Same question about complex types (arrays, maps, etc.).
>>
>> Thanks
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


--
Mauricio Aristizabal
Architect - Data Pipeline

--
Grant Henke