kudu-user mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: Long text and complex data types support
Date Mon, 09 Sep 2019 15:07:51 GMT
Hi Grant,

thanks for responding!

Oracle has CLOBs and BLOBs, MS SQL has varchar(max) and varbinary(max). I
believe Snowflake and Redshift have similar data types.

In healthcare, a lot of good data is trapped in physician notes, progress
reports, discharge summaries, etc., and it takes time for specially trained
people (medical coders and abstractors) to read these reports and structure
them (assign billing codes, classify procedures and diagnoses, etc.). Some
things will never get coded and will stay trapped in the text.

Another example in healthcare is patient satisfaction surveys with free
text comments.

As for complex data types, we recently had a small project ingesting FHIR
bundles, which are highly nested and complex JSON data sets. Just go to the
HL7 FHIR site to see examples. This is one of the easiest-to-comprehend FHIR
document samples:

We ended up using Hive to store them and Spark to extract meaningful data,
but the data is mutable and a lot of rows need to be updated or deleted
daily, which is painful with Hive.
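
A minimal sketch of the kind of flattening such nested bundles call for,
written in plain Python rather than Spark. The sample bundle is a
hypothetical, heavily trimmed stand-in for a real FHIR bundle, and `flatten`
and `bundle_to_rows` are illustrative names, not part of any FHIR, Spark, or
Kudu API.

```python
import json

# Hypothetical, heavily trimmed FHIR-style bundle (assumption for illustration);
# real bundles carry many more fields and resource types.
SAMPLE_BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "name": [{"family": "Doe", "given": ["Jane"]}]}},
    {"resource": {"resourceType": "Observation", "id": "o1",
                  "subject": {"reference": "Patient/p1"},
                  "code": {"text": "Heart rate"},
                  "valueQuantity": {"value": 72, "unit": "beats/minute"}}}
  ]
}
""")


def flatten(value, prefix=""):
    """Turn nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalar leaf: store it under the path built so far.
        flat[prefix] = value
    return flat


def bundle_to_rows(bundle):
    """Return one flat row (dict) per resource in the bundle's entry list."""
    return [flatten(entry["resource"]) for entry in bundle.get("entry", [])]


rows = bundle_to_rows(SAMPLE_BUNDLE)
for row in rows:
    print(row)
```

Each resource becomes one flat row, but different resource types produce
different column sets, which is part of why a flat table model is an awkward
fit for this kind of data.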

Hope it helps.

On Sun, Sep 8, 2019 at 6:17 PM Grant Henke <ghenke@cloudera.com> wrote:

> Hi Boris,
> Can you describe in more detail what exactly you are looking for in a long
> text type? Is there another database that has an equivalent type for
> reference?
> I have started looking at complex type support and plan to put up a design
> document soon. No estimates exist yet on when it would be complete or how
> much work is required. Do you have any sample schemas with complex types
> you could send me to help inform designs and trade-offs?
> Thank you,
> Grant
> On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com>
> wrote:
>> Hi guys,
>> Any plans to support a long text type in Kudu? We would love to use Kudu
>> with other projects, but unfortunately long text data is pretty common in
>> the healthcare industry and we have to use Hive/Impala/HDFS instead, which
>> is quite painful since we cannot do in-place updates and deletes.
>> Same question about complex types (arrays, maps, etc.)
>> Thanks
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
