kudu-user mailing list archives

From Grant Henke <ghe...@cloudera.com>
Subject Re: Long text and complex data types support
Date Mon, 09 Sep 2019 16:36:28 GMT
>
> Oracle has CLOBs and BLOBs, MS SQL has varchar(max) and binary. I believe
> Snowflake and Redshift have similar data types.
>

Today Kudu supports String columns, which can hold up to 64KB of UTF-8
encoded text. I assume you are asking because that limit is too small.
How large would these text columns need to be?
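
To help answer the size question, here is a minimal sketch for measuring how large existing text values are once UTF-8 encoded. It assumes the 64KB figure above means 64 * 1024 bytes and that the limit is inclusive. The names KUDU_MAX_CELL_BYTES, utf8_size, and fits_in_string_column are made up for illustration and are not part of any Kudu API.

```python
# Assumption: the 64KB limit mentioned above, expressed in bytes.
KUDU_MAX_CELL_BYTES = 64 * 1024


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when UTF-8 encoded."""
    return len(text.encode("utf-8"))


def fits_in_string_column(text: str, limit: int = KUDU_MAX_CELL_BYTES) -> bool:
    """Return True if the encoded text is no larger than the assumed limit."""
    return utf8_size(text) <= limit


if __name__ == "__main__":
    # Hypothetical sample values; replace with real notes or survey comments.
    samples = ["short note", "é" * 40_000]
    for s in samples:
        print(utf8_size(s), fits_in_string_column(s))
```

Note that the check is on encoded bytes, not characters, so text with many non-ASCII characters reaches the limit sooner than its character count suggests.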

On Mon, Sep 9, 2019 at 10:09 AM Boris Tyukin <boris@boristyukin.com> wrote:

> Hi Grant,
>
> thanks for responding!
>
> Oracle has CLOBs and BLOBs, MS SQL has varchar(max) and binary. I believe
> Snowflake and Redshift have similar data types.
>
> In healthcare, a lot of good data is trapped in physician notes, progress
> reports, discharge summaries, etc., and it takes time for specially trained
> people (medical coders and abstractors) to read these reports and structure
> them (assign billing codes, classify procedures and diagnoses, etc.). Some
> things will never get coded and will stay trapped in the text.
>
> Another example in healthcare is patient satisfaction surveys with free
> text comments.
>
> As for complex data types, we recently had a small project ingesting FHIR
> bundles, which are highly nested and complex JSON data sets. See the HL7
> FHIR site for examples. This is one of the easiest FHIR document samples
> to comprehend:
> https://www.hl7.org/fhir/patient-example.json.html
>
> We ended up using Hive to store them and Spark to get meaningful data, but
> the data is mutable and a lot of rows need to be updated/deleted daily,
> which is painful with Hive.
>
> Hope it helps.
>
> On Sun, Sep 8, 2019 at 6:17 PM Grant Henke <ghenke@cloudera.com> wrote:
>
>> Hi Boris,
>>
>> Can you describe in more detail what exactly you are looking for in a
>> long text type? Is there another database that has an equivalent type for
>> reference?
>>
>> I have started looking at complex type support and plan to put up a
>> design document soon. No estimates exist yet on when it would be complete
>> or how much work is required. Do you have any sample schemas with complex
>> types you could send me to help inform designs and trade-offs?
>>
>> Thank you,
>> Grant
>>
>> On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com>
>> wrote:
>>
>>> Hi guys,
>>>
>>> Any plans to support a long text type in Kudu? We would love to use Kudu
>>> with other projects, but unfortunately long text data is pretty common in
>>> the healthcare industry and we have to use Hive/Impala/HDFS instead, which
>>> is quite painful since we cannot do in-place updates and deletes.
>>>
>>> Same question about complex types (arrays, maps, etc.).
>>>
>>> Thanks
>>>
>>
>>
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>
>

-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
