kudu-user mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: Long text and complex data types support
Date Mon, 09 Sep 2019 18:02:39 GMT
I cannot explain why, because it is the vendors who design EMR/EHR systems
and the databases that support them, but in our case (Cerner EHR) they have
text fields taking many MBs; we've seen as high as 64 MB (again, do not ask
me why :))
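
For scale: reading 64Mb as megabytes, such a field is about a thousand times
larger than the 64KB String column limit mentioned in the quoted reply below.
One possible workaround, which nobody in this thread proposes and which is
shown only as an illustration, is to split a long note into pieces that each
fit under the limit and store them as ordered rows. A minimal Python sketch,
assuming the limit is exactly 64 * 1024 bytes of UTF-8; the helper name
split_utf8 is hypothetical and no Kudu API is used:

```python
# Assumed per-cell limit, taken from the 64KB figure quoted in this thread.
LIMIT_BYTES = 64 * 1024


def split_utf8(text: str, limit: int = LIMIT_BYTES) -> list[str]:
    """Split text into pieces whose UTF-8 encoding is at most `limit` bytes."""
    if limit < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("limit must be at least 4 bytes")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Back up while the cut would land on a continuation byte
        # (0b10xxxxxx), so a multi-byte character is never cut in half.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


# Example: an 80,000-byte note made of two-byte characters.
note = "é" * 40000
parts = split_utf8(note)
```

Joining the pieces in order gives back the original text, so each piece could
be stored in its own row keyed by something like (note id, piece number). That
keying scheme is also an assumption, not something discussed in the thread.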

On Mon, Sep 9, 2019 at 12:36 PM Grant Henke <ghenke@cloudera.com> wrote:

>> Oracle has CLOBs and BLOBs, MS SQL has varchar(max) and binary. I believe
>> SnowFlake and Redshift have similar data types.
>>
>
> Today Kudu has support for String columns which can hold up to 64KB
> of UTF-8 encoded characters. I assume you are asking because that limit is
> too small. How large would these text columns need to be?
>
> On Mon, Sep 9, 2019 at 10:09 AM Boris Tyukin <boris@boristyukin.com>
> wrote:
>
>> Hi Grant,
>>
>> thanks for responding!
>>
>> Oracle has CLOBs and BLOBs, MS SQL has varchar(max) and binary. I believe
>> SnowFlake and Redshift have similar data types.
>>
>> In healthcare, a lot of good data is trapped in physician notes, progress
>> reports, discharge summaries, etc., and it takes time for specially trained
>> people (medical coders and abstractors) to read these reports and structure
>> them (assign billing codes, classify procedures and diagnoses, etc.). Some
>> things will never get coded and will stay trapped in the text.
>>
>> Another example in healthcare is patient satisfaction surveys with free
>> text comments.
>>
>> As for complex data types, we recently had a small project ingesting
>> FHIR bundles, which are highly nested and complex JSON data sets. Just go to
>> the HL7 FHIR site to see examples. This is one of the easiest FHIR document
>> samples to comprehend:
>> https://www.hl7.org/fhir/patient-example.json.html
>>
>> We ended up using Hive to store them and Spark to get meaningful data, but
>> the data is mutable and a lot of rows need to be updated/deleted daily,
>> which is painful with Hive.
>>
>> Hope it helps.
>>
>> On Sun, Sep 8, 2019 at 6:17 PM Grant Henke <ghenke@cloudera.com> wrote:
>>
>>> Hi Boris,
>>>
>>> Can you describe in more detail what exactly you are looking for in a
>>> long text type? Is there another database that has an equivalent type for
>>> reference?
>>>
>>> I have started looking at complex type support and plan to put up a
>>> design document soon. No estimates exist yet on when it would be complete
>>> or how much work is required. Do you have any sample schemas with complex
>>> types you could send me to help inform designs and trade-offs?
>>>
>>> Thank you,
>>> Grant
>>>
>>> On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <boris@boristyukin.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Any plans to support a long text type in Kudu? We would love to use Kudu
>>>> with other projects, but unfortunately long text data is pretty common in
>>>> the healthcare industry and we have to use Hive/Impala/HDFS instead, which
>>>> is quite painful since we cannot do in-place updates and deletes.
>>>>
>>>> Same question about complex types (arrays, maps, etc.).
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>> --
>>> Grant Henke
>>> Software Engineer | Cloudera
>>> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>
>>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
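
On the complex-type question raised earlier in the thread: until a store has
native arrays and maps, one common way to get nested JSON into flat columns
is to flatten each document into path/value pairs. The Python sketch below
illustrates that idea only; it is not what the poster's Hive/Spark pipeline
did, the helper name flatten is hypothetical, and the sample record is made
up and merely shaped loosely like a FHIR Patient resource.

```python
import json


def flatten(value, prefix=""):
    """Turn nested dicts/lists into a flat {path: scalar} dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars (str, number, bool, None) end the recursion.
        flat[prefix] = value
    return flat


# Made-up, heavily trimmed record; not copied from the HL7 example.
sample = json.loads("""
{"resourceType": "Patient",
 "name": [{"family": "Example", "given": ["Pat", "Q"]}],
 "active": true}
""")
row = flatten(sample)
# row has keys such as "name[0].family" and "name[0].given[1]"
```

The trade-off is that every array position becomes its own column name, so
documents with differently sized arrays produce different column sets.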
