hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Elliot West (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12274) Increase width of columns used for general configuration in the metastore.
Date Wed, 31 Aug 2016 12:44:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Elliot West updated HIVE-12274:
-------------------------------
    Description: 
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing
JSON data that has a large nested schema. The struct definition is truncated when inserted
into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters
in length.

Given that the purpose of these columns is to hold very loosely defined configuration values
it seems rather limiting to impose such a relatively low length bound. One can imagine that
valid use cases will arise where reasonable parameter/property values exceed the current limit.


h2. Context
These limitations were in by the [patch attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However,
nowadays the limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter
{{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
* Postgres supports a max of 1GB for {{character}} datatype. (source)
* MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}} value
+ {{PARAM_VALUE}} is less than 65535, we should be good. (source)
* SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)" with
the same limitation as MySQL being 65535 bytes for the entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)

h2. Proposal
Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}?
It would seem that suitable type equivalents exist for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?

  was:
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing
JSON data that has a large nested schema. The struct definition is truncated when inserted
into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters
in length.

Given that the purpose of these columns is to hold very loosely defined configuration values
it seems rather limiting to impose such a relatively low length bound. One can imagine that
valid use cases will arise where reasonable parameter/property values exceed the current limit.


h2. Context
These limitations were in by the patch attributed to HIVE-1364 which mentions the _"max length
on Oracle 9i/10g/11g"_ as the reason. However, nowadays the limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter
{{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
* Postgres supports a max of 1GB for {{character}} datatype. (source)
* MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}} value
+ {{PARAM_VALUE}} is less than 65535, we should be good. (source)
* SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)" with
the same limitation as MySQL being 65535 bytes for the entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)

h2. Proposal
Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}?
It would seem that suitable type equivalents exist for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?


> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12274
>                 URL: https://issues.apache.org/jira/browse/HIVE-12274
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>              Labels: metastore
>         Attachments: HIVE-12274.example.ddl.hql
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing
JSON data that has a large nested schema. The struct definition is truncated when inserted
into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters
in length.
> Given that the purpose of these columns is to hold very loosely defined configuration
values it seems rather limiting to impose such a relatively low length bound. One can imagine
that valid use cases will arise where reasonable parameter/property values exceed the current
limit. 
> h2. Context
> These limitations were in by the [patch attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However,
nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter
{{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
> * Postgres supports a max of 1GB for {{character}} datatype. (source)
> * MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}}
value + {{PARAM_VALUE}} is less than 65535, we should be good. (source)
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)"
with the same limitation as MySQL being 65535 bytes for the entire row. (source)
> * Derby's {{varchar}} can be upto 32672 bytes. (source)
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}?
It would seem that suitable type equivalents exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> Finally, will this limitation persist in the work resulting from HIVE-9452?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message