hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elliot West <tea...@gmail.com>
Subject Re: hive2.1.0 one partition has two locations
Date Thu, 22 Dec 2016 09:05:07 GMT
I believe there is an issue with non-string type partition values. On some
code path point they are incorrectly compared as strings when a numeric
comparison should be used instead. Consequently, as '04' ≠ '4' you get two
different partitions. To work around this you should ensure that only one
numerical partition key format is used: always strip leading zeros. I've
had a look in the Hive JIRA and can find no related issues that haven't
been fixed for Hive 2.1.0, so perhaps this is a new find. I'd suggest
raising a new issue: https://issues.apache.org/jira/browse/HIVE

Thanks,

Elliot.

On Thu, 22 Dec 2016 at 07:49, 徐 鹏 <xupeng1226@outlook.com> wrote:

>
>
>
>
>
>
>
>
>
>
> Hi all:
>
>
>
>
>
>
>
>
>
>
>
>
> HQL:
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
>
> SELECT
>
>     '2016-12-04',
>
>     EventID,
>
>     PlatID,
>
>     ProvinceID,
>
>     CityID,
>
>     RefID,
>
>     '',
>
>     '',
>
>     '',
>
>     COUNT(1),
>
>     COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
> expected result:
>
>
>
>
>
>
>
>    - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’
>
>
>
> actual result:
>
>
>
>
>
>
>
>    - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
>    - partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'
>
>
>
> desc format info:
>
>
>
>
>
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
>
> OK
>
> # col_name              data_type               comment
>
>
>
> createdate              string
>
> eventid                 string
>
> platid                  string
>
> provinceid              string
>
> cityid                  string
>
> refid                   string
>
> def1                    string
>
> def2                    string
>
> def3                    string
>
> pv                      int
>
> uv                      int
>
>
>
> # Partition Information
>
> # col_name              data_type               comment
>
>
>
> year                    int
>
> month                   int
>
> day                     int
>
>
>
> # Detailed Partition Information
>
> Partition Value:        [2016, 12, 04]
>
> Database:               tcrecsys_applydata
>
> Table:                  operatingstat_r_view
>
> CreateTime:             Mon Dec 05 10:46:27 CST 2016
>
> LastAccessTime:         UNKNOWN
>
> Protect Mode:           None
>
> Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
>
> Partition Parameters:
>
>         COLUMN_STATS_ACCURATE   false
>
>         numFiles                0
>
>         numRows                 -1
>
>         rawDataSize             -1
>
>         totalSize               0
>
>         transient_lastDdlTime   1480905987
>
>
>
> # Storage Information
>
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed:             No
>
> Num Buckets:            -1
>
> Bucket Columns:         []
>
> Sort Columns:           []
>
> Storage Desc Params:
>
>         field.delim             ^
>
>         serialization.format    ^
>
> Time taken: 0.485 seconds, Fetched: 48 row(s)
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
>
> OK
>
> # col_name              data_type               comment
>
>
>
> createdate              string
>
> eventid                 string
>
> platid                  string
>
> provinceid              string
>
> cityid                  string
>
> refid                   string
>
> def1                    string
>
> def2                    string
>
> def3                    string
>
> pv                      int
>
> uv                      int
>
>
>
> # Partition Information
>
> # col_name              data_type               comment
>
>
>
> year                    int
>
> month                   int
>
> day                     int
>
>
>
> # Detailed Partition Information
>
> Partition Value:        [2016, 12, 4]
>
> Database:               tcrecsys_applydata
>
> Table:                  operatingstat_r_view
>
> CreateTime:             Mon Dec 05 10:46:07 CST 2016
>
> LastAccessTime:         UNKNOWN
>
> Protect Mode:           None
>
> Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
>
> Partition Parameters:
>
>         COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>
>         numFiles                59
>
>         numRows                 0
>
>         rawDataSize             0
>
>         totalSize               49869
>
>         transient_lastDdlTime   1480906019
>
>
>
> # Storage Information
>
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed:             No
>
> Num Buckets:            -1
>
> Bucket Columns:         []
>
> Sort Columns:           []
>
> Storage Desc Params:
>
>         field.delim             ^
>
>         serialization.format    ^
>
> Time taken: 0.087 seconds, Fetched: 48 row(s)
>
>
>
> drop the partition, both locations dropped:
>
>
>
>
>
>
> hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
>
> Dropped the partition year=2016/month=12/day=04
>
> Dropped the partition year=2016/month=12/day=4
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> now :
>
>
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
>
> SELECT
>
>     '2016-12-04',
>
>     EventID,
>
>     PlatID,
>
>     ProvinceID,
>
>     CityID,
>
>     RefID,
>
>     '',
>
>     '',
>
>     '',
>
>     COUNT(1),
>
>     COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> What’s problem?
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Best Regards
>
>
>
>
>
>
>
>
>

Mime
View raw message