hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 徐 鹏 <xupeng1...@outlook.com>
Subject hive2.1.0 one partition has two locations
Date Thu, 22 Dec 2016 07:48:52 GMT
Hi all:


HQL:

Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';

INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
SELECT
    '2016-12-04',
    EventID,
    PlatID,
    ProvinceID,
    CityID,
    RefID,
    '',
    '',
    '',
    COUNT(1),
    COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;

expected result:

  *   partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’

actual result:

  *   partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
  *   partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'

desc format info:

hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
OK
# col_name              data_type               comment

createdate              string
eventid                 string
platid                  string
provinceid              string
cityid                  string
refid                   string
def1                    string
def2                    string
def3                    string
pv                      int
uv                      int

# Partition Information
# col_name              data_type               comment

year                    int
month                   int
day                     int

# Detailed Partition Information
Partition Value:        [2016, 12, 04]
Database:               tcrecsys_applydata
Table:                  operatingstat_r_view
CreateTime:             Mon Dec 05 10:46:27 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
Partition Parameters:
        COLUMN_STATS_ACCURATE   false
        numFiles                0
        numRows                 -1
        rawDataSize             -1
        totalSize               0
        transient_lastDdlTime   1480905987

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ^
        serialization.format    ^
Time taken: 0.485 seconds, Fetched: 48 row(s)
hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
OK
# col_name              data_type               comment

createdate              string
eventid                 string
platid                  string
provinceid              string
cityid                  string
refid                   string
def1                    string
def2                    string
def3                    string
pv                      int
uv                      int

# Partition Information
# col_name              data_type               comment

year                    int
month                   int
day                     int

# Detailed Partition Information
Partition Value:        [2016, 12, 4]
Database:               tcrecsys_applydata
Table:                  operatingstat_r_view
CreateTime:             Mon Dec 05 10:46:07 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
Partition Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                59
        numRows                 0
        rawDataSize             0
        totalSize               49869
        transient_lastDdlTime   1480906019

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ^
        serialization.format    ^
Time taken: 0.087 seconds, Fetched: 48 row(s)

drop the partition, both locations dropped:

hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
Dropped the partition year=2016/month=12/day=04
Dropped the partition year=2016/month=12/day=4


now :

Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';

INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
SELECT
    '2016-12-04',
    EventID,
    PlatID,
    ProvinceID,
    CityID,
    RefID,
    '',
    '',
    '',
    COUNT(1),
    COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;


What’s problem?



Best Regards
Mime
View raw message