hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Furcy Pin <furcy....@flaminem.com>
Subject Re: PARTITION error because different columns size
Date Tue, 13 Dec 2016 14:41:42 GMT
Actually, there is a legacy feature in Hive that would do exactly what you
need :

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification


Another way would simply to use dynamic partitioning :
INSERT INTO TABLE employee_orc PARTITION *(country, office)* select * from
employee where country='USA' and office='HQ-TX';

The only downside of this method (that I know of), is that Hive will put a
lock on every partitions, even if only one is inserted.


I also know a third solution, but I can't talk about it yet.



On Tue, Dec 13, 2016 at 3:27 PM, Joaquin Alzola <Joaquin.Alzola@lebara.com>
wrote:

> Hi Suresh
>
>
>
> I choose the * and not the specific fields because I have 520 columns.
>
> The data that I tested was only a testing ground.
>
>
>
> I suppose then that I need to select the 520 fileds. L
>
>
>
>
>
>
>
> *From:* Suresh Kumar Sethuramaswamy [mailto:rockssk@gmail.com]
> *Sent:* 13 December 2016 14:19
> *To:* user@hive.apache.org
> *Subject:* Re: PARTITION error because different columns size
>
>
>
> Hi Joaquin
>
>
>
>     In hive , when u run 'select * from employee' it is going to return
> the partitioned columns also at the end,  whereas you don't want that to be
> inserted into ur ORC table , so ur insert query should look like
>
>
>
>       INSERT INTO TABLE employee_orc PARTITION (country='USA',
> office='HQ-TX') select eid,salary from employee where country='USA' and
> office='HQ-TX';
>
>
>
>
>
>  Remember partition in hive is a physical folder name
>
>
>
> Regards
>
> Suresh
>
>
>
>
>
>
>
> On Tue, Dec 13, 2016 at 6:37 AM Joaquin Alzola <Joaquin.Alzola@lebara.com>
> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hi List
>
>
>
>
>
>
>
> I change Spark to 2.0.2 and Hive 2.0.1.
>
>
>
> I have the bellow tables but the INSERT INTO TABLE employee_orc PARTITION
> (country='USA', office='HQ-TX') select * from employee where country='USA'
> and office='HQ-TX';
>
>
>
> Is giving me à Cannot insert into table `default`.`employee_orc` because
> the number of columns are different: need 4 columns, but query has 6
> columns.;
>
>
>
>
>
>
>
> When doing select it is adding the Partition as columns ….
>
>
>
>
>
>
>
> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
>
>
>
> salary String, destination String)
>
>
>
> COMMENT 'Employee details'
>
>
>
> PARTITIONED BY(country string, office string)
>
>
>
> ROW FORMAT DELIMITED
>
>
>
> FIELDS TERMINATED BY '\t'
>
>
>
> LINES TERMINATED BY '\n'
>
>
>
> STORED AS TEXTFILE;
>
>
>
>
>
>
>
> CREATE TABLE IF NOT EXISTS employee_orc ( eid int, name String,
>
>
>
> salary String, destination String)
>
>
>
> COMMENT 'Employee details'
>
>
>
> PARTITIONED BY(country string, office string)
>
>
>
> STORED AS ORC tblproperties ("orc.compress"="ZLIB");
>
>
>
>
>
>
>
> 0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH
> '/mnt/sample.txt.gz' INTO TABLE employee PARTITION (country='USA',
> office='HQ-TX');
>
>
>
> +---------+--+
>
>
>
> | Result  |
>
>
>
> +---------+--+
>
>
>
> +---------+--+
>
>
>
> No rows selected (0.685 seconds)
>
>
>
> 0: jdbc:hive2://localhost:10000> select * from employee;
>
>
>
> +-------+--------------+---------+--------------------+-----
> -----+---------+--+
>
>
>
> |  eid  |     name     | salary  |    destination     | country  | office
> |
>
>
>
> +-------+--------------+---------+--------------------+-----
> -----+---------+--+
>
>
>
> | 1201  | Gopal        | 45000   | Technical manager  | USA      | HQ-TX
> |
>
>
>
> | 1202  | Manisha      | 45000   | Proof reader       | USA      | HQ-TX
> |
>
>
>
> | 1203  | Masthanvali  | 40000   | Technical writer   | USA      | HQ-TX
> |
>
>
>
> | 1204  | Kiran        | 40000   | Hr Admin           | USA      | HQ-TX
> |
>
>
>
> | 1205  | Kranthi      | 30000   | Op Admin           | USA      | HQ-TX
> |
>
>
>
> +-------+--------------+---------+--------------------+-----
> -----+---------+--+
>
>
>
> 5 rows selected (0.358 seconds)
>
>
>
> 0: jdbc:hive2://localhost:10000> INSERT INTO TABLE employee_orc PARTITION
> (country='USA', office='HQ-TX') select * from employee where country='USA'
> and office='HQ-TX';
>
>
>
> Error: org.apache.spark.sql.AnalysisException: Cannot insert into table
> `default`.`employee_orc` because the number of columns are different: need
> 4 columns, but query has 6 columns.; (state=,code=0)
>
>
>
>
>
>
>
>
>
>
>
> 0: jdbc:hive2://localhost:10000> describe employee_orc;
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
> |         col_name         | data_type  | comment  |
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
> | eid                      | int        | NULL     |
>
>
>
> | name                     | string     | NULL     |
>
>
>
> | salary                   | string     | NULL     |
>
>
>
> | destination              | string     | NULL     |
>
>
>
> | country                  | string     | NULL     |
>
>
>
> | office                   | string     | NULL     |
>
>
>
> | # Partition Information  |            |          |
>
>
>
> | # col_name               | data_type  | comment  |
>
>
>
> | country                  | string     | NULL     |
>
>
>
> | office                   | string     | NULL     |
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
>
>
>
>
> 0: jdbc:hive2://localhost:10000>  describe employee;
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
> |         col_name         | data_type  | comment  |
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
> | eid                      | int        | NULL     |
>
>
>
> | name                     | string     | NULL     |
>
>
>
> | salary                   | string     | NULL     |
>
>
>
> | destination              | string     | NULL     |
>
>
>
> | country                  | string     | NULL     |
>
>
>
> | office                   | string     | NULL     |
>
>
>
> | # Partition Information  |            |          |
>
>
>
> | # col_name               | data_type  | comment  |
>
>
>
> | country                  | string     | NULL     |
>
>
>
> | office                   | string     | NULL     |
>
>
>
> +--------------------------+------------+----------+--+
>
>
>
> 10 rows selected (0.045 seconds)
>
>
>
>
>
> This email is confidential and may be subject to privilege. If you are not
> the intended recipient, please do not copy or disclose its content but
> contact the sender immediately upon receipt.
>
>
>
>
> This email is confidential and may be subject to privilege. If you are not
> the intended recipient, please do not copy or disclose its content but
> contact the sender immediately upon receipt.
>

Mime
View raw message