hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Margus Roo <mar...@roo.ee>
Subject Re: Create external table
Date Wed, 11 May 2016 05:16:04 GMT
Hi

Thanks for your answer.

---

At first I create an empty hdfs directory (if directory is empty I did 
not have problems before too).

[margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips

[margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql -u 
"jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@TESTHADOOP.COM"

--verbose
WARNING: Use "yarn jar" to launch YARN applications.
issuing: !connect 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@TESTHADOOP.COM

'' [passwd stripped]
Connecting to 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@TESTHADOOP.COM
Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
Getting log thread is interrupted, since query is done!
No rows affected (1.225 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists trips;
Getting log thread is interrupted, since query is done!
No rows affected (0.159 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE 
`TRIPS`(
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `bike_nr` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `duration` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_date` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `start_station` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `end_station` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `year` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `month` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   FIELDS TERMINATED BY ','
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   LINES TERMINATED BY '\n'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> 
'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> '/user/margusja/trips';
Getting log thread is interrupted, since query is done!
No rows affected (0.067 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> desc trips;
Getting log thread is interrupted, since query is done!
+--------------------------+-----------------------+-----------------------+--+
|         col_name         |       data_type       | comment        |
+--------------------------+-----------------------+-----------------------+--+
| bike_nr                  | string |                       |
| duration                 | int |                       |
| start_date               | string |                       |
| start_station            | string |                       |
| end_station              | string |                       |
| year                     | int |                       |
| month                    | string |                       |
|                          | NULL                  | NULL                  |
| # Partition Information  | NULL                  | NULL                  |
| # col_name               | data_type             | comment               |
|                          | NULL                  | NULL                  |
| year                     | int |                       |
| month                    | string |                       |
+--------------------------+-----------------------+-----------------------+--+
13 rows selected (0.46 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> !exit
Closing: 0: 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.host.ee:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@TESTHADOOP.COM


But In case I try:

  CREATE EXTERNAL TABLE default.st1_test_margusja (
   original STRING,
   rsyslog_timestamp STRING,
   rsyslog_url STRING,
   rsyslog_appname STRING,
   rsyslog_pos5 STRING,
   ts STRING,
   url STRING,
   username STRING,
     ip_address STRING,
     log_level STRING,
     content STRING
   )
   COMMENT 'Dealgate logs raw data'
   PARTITIONED BY (year STRING, month STRING, day STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+) ([^\\s]+) 
([^\\s]+) ([^\\s]+) [^\\s]+ 
[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))' 
)
   STORED AS TEXTFILE
   LOCATION '/datasource/test/';

Now I am starting to see loads of GC warnings in hiveserver2 log and 
sometomes beeline ends: Error: 
org.apache.thrift.transport.TTransportException (state=08S01,code=0)

[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /datasource/test/
           53        7.2 K              8.4 G /datasource/test

And now I use hive:

[margusja@hadoopnn1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
log4j:WARN No such property [maxBackupIndex] in 
org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in 
file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive>  CREATE EXTERNAL TABLE default.st1_test_margusja (
     >   original STRING,
     >   rsyslog_timestamp STRING,
     >   rsyslog_url STRING,
     >   rsyslog_appname STRING,
     >   rsyslog_pos5 STRING,
     >   ts STRING,
     >   url STRING,
     >   username STRING,
     >     ip_address STRING,
     >     log_level STRING,
     >     content STRING
     >   )
     >   COMMENT 'Dealgate logs raw data'
     >   PARTITIONED BY (year STRING, month STRING, day STRING)
     >   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
     >   WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+) 
([^\\s]+) ([^\\s]+) ([^\\s]+) [^\\s]+ 
[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))' 
)
     >   STORED AS TEXTFILE
     >   LOCATION '/datasource/test/';
OK
Time taken: 1.254 seconds
hive>


Any hint is welcome



Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 11/05/16 02:16, Mich Talebzadeh wrote:
> *use default;
> drop table if exists trips;
> CREATE EXTERNAL TABLE `TRIPS`(
>   `bike_nr` string,
>   `duration` int,
>   `start_date` string,
>   `start_station` string,
>   `end_station` string)
> PARTITIONED BY (
>   `year` int,
>   `month` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
>   LINES TERMINATED BY '\n'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   '/test/text/trips';*


Mime
View raw message