spark-dev mailing list archives

From "james.green9@baesystems.com" <james.gre...@baesystems.com>
Subject RE: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade
Date Wed, 27 Jan 2016 15:21:32 GMT

Thanks Yin, here are the logs:



INFO  SparkContext - Added JAR file:/home/jegreen1/mms/zookeeper-3.4.6.jar at http://10.39.65.122:38933/jars/zookeeper-3.4.6.jar
with timestamp 1453907484092
INFO  SparkContext - Added JAR file:/home/jegreen1/mms/mms-http-0.2-SNAPSHOT.jar at http://10.39.65.122:38933/jars/mms-http-0.2-SNAPSHOT.jar
with timestamp 1453907484093
INFO  Executor - Starting executor ID driver on host localhost
INFO  Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService'
on port 41220.
INFO  NettyBlockTransferService - Server created on 41220
INFO  BlockManagerMaster - Trying to register BlockManager
INFO  BlockManagerMasterEndpoint - Registering block manager localhost:41220 with 511.1 MB
RAM, BlockManagerId(driver, localhost, 41220)
INFO  BlockManagerMaster - Registered BlockManager
INFO  HiveContext - Initializing execution hive, version 1.2.1
INFO  ClientWrapper - Inspected Hadoop version: 2.6.0
INFO  ClientWrapper - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
2.6.0
WARN  HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO  HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
INFO  ObjectStore - ObjectStore, initialize called
INFO  Persistence - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
INFO  Persistence - Property datanucleus.cache.level2 unknown - will be ignored
WARN  HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO  ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
INFO  Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged
as "embedded-only" so does not have its own datastore table.
INFO  Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
so does not have its own datastore table.
INFO  Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged
as "embedded-only" so does not have its own datastore table.
INFO  Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
so does not have its own datastore table.
INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO  ObjectStore - Initialized ObjectStore
WARN  ObjectStore - Version information not found in metastore. hive.metastore.schema.verification
is not enabled so recording the schema version 1.2.0
WARN  ObjectStore - Failed to get database default, returning NoSuchObjectException
INFO  HiveMetaStore - Added admin role in metastore
INFO  HiveMetaStore - Added public role in metastore
INFO  HiveMetaStore - No user is added in admin role, since config is empty
INFO  HiveMetaStore - 0: get_all_databases
INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=get_all_databases
INFO  HiveMetaStore - 0: get_functions: db=default pat=*
INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=get_functions: db=default pat=*
INFO  Datastore - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged
as "embedded-only" so does not have its own datastore table.
WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java
classes where applicable
INFO  SessionState - Created local directory: /tmp/9b102c97-c3f4-4d92-b722-0a2e257d3b5b_resources
INFO  SessionState - Created HDFS directory: /tmp/hive/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b
INFO  SessionState - Created local directory: /tmp/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b
INFO  SessionState - Created HDFS directory: /tmp/hive/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b/_tmp_space.db
WARN  HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO  HiveContext - default warehouse location is /user/hive/warehouse
INFO  HiveContext - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
INFO  ClientWrapper - Inspected Hadoop version: 2.6.0
INFO  ClientWrapper - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
2.6.0
WARN  HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO  metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
INFO  metastore - Connected to metastore.
INFO  SessionState - Created local directory: /tmp/7e230580-37af-47d3-81cc-eb4829b8da62_resources
INFO  SessionState - Created HDFS directory: /tmp/hive/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62
INFO  SessionState - Created local directory: /tmp/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62
INFO  SessionState - Created HDFS directory: /tmp/hive/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62/_tmp_space.db
INFO  ParquetRelation - Listing hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208 on driver
INFO  SparkContext - Starting job: parquet at ThriftTest.scala:39
INFO  DAGScheduler - Got job 0 (parquet at ThriftTest.scala:39) with 32 output partitions
INFO  DAGScheduler - Final stage: ResultStage 0 (parquet at ThriftTest.scala:39)
INFO  DAGScheduler - Parents of final stage: List()
INFO  DAGScheduler - Missing parents: List()
INFO  DAGScheduler - Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at ThriftTest.scala:39),
which has no missing parents
INFO  MemoryStore - Block broadcast_0 stored as values in memory (estimated size 65.5 KB,
free 65.5 KB)
INFO  MemoryStore - Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9
KB, free 88.3 KB)
INFO  BlockManagerInfo - Added broadcast_0_piece0 in memory on localhost:41220 (size: 22.9
KB, free: 511.1 MB)
INFO  SparkContext - Created broadcast 0 from broadcast at DAGScheduler.scala:1006
INFO  DAGScheduler - Submitting 32 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at
parquet at ThriftTest.scala:39)
INFO  TaskSchedulerImpl - Adding task set 0.0 with 32 tasks
INFO  TaskSetManager - Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL,
6528 bytes)
INFO  TaskSetManager - Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL,
6528 bytes)
INFO  TaskSetManager - Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL,
6528 bytes)
INFO  TaskSetManager - Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL,
6528 bytes)
INFO  TaskSetManager - Starting task 4.0 in stage 0.0 (TID 4, localhost, partition 4,PROCESS_LOCAL,
6528 bytes)
INFO  TaskSetManager - Starting task 5.0 in stage 0.0 (TID 5, localhost, partition 5,PROCESS_LOCAL,
6528 bytes)


From: Yin Huai [mailto:yhuai@databricks.com]
Sent: 26 January 2016 17:48
To: Green, James (UK Guildford)
Cc: dev@spark.apache.org
Subject: Re: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

Can you post more logs, especially the lines around "Initializing execution hive ..." (this is
for an internally used fake metastore, and it is Derby) and "Initializing HiveMetastoreConnection
version ..." (this is for the real metastore; it should be your remote one)? Also, those temp
tables are stored in memory and are associated with a HiveContext. If you cannot see the
temp tables, it usually means that the HiveContext you used with JDBC was different from
the one used to create the temp table. However, in your case you are using HiveThriftServer2.startWithContext(hiveContext),
so it would be good to have more logs to see what happened.

Thanks,

Yin

On Tue, Jan 26, 2016 at 1:33 AM, "james.green9@baesystems.com"
<james.green9@baesystems.com> wrote:
Hi

I posted this on the user list yesterday; I am posting it here now because, on further investigation,
I am fairly sure this is a bug:


On upgrading from 1.5.0 to 1.6.0 I have a problem with HiveThriftServer2. I have this code:

val hiveContext = new HiveContext(SparkContext.getOrCreate(conf));

val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")

thing.registerTempTable("thing")

HiveThriftServer2.startWithContext(hiveContext)
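For reference, a sketch of the same startup with 1.6's new session behaviour switched off. Spark 1.6 moved the Thrift server to multi-session mode, and spark.sql.hive.thriftServer.singleSession is documented as restoring the 1.5-style shared session; whether that is what is biting here is an assumption on my part, not something I have verified:

```scala
// Hypothetical variant: force the 1.5-style single shared session so that
// temp tables registered on the driver's HiveContext are visible to JDBC
// sessions. The option exists in 1.6; its relevance here is an assumption.
val conf = new SparkConf()
  .setAppName("ThriftTest")
  .set("spark.sql.hive.thriftServer.singleSession", "true")

val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))
val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
thing.registerTempTable("thing")
HiveThriftServer2.startWithContext(hiveContext)
```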


When I start things up on the cluster my hive-site.xml is found – I can see that the metastore
connects:


INFO  metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
INFO  metastore - Connected to metastore.
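That connection should be driven by the hive.metastore.uris property; for reference, a minimal hive-site.xml fragment, reconstructed from the URI in the log above rather than copied from my actual file:

```xml
<!-- Minimal fragment: points the Hive client at the remote metastore -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://dkclusterm2.imp.net:9083</value>
</property>
```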


But then, later on, the Thrift server seems not to connect to the remote Hive metastore but
instead starts a Derby instance:

INFO  AbstractService - Service:CLIService is started.
INFO  ObjectStore - ObjectStore, initialize called
INFO  Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0"
since the connection used is closing
INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO  ObjectStore - Initialized ObjectStore
INFO  HiveMetaStore - 0: get_databases: default
INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=get_databases: default
INFO  HiveMetaStore - 0: Shutting down the object store...
INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=Shutting down the object store...
INFO  HiveMetaStore - 0: Metastore shutdown complete.
INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=Metastore shutdown complete.
INFO  AbstractService - Service:ThriftBinaryCLIService is started.
INFO  AbstractService - Service:HiveServer2 is started.

On 1.5.0 the same bit of the log reads:

INFO  AbstractService - Service:CLIService is started.
INFO  metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
     ******* i.e. 1.5.0 connects to the remote Hive metastore
INFO  metastore - Connected to metastore.
INFO  AbstractService - Service:ThriftBinaryCLIService is started.
INFO  AbstractService - Service:HiveServer2 is started.
INFO  ThriftCLIService - Starting ThriftBinaryCLIService on port 10000 with 5...500 worker
threads



So if I connect to this with JDBC I can see all the tables on the Hive server, but not
anything temporary; I guess those are going to Derby.
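For reference, this is roughly how I check over JDBC; a sketch assuming the default ThriftBinaryCLIService port 10000 from the log and the standard hive-jdbc driver on the classpath:

```scala
import java.sql.DriverManager

// Connect to the Thrift server started above (port 10000 per the log).
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "jegreen1", "")
val rs = conn.createStatement().executeQuery("SHOW TABLES")
// On 1.5.0 the temp table "thing" appears in this listing; on 1.6.0 it does not.
while (rs.next()) println(rs.getString(1))
conn.close()
```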

I see someone on the Databricks website is also having this problem.


Thanks

James
Please consider the environment before printing this email. This message should be regarded
as confidential. If you have received this email in error please notify the sender and destroy
it immediately. Statements of intent shall only become binding when confirmed in hard copy
by an authorised signatory. The contents of this email may relate to dealings with other companies
under the control of BAE Systems Applied Intelligence Limited, details of which can be found
at http://www.baesystems.com/Businesses/index.htm.
