spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-8435) Cannot create tables in an specific database using a provider
Date Mon, 27 Jul 2015 09:21:06 GMT

     [ https://issues.apache.org/jira/browse/SPARK-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian reassigned SPARK-8435:
---------------------------------

    Assignee: Cheng Lian

> Cannot create tables in an specific database using a provider
> -------------------------------------------------------------
>
>                 Key: SPARK-8435
>                 URL: https://issues.apache.org/jira/browse/SPARK-8435
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>         Environment: Spark SQL 1.4.0 (Spark-Shell), Hive metastore, MySQL Driver, Linux
>            Reporter: Cristian
>            Assignee: Cheng Lian
>             Fix For: 1.5.0
>
>
> Hello,
> I've been trying to create tables in different catalogs using a Hive metastore, but when I execute the "CREATE" statement, the table is created in the default catalog.
> This is what I'm trying:
> {quote}
> scala> sqlContext.sql("CREATE DATABASE IF NOT EXISTS testmetastore COMMENT 'Testing catalogs' ")
> scala> sqlContext.sql("USE testmetastore")
> scala> sqlContext.sql("CREATE TABLE students USING org.apache.spark.sql.parquet OPTIONS (path '/user/hive', highavailability 'true', DefaultLimit '1000')")
> {quote}
> And this is what I get. It seems to be partly working: when it checks whether the table exists, it searches in the correct catalog (testmetastore). But when it finally creates the table, it uses the default catalog.
> {quote}
> scala> sqlContext.sql("CREATE TABLE students USING a OPTIONS (highavailability 'true',
DefaultLimit '1000')").show
> 15/06/18 10:28:48 INFO HiveMetaStore: 0: get_table : db=*testmetastore* tbl=students
> 15/06/18 10:28:48 INFO audit: ugi=ccaballero	ip=unknown-ip-addr	cmd=get_table : db=testmetastore
tbl=students	
> 15/06/18 10:28:48 INFO Persistence: Request to load fields "comment,name,type" of class
org.apache.hadoop.hive.metastore.model.MFieldSchema but object is embedded, so ignored
> 15/06/18 10:28:48 INFO Persistence: Request to load fields "comment,name,type" of class
org.apache.hadoop.hive.metastore.model.MFieldSchema but object is embedded, so ignored
> 15/06/18 10:28:48 INFO HiveMetaStore: 0: create_table: Table(tableName:students, dbName:*default*,
owner:ccaballero, createTime:1434616128, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col,
type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
parameters:{DefaultLimit=1000, serialization.format=1, highavailability=true}), bucketCols:[],
sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})),
partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=a}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)
> 15/06/18 10:28:48 INFO audit: ugi=ccaballero	ip=unknown-ip-addr	cmd=create_table: Table(tableName:students,
dbName:default, owner:ccaballero, createTime:1434616128, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col,
type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
parameters:{DefaultLimit=1000, serialization.format=1, highavailability=true}), bucketCols:[],
sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})),
partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=a}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)	
> 15/06/18 10:28:49 INFO SparkContext: Starting job: show at <console>:20
> 15/06/18 10:28:49 INFO DAGScheduler: Got job 2 (show at <console>:20) with 1 output
partitions (allowLocal=false)
> 15/06/18 10:28:49 INFO DAGScheduler: Final stage: ResultStage 2(show at <console>:20)
> 15/06/18 10:28:49 INFO DAGScheduler: Parents of final stage: List()
> 15/06/18 10:28:49 INFO DAGScheduler: Missing parents: List()
> 15/06/18 10:28:49 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[6] at
show at <console>:20), which has no missing parents
> 15/06/18 10:28:49 INFO MemoryStore: ensureFreeSpace(1792) called with curMem=0, maxMem=278302556
> 15/06/18 10:28:49 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated
size 1792.0 B, free 265.4 MB)
> 15/06/18 10:28:49 INFO MemoryStore: ensureFreeSpace(1139) called with curMem=1792, maxMem=278302556
> 15/06/18 10:28:49 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory
(estimated size 1139.0 B, free 265.4 MB)
> 15/06/18 10:28:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:59110
(size: 1139.0 B, free: 265.4 MB)
> 15/06/18 10:28:49 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
> 15/06/18 10:28:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[6]
at show at <console>:20)
> 15/06/18 10:28:49 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
> 15/06/18 10:28:49 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost,
PROCESS_LOCAL, 1379 bytes)
> 15/06/18 10:28:49 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
> 15/06/18 10:28:49 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 628 bytes result
sent to driver
> 15/06/18 10:28:49 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 10 ms
on localhost (1/1)
> 15/06/18 10:28:49 INFO DAGScheduler: ResultStage 2 (show at <console>:20) finished
in 0.010 s
> 15/06/18 10:28:49 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed,
from pool 
> 15/06/18 10:28:49 INFO DAGScheduler: Job 2 finished: show at <console>:20, took
0.016204 s
> ++
> ||
> ++
> ++
> {quote}
> Any suggestions would be appreciated.
> Thank you.
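
The mismatch described above (table looked up in one database, created in another) can be checked mechanically against a metastore log. The following is a minimal sketch, not part of Spark or Hive: the object name MetastoreDbCheck and its methods are hypothetical, and the patterns assume the get_table / create_table line shapes shown in the log excerpt in this report, including the optional asterisks that the JIRA markup adds around the database names.

```scala
// Hypothetical helper (not a Spark or Hive API): compares the database used
// when the metastore looks a table up with the one used when it creates it.
object MetastoreDbCheck {
  // Matches e.g. "get_table : db=testmetastore tbl=students".
  // The optional asterisks cover the bold markup seen in the report.
  private val Lookup =
    """get_table\s*:\s*db=\*?(\w+)\*?\s+tbl=(\w+)""".r

  // Matches e.g. "create_table: Table(tableName:students, dbName:default".
  private val Create =
    """create_table:\s*Table\(tableName:(\w+),\s*dbName:\*?(\w+)\*?""".r

  /** (table, database) pairs found in get_table entries. */
  def lookups(log: String): Set[(String, String)] =
    Lookup.findAllMatchIn(log).map(m => (m.group(2), m.group(1))).toSet

  /** (table, database) pairs found in create_table entries. */
  def creations(log: String): Set[(String, String)] =
    Create.findAllMatchIn(log).map(m => (m.group(1), m.group(2))).toSet

  /** (table, lookedUpDb, createdDb) for every table whose two databases differ. */
  def mismatches(log: String): Set[(String, String, String)] =
    for {
      (createdTable, createdDb) <- creations(log)
      (lookedUpTable, lookedUpDb) <- lookups(log)
      if createdTable == lookedUpTable && createdDb != lookedUpDb
    } yield (createdTable, lookedUpDb, createdDb)
}
```

Passing the log text as a single string to MetastoreDbCheck.mismatches yields one entry per table that was looked up in one database and created in another; an empty set means the two steps agreed.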



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

