spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-15345) SparkSession's conf doesn't take effect when there is already an existing SparkContext
Date Wed, 18 May 2016 02:21:13 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apache Spark reassigned SPARK-15345:
------------------------------------

    Assignee: Apache Spark

> SparkSession's conf doesn't take effect when there is already an existing SparkContext
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-15345
>                 URL: https://issues.apache.org/jira/browse/SPARK-15345
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.0
>            Reporter: Piotr Milanowski
>            Assignee: Apache Spark
>            Priority: Blocker
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive and -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> # output: [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the configuration. Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark 1.6 and launching the above snippet, I can print out the existing databases.
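> For reference, a minimal way to check what the 2.0 session actually picks up might look like the sketch below (assuming the branch-2.0 PySpark API; the conf key and catalog calls are only an illustrative check):
> {code}
> # Sketch of a sanity check (assumes the branch-2.0 PySpark API).
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> # Which catalog did the context end up with? ("hive" vs "in-memory")
> print(spark.sparkContext.getConf().get("spark.sql.catalogImplementation", "in-memory"))
> # Which databases does the session's catalog actually see?
> print([db.name for db in spark.catalog.listDatabases()])
> {code}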
> When run in DEBUG mode, this is what Spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer: 
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, string])) null
else input[0, string].toString, StructField(result,StringType,false)), result#2) AS #3]  
Project [createexternalrow(if (isnull(result#2)) null else result#2.toString, StructField(result,StringType,false))
AS #3]
>  +- LocalRelation [result#2]                                                        
                                                                                         
  +- LocalRelation [result#2]
>         
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1> (org.apache.spark.sql.Dataset$$anonfun$53)
+++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public static final long org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      private final org.apache.spark.sql.types.StructType
org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because this is
the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure <function1> (org.apache.spark.sql.Dataset$$anonfun$53)
is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
+++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public static final long org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler
org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because this is
the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure <function1> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1> (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13)
+++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public static final long org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      private final org.apache.spark.rdd.RDD$$anonfun$collect$1
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.$outer
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      org.apache.spark.rdd.RDD$$anonfun$collect$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      org.apache.spark.rdd.RDD
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      <function0>
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      MapPartitionsRDD[5] at collect at <stdin>:1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because this is
the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting closure: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      (class org.apache.spark.rdd.RDD$$anonfun$collect$1,Set($outer))
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      (class org.apache.spark.rdd.RDD,Set(org$apache$spark$rdd$RDD$$evidence$1))
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outermost object is not a closure or REPL
line object, so do not clone it: (class org.apache.spark.rdd.RDD,MapPartitionsRDD[5] at collect
at <stdin>:1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + cloning the object <function0> of class
org.apache.spark.rdd.RDD$$anonfun$collect$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + cleaning cloned closure <function0>
recursively (org.apache.spark.rdd.RDD$$anonfun$collect$1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function0> (org.apache.spark.rdd.RDD$$anonfun$collect$1)
+++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public static final long org.apache.spark.rdd.RDD$$anonfun$collect$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      private final org.apache.spark.rdd.RDD org.apache.spark.rdd.RDD$$anonfun$collect$1.$outer
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public org.apache.spark.rdd.RDD org.apache.spark.rdd.RDD$$anonfun$collect$1.org$apache$spark$rdd$RDD$$anonfun$$$outer()
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.rdd.RDD$$anonfun$collect$1.apply()
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      org.apache.spark.rdd.RDD
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      MapPartitionsRDD[5] at collect at <stdin>:1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting closure: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      (class org.apache.spark.rdd.RDD$$anonfun$collect$1,Set($outer))
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      (class org.apache.spark.rdd.RDD,Set(org$apache$spark$rdd$RDD$$evidence$1))
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outermost object is not a closure or REPL
line object, so do not clone it: (class org.apache.spark.rdd.RDD,MapPartitionsRDD[5] at collect
at <stdin>:1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure <function0> (org.apache.spark.rdd.RDD$$anonfun$collect$1)
is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure <function1> (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13)
is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function2> (org.apache.spark.SparkContext$$anonfun$runJob$5)
+++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public static final long org.apache.spark.SparkContext$$anonfun$runJob$5.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      private final scala.Function1 org.apache.spark.SparkContext$$anonfun$runJob$5.cleanedFunc$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.SparkContext$$anonfun$runJob$5.apply(java.lang.Object,java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:      public final java.lang.Object org.apache.spark.SparkContext$$anonfun$runJob$5.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because this is
the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure <function2> (org.apache.spark.SparkContext$$anonfun$runJob$5)
is now cleaned +++
> 16/05/16 12:17:47 INFO SparkContext: Starting job: collect at <stdin>:1
> 16/05/16 12:17:47 INFO DAGScheduler: Got job 1 (collect at <stdin>:1) with 1 output
partitions
> 16/05/16 12:17:47 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <stdin>:1)
> 16/05/16 12:17:47 INFO DAGScheduler: Parents of final stage: List()
> 16/05/16 12:17:47 INFO DAGScheduler: Missing parents: List()
> 16/05/16 12:17:47 DEBUG DAGScheduler: submitStage(ResultStage 1)
> 16/05/16 12:17:47 DEBUG DAGScheduler: missing: List()
> 16/05/16 12:17:47 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at
collect at <stdin>:1), which has no missing parents
> 16/05/16 12:17:47 DEBUG DAGScheduler: submitMissingTasks(ResultStage 1)
> 16/05/16 12:17:47 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated
size 3.1 KB, free 5.8 GB)
> 16/05/16 12:17:47 DEBUG BlockManager: Put block broadcast_1 locally took  1 ms
> 16/05/16 12:17:47 DEBUG BlockManager: Putting block broadcast_1 without replication took
 1 ms
> 16/05/16 12:17:47 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory
(estimated size 1856.0 B, free 5.8 GB)
> 16/05/16 12:17:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 188.165.13.157:35738
(size: 1856.0 B, free: 5.8 GB)
> 16/05/16 12:17:47 DEBUG BlockManagerMaster: Updated info of block broadcast_1_piece0
> 16/05/16 12:17:47 DEBUG BlockManager: Told master about block broadcast_1_piece0
> 16/05/16 12:17:47 DEBUG BlockManager: Put block broadcast_1_piece0 locally took  1 ms
> 16/05/16 12:17:47 DEBUG BlockManager: Putting block broadcast_1_piece0 without replication
took  2 ms
> 16/05/16 12:17:47 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
> 16/05/16 12:17:47 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5]
at collect at <stdin>:1)
> 16/05/16 12:17:47 DEBUG DAGScheduler: New pending partitions: Set(0)
> 16/05/16 12:17:47 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 16/05/16 12:17:47 DEBUG TaskSetManager: Epoch for TaskSet 1.0: 0
> 16/05/16 12:17:47 DEBUG TaskSetManager: Valid locality levels for TaskSet 1.0: NO_PREF,
ANY
> 16/05/16 12:17:47 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1, runningTasks:
0
> 16/05/16 12:17:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, xxx3, partition
0, PROCESS_LOCAL, 5542 bytes)
> 16/05/16 12:17:47 DEBUG TaskSetManager: No tasks for locality level NO_PREF, so moving
to locality level ANY
> 16/05/16 12:17:47 INFO SparkDeploySchedulerBackend: Launching task 1 on executor id:
0 hostname: xxx3.
> 16/05/16 12:17:48 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1, runningTasks:
1
> 16/05/16 12:17:48 DEBUG BlockManager: Getting local block broadcast_1_piece0 as bytes
> 16/05/16 12:17:48 DEBUG BlockManager: Level for block broadcast_1_piece0 is StorageLevel(disk=true,
memory=true, offheap=false, deserialized=false, replication=1)
> 16/05/16 12:17:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 188.165.13.158:53616
(size: 1856.0 B, free: 14.8 GB)
> 16/05/16 12:17:49 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1, runningTasks:
1
> 16/05/16 12:17:50 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1, runningTasks:
1
> 16/05/16 12:17:50 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1, runningTasks:
0
> 16/05/16 12:17:50 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2156
ms on xxx3 (1/1)
> 16/05/16 12:17:50 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed,
from pool 
> 16/05/16 12:17:50 INFO DAGScheduler: ResultStage 1 (collect at <stdin>:1) finished
in 2.158 s
> 16/05/16 12:17:50 DEBUG DAGScheduler: After removal of stage 1, remaining stages = 0
> 16/05/16 12:17:50 INFO DAGScheduler: Job 1 finished: collect at <stdin>:1, took
2.174808 s
> {code}
> I can't see any information about the Hive connection in this trace.
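> This looks consistent with the issue in the title: conf passed through SparkSession's builder seems to be ignored when a SparkContext already exists. A hypothetical minimal sketch of that pattern (not verified; the conf key is only illustrative):
> {code}
> # Hypothetical sketch of the pattern in the title (not verified here):
> # a SparkContext already exists, then a SparkSession is built with extra conf.
> from pyspark import SparkContext
> from pyspark.sql import SparkSession
>
> sc = SparkContext()  # pre-existing context, e.g. the one the pyspark shell creates
> spark = SparkSession.builder \
>     .config("spark.sql.catalogImplementation", "hive") \
>     .getOrCreate()
> # If the builder's conf is dropped, the underlying context still reports the old value.
> print(sc.getConf().get("spark.sql.catalogImplementation", "in-memory"))
> {code}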



