spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: PySpark - Hive Context Does Not Return Results but SQL Context Does for Similar Query.
Date Wed, 14 Oct 2015 21:30:07 GMT
I forgot to add: you might also try running
SET spark.sql.hive.metastorePartitionPruning=true

On Wed, Oct 14, 2015 at 2:23 PM, Michael Armbrust <michael@databricks.com>
wrote:

> No link to the original Stack Overflow post so I can up my reputation? :)
>
> This is likely not a difference between HiveContext/SQLContext, but
> instead a difference between a table where the metadata is coming from the
> HiveMetastore vs the SparkSQL Data Source API.  I would guess that if you
> create the table the same way, the performance would be similar.
>
> In the data source API we have spent a fair amount of time optimizing the
> discovery and handling of many partitions, and in general I would say this
> path is easier to use / faster.
>
> The likely problem with the Hive table is downloading all of the
> partition metadata from the metastore and converting it to our internal
> format.  We do this for all partitions, even though in this case you only
> want the first ~20 rows.
>
> On Wed, Oct 14, 2015 at 1:38 PM, charles.drotar <
> charles.drotar@capitalone.com> wrote:
>
>> I have duplicated my submission to Stack Overflow below, since it is
>> exactly the same question I would like to post here as well. Please
>> don't judge me too harshly for my laziness.
>>
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25067/Screen_Shot_2015-10-14_at_3.png>
>>
>> The questions I am concerned with are the same ones listed in the
>> "QUESTIONS" section, namely:
>>
>> 1) Has anyone noticed anything similar to this?
>> 2) What is happening on the backend that could be causing this consumption
>> of resources, and what could I do to avoid it?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-Hive-Context-Does-Not-Return-Results-but-SQL-Context-Does-for-Similar-Query-tp25067.html
>
