spark-user mailing list archives

From Silvio Fiorito <>
Subject Re: Using HiveContext.set in multiple threads
Date Tue, 24 May 2016 12:11:51 GMT
If you’re using the DataFrame API, you can achieve that by simply using (or not) the “partitionBy”
method on the DataFrameWriter:

val originalDf = ...

val df1 = originalDf...
val df2 = originalDf...
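
For example, a fuller version of that sketch (assuming a HiveContext named hiveContext is in
scope; the table names "events", "table_a", "table_b" and the columns "kind", "dt" are
placeholders):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Compute the expensive chain once and cache it (stand-in for the real chain).
val originalDf = hiveContext.table("events").cache()

// Two derived DataFrames (placeholder transformations).
val df1 = originalDf.filter(originalDf("kind") === "a")
val df2 = originalDf.filter(originalDf("kind") === "b")

// Partitioning is decided per writer, so neither thread touches a shared
// Hive setting and both writes can safely run in parallel.
val f1 = Future {
  df1.write.partitionBy("dt").mode("append").saveAsTable("table_a")  // partitioned write
}
val f2 = Future {
  df2.write.mode("append").saveAsTable("table_b")                    // unpartitioned write
}

Await.result(Future.sequence(Seq(f1, f2)), Duration.Inf)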


From: Amir Gershman <>
Date: Tuesday, May 24, 2016 at 7:01 AM
To: "" <>
Subject: Using HiveContext.set in multipul threads


I have a DataFrame I compute from a long chain of transformations.
I cache it, and then perform two additional transformations on it.
I use two Futures - each Future inserts the content of one of the above DataFrames into a
different Hive table.
One Future must SET hive.exec.dynamic.partition=true and the other must set it to false.

How can I run both INSERT commands in parallel, but guarantee each runs with its own settings?

If I don't use the same HiveContext, then the initial long chain of transformations which I
cache is not reusable between HiveContexts. If I use the same HiveContext, race conditions
between threads may cause one INSERT to execute with the wrong config.
