spark-dev mailing list archives

From gen tang <gen.tan...@gmail.com>
Subject Fwd: dataframe slow down with tungsten turn on
Date Thu, 05 Nov 2015 04:43:00 GMT
Hi,

In fact, I tested the same code on Spark 1.5 with Tungsten turned off.
The result is about the same as with Tungsten turned on.
So the problem does not seem to be Tungsten; Spark 1.5 is simply slower
than Spark 1.4 for this job.
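
For reference, in Spark 1.5 Tungsten is controlled by the
`spark.sql.tungsten.enabled` setting. A minimal sketch of turning it off,
assuming it is set in `spark-defaults.conf` (how the setting was applied in
this test is not stated, so the file location is an assumption):

```
# spark-defaults.conf (assumed location of the setting)
# Disable Tungsten for Spark SQL / DataFrame execution in Spark 1.5
spark.sql.tungsten.enabled   false
```

The same key can also be passed with `--conf` on `spark-submit`.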

Does anyone have an idea why this happens?
Thanks a lot in advance

Cheers
Gen


---------- Forwarded message ----------
From: gen tang <gen.tang86@gmail.com>
Date: Wed, Nov 4, 2015 at 3:54 PM
Subject: dataframe slow down with tungsten turn on
To: "user@spark.apache.org" <user@spark.apache.org>


Hi sparkers,

I am using DataFrames to do some large ETL jobs.
More precisely, I create a DataFrame from a Hive table, apply some
operations, and then save the result as JSON.

When I used spark-1.4.1, the whole process was quite fast, about 1 minute.
However, when I run the same code with spark-1.5.1 (with Tungsten turned on),
it takes about 2 hours to finish the same job.

I checked the task details, and almost all of the time is consumed by
computation.

Any idea about why this happens?

Thanks a lot in advance for your help.

Cheers
Gen
