spark-reviews mailing list archives

From ssonker <...@git.apache.org>
Subject [GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...
Date Mon, 11 Jun 2018 06:57:15 GMT
Github user ssonker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21505#discussion_r194305723
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -111,6 +113,23 @@ object DateTimeUtils {
         computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
       }
     
    +  private val threadLocalComputedCalendarsMap =
    +    new ThreadLocal[mutable.Map[TimeZone, Calendar]] {
    --- End diff --
    
    @kiszk @viirya I ran benchmarks with and without the ```mutable.Map``` implementation.
Setting the time zone on a calendar instance appears to be a costly operation, and it drags
performance down. Since the number of time zones cannot be large, maintaining a map will not
add much memory overhead. So I suggest going with the ```mutable.Map``` approach. Comments?
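
For illustration, here is a minimal sketch of the per-thread map idea. It is not the code from the PR: the object name `CalendarCache` and the helper `getCalendar` are hypothetical, and only the field shape (`ThreadLocal[mutable.Map[TimeZone, Calendar]]`) comes from the diff above.

```scala
import java.util.{Calendar, TimeZone}
import scala.collection.mutable

// Hypothetical wrapper object, used only to make the sketch self-contained.
object CalendarCache {
  // One map per thread: java.util.Calendar is mutable and not thread-safe,
  // so cached instances must not be shared across threads.
  private val threadLocalComputedCalendarsMap =
    new ThreadLocal[mutable.Map[TimeZone, Calendar]] {
      override def initialValue(): mutable.Map[TimeZone, Calendar] =
        mutable.Map.empty[TimeZone, Calendar]
    }

  // Hypothetical helper: returns this thread's cached Calendar for `tz`,
  // creating it on first use instead of calling setTimeZone on a shared instance.
  def getCalendar(tz: TimeZone): Calendar =
    threadLocalComputedCalendarsMap.get().getOrElseUpdate(tz, Calendar.getInstance(tz))
}
```

With this shape, each thread pays the calendar construction cost once per time zone and never calls `setTimeZone` afterwards. Because a cached `Calendar` keeps its field values between uses, callers would presumably need to reset it (for example with `clear()`) before setting new fields.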


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

