spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)
Date Mon, 12 Feb 2018 13:37:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360757#comment-16360757 ]

Marcelo Vanzin edited comment on SPARK-23394 at 2/12/18 1:36 PM:
-----------------------------------------------------------------

I talked to Attila offline, and to me it seems like the new UI is more correct. There are
only 10 cached partitions, each one replicated to 2 executors; the table also reflects that
(whereas the old UI shows the same block twice). The only potential adjustment here would
be to show the executor addresses instead of the executor IDs.

In the context of what led us here (SPARK-20659 / https://github.com/apache/spark/pull/20546#discussion_r167070392),
I think that we should fix the tests that rely on the old code returning the total count including
replication, so that they work with the new code that returns more accurate information.
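For the tests in question, the difference boils down to counting block replicas versus counting distinct cached partitions. A minimal sketch in plain Scala (no Spark required; the {{BlockReplica}} model and executor IDs below are illustrative, not Spark's actual internal types):

```scala
// Illustrative model: each cached block replica is identified by the
// partition it belongs to plus the executor holding that replica.
case class BlockReplica(partitionIndex: Int, executorId: String)

// 10 partitions, each replicated to 2 executors (as with MEMORY_AND_DISK_2).
val replicas: Seq[BlockReplica] =
  (0 until 10).flatMap { p =>
    Seq(BlockReplica(p, "executor-0"), BlockReplica(p, "executor-1"))
  }

// Old-style count: every replica counted -> 20
val replicaCount = replicas.size

// New-style count: distinct cached partitions -> 10
val cachedPartitions = replicas.map(_.partitionIndex).distinct.size
```

Tests written against the old behavior would expect 20 here; with the new, more accurate accounting they should expect 10.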



> Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23394
>                 URL: https://issues.apache.org/jira/browse/SPARK-23394
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Attila Zsolt Piros
>            Priority: Major
>         Attachments: Spark_2.2.1.png, Spark_2.4.0-SNAPSHOT.png, Storage_Tab.png
>
>
> Start spark as:
> {code:bash}
> $ bin/spark-shell --master local-cluster[2,1,1024]
> {code}
> {code:scala}
> scala> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.storage.StorageLevel._
> scala> sc.parallelize((1 to 100), 10).persist(MEMORY_AND_DISK_2).count
> res0: Long = 100                                                                
> scala> sc.getRDDStorageInfo(0).numCachedPartitions
> res1: Int = 20
> {code}
> h2. Cached Partitions
> On the UI, in the Storage tab, Cached Partitions is 10:
>  !Storage_Tab.png! .
> h2. Full tab
> Moreover, the replicated partitions were also listed in the old 2.2.1 UI, like this:
>  !Spark_2.2.1.png! 
> But now it looks like this:
>  !Spark_2.4.0-SNAPSHOT.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

