spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22046) Streaming State cannot be scalable
Date Mon, 18 Sep 2017 07:05:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen updated SPARK-22046:
------------------------------
    Issue Type: Improvement  (was: Bug)

> Streaming State cannot be scalable
> ----------------------------------
>
>                 Key: SPARK-22046
>                 URL: https://issues.apache.org/jira/browse/SPARK-22046
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>         Environment: OS: amazon linux, 
> Streaming Source: kafka 0.10
> vm: aws ec2
> cluster resources: 16Gb per worker, single executor per worker, 8 cores per executor
> storage: hdfs
>            Reporter: danny mor
>            Priority: Minor
>
> State cannot be distributed on the cluster.
> When the {color:#59afe1}StateStoreRDD{color}'s {color:#59afe1}getPrefferedLocation {color}is
called it 
> creates a {color:#59afe1} StateStoreId(checkpointLocation, operatorId, partition.index){color},
> send it to the {color:#59afe1}StateStoreCoordinator {color},which holds a hashmap of
{color:#59afe1}StateStoreId {color}to {color:#59afe1}ExecutorCacheTaskLocation{color}, and
returns the executorId if it is cached.
> the operatorId is generated once every batch in the {color:#59afe1}IncrementalExecution
{color}instance
> but it is almost always 0 since {color:#59afe1}IncrementalExecution {color}is instantiated
each batch
> the partition index is limited to the configured value {color:#14892c}"spark.sql.shuffle.partitions"{color}
(in my case the default 200)
> so this limits cache to 200 entries which has no regard to the key itself .
> When introducing new Executors to the cluster and new keys to streaming data, it does
not effect the distribution of state because the {color:#59afe1}StateStoreId {color}does not
regard those variables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message