spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-6354) Replace the plan which is part of cached query
Date Thu, 19 Mar 2015 07:28:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368649#comment-14368649 ]

Liang-Chi Hsieh edited comment on SPARK-6354 at 3/19/15 7:28 AM:
-----------------------------------------------------------------

h2. Introduction

Currently, SparkSQL uses cached data only when it finds a cached logical plan that is exactly
the same as the given one. The logic is implemented in {code}CacheManager.useCachedData{code}.
If we find a cached version of the logical plan, we replace the plan with the cached version.

This ticket expands the approach and looks for a cached logical plan whose output contains all
the output of the given logical plan. If we find a cached plan satisfying this condition, we can
replace the given plan with the cached version.

h2. Current approach

The comparison logic is in {code}LogicalPlan.sameResult{code}. For two logical plans to be
considered the same, they must satisfy the following conditions:

# They are instances of the same class
# They have the same number of children
# Their cleanArgs are the same
# All of their corresponding children satisfy the conditions above
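
The four conditions can be sketched on a toy plan type. This is a minimal illustration, not Spark's code: {code}ToyPlan{code}, {code}Scan{code}, and {code}Project{code} are hypothetical stand-ins, and {code}cleanArgs{code} is simplified to a flat list of the node's non-child arguments.

```scala
// Toy stand-in for a logical plan node (hypothetical, not Spark's LogicalPlan).
sealed trait ToyPlan {
  def children: Seq[ToyPlan]

  // Assumption: the node's non-child arguments, flattened into one sequence.
  def cleanArgs: Seq[Any]

  def sameResult(other: ToyPlan): Boolean =
    this.getClass == other.getClass &&           // 1. same class
      children.size == other.children.size &&    // 2. same number of children
      cleanArgs == other.cleanArgs &&            // 3. same cleanArgs
      children.zip(other.children).forall {      // 4. children satisfy 1-3 recursively
        case (mine, theirs) => mine.sameResult(theirs)
      }
}

// Leaf node: reads a named table.
final case class Scan(table: String) extends ToyPlan {
  val children: Seq[ToyPlan] = Nil
  val cleanArgs: Seq[Any] = Seq(table)
}

// Unary node: keeps only the listed columns of its child.
final case class Project(columns: Seq[String], child: ToyPlan) extends ToyPlan {
  val children: Seq[ToyPlan] = Seq(child)
  val cleanArgs: Seq[Any] = columns
}
```

In this sketch, two {code}Project{code} nodes over the same {code}Scan{code} match only when their column lists are identical, so a query selecting fewer columns than the cached plan does not match.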

h2. Proposed approach

This ticket expands the current approach. The expanded approach uses cached data by looking
for a cached logical plan that is a superset of the current logical plan. In other words,
the current logical plan returns part of the results of the cached plan.

The comparison logic is in {code}LogicalPlan.partResult{code}, which takes a parameter
{code}plan: LogicalPlan{code}. For the given {code}plan{code} to be considered part of another
logical plan (called {code}this plan{code} below), the two must satisfy the following conditions:

# They are instances of the same class
# They have the same number of children
# The cleanArgs of the given {code}plan{code} are contained in those of {code}this plan{code}
# All of their corresponding children satisfy the conditions above

To test whether condition 3 is satisfied, we iterate through the elements in the cleanArgs
of the given {code}plan{code}. For each element, we check whether the cleanArgs of {code}this plan{code}
contain it. If any element is missing, condition 3 fails.
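
The containment check described above can be sketched as follows. This is a rough illustration under stated assumptions, not the actual patch: {code}ToyPlan{code}, {code}Scan{code}, and {code}Project{code} are hypothetical types, and {code}cleanArgs{code} is simplified to a flat list of the node's non-child arguments.

```scala
// Toy stand-in for a logical plan node (hypothetical, not Spark's LogicalPlan).
sealed trait ToyPlan {
  def children: Seq[ToyPlan]

  // Assumption: the node's non-child arguments, flattened into one sequence.
  def cleanArgs: Seq[Any]

  // True when the given `plan` is considered part of this plan.
  def partResult(plan: ToyPlan): Boolean =
    this.getClass == plan.getClass &&                         // 1. same class
      children.size == plan.children.size &&                  // 2. same number of children
      plan.cleanArgs.forall(arg => cleanArgs.contains(arg)) && // 3. every arg of `plan` is in this plan
      children.zip(plan.children).forall {                    // 4. children satisfy 1-3 recursively
        case (mine, theirs) => mine.partResult(theirs)
      }
}

// Leaf node: reads a named table.
final case class Scan(table: String) extends ToyPlan {
  val children: Seq[ToyPlan] = Nil
  val cleanArgs: Seq[Any] = Seq(table)
}

// Unary node: keeps only the listed columns of its child.
final case class Project(columns: Seq[String], child: ToyPlan) extends ToyPlan {
  val children: Seq[ToyPlan] = Seq(child)
  val cleanArgs: Seq[Any] = columns
}
```

With a cached {code}Project(Seq("a", "b", "c"), Scan("t")){code} as {code}this plan{code}, the given plan {code}Project(Seq("a"), Scan("t")){code} passes the check, while the reverse direction fails because {code}"b"{code} and {code}"c"{code} are not among the smaller plan's args.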

Basically, the proposed approach just relaxes one of the previous conditions. Previously,
condition 3 required the two plans to have the same args. Now it only requires that the args
of the given plan are all contained in those of the other plan. If the condition is met, the
given plan is part of the other plan.




> Replace the plan which is part of cached query
> ----------------------------------------------
>
>                 Key: SPARK-6354
>                 URL: https://issues.apache.org/jira/browse/SPARK-6354
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> Currently we only replace a plan that is equal to a cached query. This approach can be
> extended to replace a plan that is part of a cached query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

