drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5328) Trim down physical plan size - replace StoragePluginConfig with storage name
Date Tue, 07 Mar 2017 23:59:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900415#comment-15900415
] 

Paul Rogers commented on DRILL-5328:
------------------------------------

Careful not to introduce race conditions.

* Submit plan onto Drillbit A, with one storage plugin config.
* At the same time, alter the storage plugin on Drillbit B.

The query is planned, and will execute, on Drillbit A based on the old config. The query will
execute on Drillbit B with the new config.

Eventually, Drillbit A will learn of the changed config, but it takes time. This creates a
race condition between the query submission and the plugin updates.

This issue is very similar to the issue that arrises in Dynamic UDFs. We've had trouble getting
the design right. We plan to move to an MVCC model to finally resolve all the race conditions.

MVCC could be used here as well. Store storage plugin configs as versions. Now, the above
race condition is resolved:

* Query is planned on Drillbit A using version 17 of, say, "dfs."
* User modifies plugins on Drillbit B to create version 18.
* When the query executes on Drillbit B, it uses (dfs, 17), and so uses the same version of
the information with which it was planned.

> Trim down physical plan size - replace StoragePluginConfig with storage name
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5328
>                 URL: https://issues.apache.org/jira/browse/DRILL-5328
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Chunhui Shi
>
> For a physical plan, we now pass StoragePluginConfig as part of plan, then the destination
use the config to fetch the storage plugin in StoragePluginRegistry. However, we can also
fetch a storage plugin with the name which is identical to all Drillbits. 
> In the example of simple physical plan of 150 lines shown below,  the storage plugin
config took 60 lines. In a typical large system, FileSystem's StoragePluginConfig could be
>500 lines. So this improvement should save the cost of passing a larger physical plan
among nodes.
> 0: jdbc:drill:zk=10.10.88.126:5181> explain plan for select * from dfs.tmp.employee1
where last_name='Blumberg';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T1¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($1, 'Blumberg')])
> 00-05              Project(T1¦¦*=[$0], last_name=[$1])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/employee1/0_0_0.parquet]],
selectionRoot=/tmp/employee1, numFiles=1, usedMetadataFile=true, cacheFileRoot=/tmp/employee1,
columns=[`*`]]])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "parquet-scan",
>     "@id" : 6,
>     "userName" : "root",
>     "entries" : [ {
>       "path" : "/tmp/employee1/0_0_0.parquet"
>     } ],
>     "storage" : {
>       "type" : "file",
>       "enabled" : true,
>       "connection" : "maprfs:///",
>       "config" : null,
>       "workspaces" : {
>         "root" : {
>           "location" : "/",
>           "writable" : false,
>           "defaultInputFormat" : null
>         },
>         "tmp" : {
>           "location" : "/tmp",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "shi" : {
>           "location" : "/user/shi",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "dir700" : {
>           "location" : "/user/shi/dir700",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "dir775" : {
>           "location" : "/user/shi/dir775",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "xyz" : {
>           "location" : "/user/xyz",
>           "writable" : true,
>           "defaultInputFormat" : null
>         }
>       },
>       "formats" : {
>         "psv" : {
>           "type" : "text",
>           "extensions" : [ "tbl" ],
>           "delimiter" : "|"
>         },
>         "csv" : {
>           "type" : "text",
>           "extensions" : [ "csv" ],
>           "delimiter" : ","
>         },
>         "tsv" : {
>           "type" : "text",
>           "extensions" : [ "tsv" ],
>           "delimiter" : "\t"
>         },
>         "parquet" : {
>           "type" : "parquet"
>         },
>         "json" : {
>           "type" : "json",
>           "extensions" : [ "json" ]
>         },
>         "maprdb" : {
>           "type" : "maprdb"
>         }
>       }
>     },
>     "format" : {
>       "type" : "parquet"
>     },
>     "columns" : [ "`*`" ],
>     "selectionRoot" : "/tmp/employee1",
>     "filter" : "true",
>     "fileSet" : [ "/tmp/employee1/0_0_0.parquet" ],
>     "files" : [ "/tmp/employee1/0_0_0.parquet" ],
>     "cost" : 1155.0
>   }, {
>     "pop" : "project",
>     "@id" : 5,
>     "exprs" : [ {
>       "ref" : "`T1¦¦*`",
>       "expr" : "`*`"
>     }, {
>       "ref" : "`last_name`",
>       "expr" : "`last_name`"
>     } ],
>     "child" : 6,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1155.0
>   }, {
>     "pop" : "filter",
>     "@id" : 4,
>     "child" : 5,
>     "expr" : "equal(`last_name`, 'Blumberg') ",
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 173.25
>   }, {
>     "pop" : "selection-vector-remover",
>     "@id" : 3,
>     "child" : 4,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 173.25
>   }, {
>     "pop" : "project",
>     "@id" : 2,
>     "exprs" : [ {
>       "ref" : "`T1¦¦*`",
>       "expr" : "`T1¦¦*`"
>     } ],
>     "child" : 3,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 173.25
>   }, {
>     "pop" : "project",
>     "@id" : 1,
>     "exprs" : [ {
>       "ref" : "`*`",
>       "expr" : "`T1¦¦*`"
>     } ],
>     "child" : 2,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 173.25
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 173.25
>   } ]
> } |



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message