drill-issues mailing list archives

From "Arina Ielchiieva (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
Date Thu, 23 Nov 2017 13:25:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264320#comment-16264320
] 

Arina Ielchiieva commented on DRILL-5771:
-----------------------------------------

Merged into Apache master with commit id 7506cfbb5c8522d371c12dbdc2268d48a9449a48

> Fix serDe errors for format plugins
> -----------------------------------
>
>                 Key: DRILL-5771
>                 URL: https://issues.apache.org/jira/browse/DRILL-5771
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: ready-to-commit
>             Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be successfully serialized / deserialized.
> Serialization and deserialization usually happen when a query has several major fragments.
> One way to check serde is to generate a physical plan (as JSON) and then submit it back to Drill.
> One example of the errors found is described in the first comment. Another example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on a format plugin configuration or its name.
> On Drill startup we load information about the available plugins (it is reloaded each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if we cannot find one, we try to [create one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144],
> but at other query execution stages we always assume that the [plugin exists based on configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node based on the format configuration.
> Then we sent a major fragment to a different node. When that node used this format configuration, it could not get a format plugin based on it, and deserialization failed.
> To fix this problem, we need to create the format plugin during query deserialization if it is absent.
>   
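The difference between the strict lookup and the create-if-absent lookup described in issue 1 can be sketched as follows. This is a minimal illustration, not Drill's actual code: `SketchFormatConfig`, `SketchFormatPlugin` and `SketchPluginRegistry` are hypothetical stand-ins, and the only real API used is `ConcurrentHashMap.computeIfAbsent` from the JDK.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a format plugin config (not a Drill class).
final class SketchFormatConfig {
    final String type;
    final boolean autoCorrectCorruptDates;

    SketchFormatConfig(String type, boolean autoCorrectCorruptDates) {
        this.type = type;
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchFormatConfig)) return false;
        SketchFormatConfig other = (SketchFormatConfig) o;
        return autoCorrectCorruptDates == other.autoCorrectCorruptDates
                && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, autoCorrectCorruptDates);
    }
}

// Hypothetical stand-in for a format plugin built from a config.
final class SketchFormatPlugin {
    final SketchFormatConfig config;

    SketchFormatPlugin(SketchFormatConfig config) {
        this.config = config;
    }
}

// Hypothetical per-node registry keyed by config.
final class SketchPluginRegistry {
    private final Map<SketchFormatConfig, SketchFormatPlugin> plugins =
            new ConcurrentHashMap<>();

    // Strict lookup: returns null if this node never registered the config.
    SketchFormatPlugin getExisting(SketchFormatConfig config) {
        return plugins.get(config);
    }

    // Lenient lookup: creates and caches the plugin when it is absent.
    SketchFormatPlugin getOrCreate(SketchFormatConfig config) {
        return plugins.computeIfAbsent(config, SketchFormatPlugin::new);
    }

    public static void main(String[] args) {
        // A node that received a fragment but never saw this config before.
        SketchPluginRegistry remoteNode = new SketchPluginRegistry();
        SketchFormatConfig config = new SketchFormatConfig("parquet", true);
        System.out.println(remoteNode.getExisting(config) == null);
        System.out.println(remoteNode.getOrCreate(config) != null);
    }
}
```

In this sketch the receiving node fails with the strict lookup and succeeds with the lenient one, which mirrors the failure mode described above under the stated assumptions.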
> 2. Absent hashCode and equals
> Format plugins are stored in a hash map where the key is the format plugin config.
> Since some format plugin configs did not override hashCode and equals, we could not find a format plugin based on its configuration.
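The effect of a missing `hashCode` / `equals` on a hash map key can be reproduced with plain JDK classes. The sketch below uses two hypothetical config classes (`ConfigWithoutEquals`, `ConfigWithEquals`) that are not Drill classes; a deserialized config is modeled as a second object with the same field value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical config that inherits identity-based equals/hashCode from Object.
final class ConfigWithoutEquals {
    final String extension;

    ConfigWithoutEquals(String extension) {
        this.extension = extension;
    }
}

// Hypothetical config that compares by value.
final class ConfigWithEquals {
    final String extension;

    ConfigWithEquals(String extension) {
        this.extension = extension;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConfigWithEquals)) return false;
        return extension.equals(((ConfigWithEquals) o).extension);
    }

    @Override
    public int hashCode() {
        return Objects.hash(extension);
    }
}

final class HashKeyDemo {
    // Looks up a plugin with a fresh but field-identical key, as happens
    // after a config is deserialized into a new object.
    static boolean foundWithoutEquals() {
        Map<ConfigWithoutEquals, String> plugins = new HashMap<>();
        plugins.put(new ConfigWithoutEquals("csv"), "text plugin");
        return plugins.containsKey(new ConfigWithoutEquals("csv"));
    }

    static boolean foundWithEquals() {
        Map<ConfigWithEquals, String> plugins = new HashMap<>();
        plugins.put(new ConfigWithEquals("csv"), "text plugin");
        return plugins.containsKey(new ConfigWithEquals("csv"));
    }

    public static void main(String[] args) {
        System.out.println(foundWithoutEquals()); // false: keys compared by identity
        System.out.println(foundWithEquals());    // true: keys compared by value
    }
}
```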
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name for a configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. A user with admin privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example:
> 1. Query is submitted.
> 2. Parquet format plugin is created with the following configuration (autoCorrectCorruptDates=>true).
> 3. The named format plugin config is serialized with the name parquet.
> 4. Major fragment is sent to a different node.
> 5. Admin changes the configuration for the alias 'parquet' on all nodes to autoCorrectCorruptDates=>false.
> 6. The named format is deserialized on the different node into a parquet format plugin with configuration (autoCorrectCorruptDates=>false).
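The six steps above can be replayed in a single-threaded sketch. Everything here is hypothetical (`NamedConfigRace`, the alias table, and the `replay` method are illustrative names, not Drill code). The alias table stands in for the configuration shared among all drillbits. The third value returned shows, purely as a contrast and not as the fix adopted in the commit, what the receiving node would see if the full option value had been serialized instead of the name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NamedConfigRace {
    // Hypothetical alias table: alias name -> autoCorrectCorruptDates value.
    static final Map<String, Boolean> ALIASES = new ConcurrentHashMap<>();

    // Resolves an alias to the option value currently configured for it.
    static boolean resolveByName(String alias) {
        return ALIASES.get(alias);
    }

    // Returns {value at planning time, value seen via name, value seen via full config}.
    static boolean[] replay() {
        // Steps 1-2: query is planned while the alias maps to true.
        ALIASES.put("parquet", true);
        boolean plannedValue = resolveByName("parquet");

        // Step 3: only the alias name is serialized into the fragment.
        String serializedName = "parquet";
        // Contrast case: the option value itself is serialized.
        boolean serializedFullValue = plannedValue;

        // Steps 4-5: the fragment is in flight while the admin changes the alias.
        ALIASES.put("parquet", false);

        // Step 6: the receiving node resolves the name against the new table.
        boolean seenViaName = resolveByName(serializedName);

        return new boolean[] {plannedValue, seenViaName, serializedFullValue};
    }

    public static void main(String[] args) {
        boolean[] result = replay();
        System.out.println("planned=" + result[0]
                + " viaName=" + result[1]
                + " viaFullConfig=" + result[2]);
    }
}
```

In this sketch the planning node and the executing node disagree on `autoCorrectCorruptDates` when only the name travels with the fragment, which is the inconsistency the steps describe.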



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
