drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6223) Drill fails on Schema changes
Date Fri, 30 Mar 2018 17:09:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420711#comment-16420711 ]

ASF GitHub Bot commented on DRILL-6223:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1170
  
    BTW: thanks for tackling such a difficult, core issue in Drill. Drill claims to be a)
schema-free and b) SQL-compliant. SQL is based on operations over relations with a fixed number
of columns of fixed types. Reconciling these two ideas is very difficult. Even the original
Drill developers, who built a huge amount of code very quickly and had intimate knowledge
of the Drill internals, did not find a good solution, which is why the problem is
still open.
    
    There are two obvious approaches: 1) redefine SQL to operate over lists of maps (with
arbitrary name/value pairs that differ across rows), or 2) define translation rules from schema-free
input into the schema-full relations that SQL requires.
    
    This PR attempts to go down the first route: redefine SQL. To be successful, we'd want
to rely on research papers, if any, that show how to reformulate relational theory on top
of lists of maps rather than on relations and domains.
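    To make the first approach concrete, here is a minimal Python sketch (not Drill code; the rows and the `project` and `column_types` helpers are hypothetical). Rows are plain maps whose keys and value types differ, and projection treats a missing key as null instead of failing. The projection succeeds, but column `a` ends up holding both integers and strings, which a relation with fixed column types cannot express.

```python
from typing import Any

# Hypothetical rows from two files whose schemas differ: in the second
# row, column "a" is a string and column "b" is absent entirely.
Row = dict[str, Any]


def project(rows: list[Row], columns: list[str]) -> list[Row]:
    """Projection over a list of maps: a missing key becomes None
    instead of raising a schema error."""
    return [{c: row.get(c) for c in columns} for row in rows]


def column_types(rows: list[Row], column: str) -> set[str]:
    """Collect the distinct value type names seen for one column."""
    return {type(row.get(column)).__name__ for row in rows}


rows = [
    {"a": 1, "b": {"x": 1.5}},
    {"a": "one"},
]
projected = project(rows, ["a", "b"])
print(projected)  # [{'a': 1, 'b': {'x': 1.5}}, {'a': 'one', 'b': None}]
print(sorted(column_types(projected, "a")))  # ['int', 'str']
```

    Under this model "what is the type of column `a`?" has no single answer, which is the part of relational theory that would need reformulating.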
    
    The other approach is to define conversion rules: something much closer to
a straightforward implementation project. Can the user provide conversion rules (in the form
of a schema) when the conversion is ambiguous? Would users rather encounter schema change
exceptions or provide the conversion rules? These are interesting open questions.
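    As a rough illustration of the second approach, the Python sketch below applies a user-supplied schema (column name mapped to a conversion function) to schema-free rows, producing fixed-width tuples. Everything here is hypothetical: `conform`, `Schema`, and `SchemaChangeError` are stand-ins chosen for the example, not Drill APIs. Missing columns become null, convertible values are coerced, and a value the rule cannot convert raises an error rather than silently changing the column type.

```python
from typing import Any, Callable


class SchemaChangeError(Exception):
    """Hypothetical stand-in for Drill's schema change exception."""


# A user-supplied schema: column name -> conversion rule.
Schema = dict[str, Callable[[Any], Any]]


def conform(row: dict[str, Any], schema: Schema) -> tuple[Any, ...]:
    """Convert one schema-free row into a fixed-width tuple.

    Missing or null columns become None; values that the rule cannot
    convert raise SchemaChangeError instead of changing the type."""
    out = []
    for name, convert in schema.items():
        if row.get(name) is None:
            out.append(None)
            continue
        try:
            out.append(convert(row[name]))
        except (TypeError, ValueError) as exc:
            raise SchemaChangeError(
                f"cannot convert column {name!r} value {row[name]!r}"
            ) from exc
    return tuple(out)


schema: Schema = {"a": int, "b": str}
print(conform({"a": 1, "b": "x"}, schema))  # (1, 'x')
print(conform({"a": "2"}, schema))  # (2, None)
try:
    conform({"a": "two", "b": "y"}, schema)
except SchemaChangeError as err:
    print(err)  # cannot convert column 'a' value 'two'
```

    Whether an unconvertible value should raise (as here) or fall back to null is exactly the kind of policy question the comment leaves open.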


> Drill fails on Schema changes 
> ------------------------------
>
>                 Key: DRILL-6223
>                 URL: https://issues.apache.org/jira/browse/DRILL-6223
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0, 1.12.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Drill query fails when selecting all columns from a set of complex nested data files (Parquet).
> There are schema differences among the files:
>  * The Parquet files exhibit differences both at the first level and within nested data types
>  * A SELECT * does not cause an exception, but adding a LIMIT clause does
>  * This issue seems to happen only when multiple Drillbit minor fragments are involved
>    (concurrency higher than one)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
