drill-issues mailing list archives

From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6223) Drill fails on Schema changes
Date Mon, 02 Apr 2018 09:18:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422042#comment-16422042 ]

Parth Chandra commented on DRILL-6223:

{quote}To your point about compensation logic in the context of Schema Changes:
 * Why do you think it is ok to dynamically include new columns?
 * Yet it is not ok to exclude them?{quote}

Usually, in real-world data with dynamically changing schemas, new columns are added and
not removed.
{quote}Consider a batch of 32k rows{quote}
{quote}A VV with null integer values will require 32kb (bits) + 32kb * 4 = 160kb{quote}
{quote}Each missing column will require that much memory per mini-fragment{quote}

One of the guarantees provided by value vectors is that elements can be accessed by index
in constant time (or, in the case of nested elements, in O(m), where m is the level of nesting).
The representation is based on providing this guarantee. It comes at the cost of additional
memory usage, which is a deliberate tradeoff.
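
To make the tradeoff concrete, here is a minimal sketch in Java. It is not Drill's actual vector implementation: the class name NullableIntVectorSketch and its methods are hypothetical, and it assumes one byte per row for the null marker, which is what the quoted 32kb figure implies. Every row gets a fixed-size slot even when it is null, so a lookup is a plain array index, and a 32k-row batch costs 32k * 1 + 32k * 4 bytes.

```java
// Hypothetical sketch only; this is NOT Drill's real nullable vector class.
// Assumption: one marker byte per row plus one 4-byte value slot per row.
final class NullableIntVectorSketch {
    private final byte[] isSet;  // 1 = value present, 0 = null
    private final int[] values;  // slot is allocated for every row, null or not

    NullableIntVectorSketch(int rowCount) {
        this.isSet = new byte[rowCount];
        this.values = new int[rowCount];
    }

    void set(int index, int value) {
        values[index] = value;
        isSet[index] = 1;
    }

    // Constant-time null check: a single array read.
    boolean isNull(int index) {
        return isSet[index] == 0;
    }

    // Constant-time access: no scanning or offset lookup is needed because
    // null rows still occupy a slot.
    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("null at index " + index);
        }
        return values[index];
    }

    // Memory used by the two buffers, ignoring object headers.
    static long footprintBytes(int rowCount) {
        return (long) rowCount + (long) rowCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        int rows = 32 * 1024;
        NullableIntVectorSketch v = new NullableIntVectorSketch(rows);
        v.set(12345, 42);
        System.out.println(v.get(12345));                          // 42
        System.out.println(v.isNull(0));                           // true
        System.out.println(footprintBytes(rows) / 1024 + " KiB");  // 160 KiB
    }
}
```

A sparser layout (for example, storing only the non-null rows) would shrink the all-null case but would need an extra lookup or scan to find row i, which is the guarantee being traded away.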
{quote}This is unless (similarly to the implicit columns) we optimize the VV storage representation
or / and push the column preservation to higher layers such as the client or foreman{quote}

It would be wonderful to improve vectors to use much less memory while providing the same
guarantees. A proposal would be welcome.


> Drill fails on Schema changes 
> ------------------------------
>                 Key: DRILL-6223
>                 URL: https://issues.apache.org/jira/browse/DRILL-6223
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0, 1.12.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
> Drill query fails when selecting all columns from a complex nested data file (Parquet) set. There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within nested data
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor fragments are involved (concurrency higher than one)

This message was sent by Atlassian JIRA
