drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
Date Thu, 11 May 2017 19:47:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007072#comment-16007072 ]

ASF GitHub Bot commented on DRILL-5504:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/832

    DRILL-5504: Vector validator to diagnose offset vector issues

    Validates offset vectors in VarChar and repeated vectors. Also validates
    the special case of repeated VarChar vectors, which have two layers of
    offsets.
    
    Provides two new session variables to turn on validation. One enables
    the existing operator (iterator) validation; the other adds vector
    validation. This allows validation to run in a “production” Drill
    without restarting Drill with assertions enabled, as was previously
    required.
    
    Unit tests validate the validator. Another test validates the
    integration but requires manual steps, so it is ignored by default.
    
    This version is a first cut: all work is done within a single class,
    which allows back-porting to an earlier version to solve a specific
    issue. A later revision should move some of the work into generated
    code (or refactor vectors to allow outside access), since offset vectors
    are declared on each subclass, not on a base class that would allow
    generic operations.
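
To make the checks described above concrete, here is a minimal sketch of offset validation over plain `int` arrays. It is not the code in this pull request and does not use Drill's vector classes; the class name `OffsetCheck`, both method names, and the exact set of rules (first offset is 0, offsets never decrease, no offset exceeds the size of what it indexes) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: models an offset vector as an int[].
class OffsetCheck {

  // Checks one layer of offsets. The array holds valueCount + 1 entries:
  // entry i is where value i starts and entry i + 1 is where it ends.
  // "limit" is the size of whatever the offsets point into.
  static List<String> validate(String name, int[] offsets, int valueCount, int limit) {
    List<String> errors = new ArrayList<>();
    if (valueCount == 0) {
      return errors; // nothing to check in an empty vector
    }
    if (offsets.length < valueCount + 1) {
      errors.add(name + ": expected " + (valueCount + 1) + " offsets, found " + offsets.length);
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add(name + ": offset (0) must be 0, was " + offsets[0]);
    }
    int prev = offsets[0];
    for (int i = 1; i <= valueCount; i++) {
      int offset = offsets[i];
      if (offset < prev) {
        errors.add(name + ": offset (" + i + ") = " + offset
            + " is less than the previous offset " + prev);
      } else if (offset > limit) {
        errors.add(name + ": offset (" + i + ") = " + offset
            + " is beyond the limit " + limit);
      }
      prev = offset;
    }
    return errors;
  }

  // Repeated VarChar has two layers: outer offsets index into the inner
  // values, and inner offsets index into the data bytes.
  static List<String> validateRepeatedVarChar(String name, int[] outer, int rowCount,
                                              int[] inner, int dataLength) {
    int innerCount = Math.max(inner.length - 1, 0);
    List<String> errors = validate(name + " (outer)", outer, rowCount, innerCount);
    errors.addAll(validate(name + " (inner)", inner, innerCount, dataLength));
    return errors;
  }

  public static void main(String[] args) {
    // Three strings of 3, 0 and 2 bytes stored in a 5-byte buffer.
    System.out.println(validate("good", new int[] {0, 3, 3, 5}, 3, 5));
    // One corrupted entry implies a field far larger than the data buffer.
    System.out.println(validate("bad", new int[] {0, 3, 1_000_000, 5}, 3, 5));
  }
}
```

Returning a list of messages rather than throwing on the first problem is a design choice of this sketch: it lets a single pass report every bad offset in a batch.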

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5504

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #832
    
----
commit 175e592419ca6bda1fd0259cc42b033616facc3d
Author: Paul Rogers <progers@maprtech.com>
Date:   2017-05-11T19:46:15Z

    DRILL-5504: Vector validator to diagnose offset vector issues
    

----


> Vector validator to diagnose offset vector issues
> -------------------------------------------------
>
>                 Key: DRILL-5504
>                 URL: https://issues.apache.org/jira/browse/DRILL-5504
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become corrupted, yielding a bogus field-length value that is orders of magnitude larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator validator batch iterator" to optionally allow vector validation on each batch.
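
The last point, optional validation of each batch as it passes between operators, can be pictured as a wrapper around an upstream iterator. The sketch below is a generic illustration only: `ValidatingIterator`, its constructor arguments, and the use of `int[]` to stand in for a batch are all hypothetical and do not reflect Drill's actual iterator validator or its batch types.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper, not Drill's class: passes batches through unchanged
// and, when enabled, runs a validator on each one before handing it on.
class ValidatingIterator<T> implements Iterator<T> {
  private final Iterator<T> upstream;
  private final Consumer<T> validator;
  private final boolean validationEnabled;

  ValidatingIterator(Iterator<T> upstream, Consumer<T> validator, boolean validationEnabled) {
    this.upstream = upstream;
    this.validator = validator;
    this.validationEnabled = validationEnabled;
  }

  @Override
  public boolean hasNext() {
    return upstream.hasNext();
  }

  @Override
  public T next() {
    T batch = upstream.next();
    if (validationEnabled) {
      validator.accept(batch); // expected to throw if the batch is corrupt
    }
    return batch;
  }

  public static void main(String[] args) {
    // Each "batch" here is just an offset array; the second one is corrupt.
    List<int[]> batches = Arrays.asList(new int[] {0, 3, 5}, new int[] {0, 9, 4});
    Consumer<int[]> check = offsets -> {
      for (int i = 1; i < offsets.length; i++) {
        if (offsets[i] < offsets[i - 1]) {
          throw new IllegalStateException("Offsets decrease at index " + i);
        }
      }
    };
    Iterator<int[]> it = new ValidatingIterator<>(batches.iterator(), check, true);
    it.next();
    try {
      it.next();
    } catch (IllegalStateException e) {
      System.out.println("Validation failed: " + e.getMessage());
    }
  }
}
```

With the flag off, the wrapper adds only a boolean test per batch, which is what makes it reasonable to leave in place on a running system and switch on when a problem needs diagnosing.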



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
