drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
Date Wed, 17 May 2017 22:40:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014874#comment-16014874 ]

ASF GitHub Bot commented on DRILL-5504:
---------------------------------------

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116094668
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List<String> errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    --- End diff --
    
    Just a thought. Is there a way to enable these checks (and fail if invalid) for pre-commit tests as well?
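The constructor fields in the diff (MAX_ERRORS, the optional errorList, errorCount) suggest that errors are either collected in a list or logged, with a cap on how many are reported. The rest of the class is not shown in this excerpt, so what follows is only a hypothetical sketch of that pattern, not code from PR #832. ErrorCollector, error() and throwIfErrors() are illustrative names. The last method shows one way a test harness might act on the question above: fail as soon as any error has been recorded.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the PR's code) of a capped error reporter:
 * errors go to a list when capture is enabled, otherwise to a log.
 */
final class ErrorCollector {
  public static final int MAX_ERRORS = 100;

  private final List<String> errorList; // null when errors are only logged
  private int errorCount;               // counts every error, even past the cap

  ErrorCollector(boolean captureErrors) {
    errorList = captureErrors ? new ArrayList<>() : null;
  }

  /** Records one error; returns false for errors beyond MAX_ERRORS, which are counted but not reported. */
  boolean error(String vectorName, String msg) {
    errorCount++;
    if (errorCount > MAX_ERRORS) {
      return false;
    }
    String full = vectorName + ": " + msg;
    if (errorList != null) {
      errorList.add(full);
    } else {
      System.err.println(full); // stand-in for the SLF4J logger in the diff
    }
    return true;
  }

  int errorCount() {
    return errorCount;
  }

  /** Captured messages, or null if errors were only logged. */
  List<String> errors() {
    return errorList;
  }

  /** Hypothetical fail-fast hook for tests: throw if anything was recorded. */
  void throwIfErrors() {
    if (errorCount > 0) {
      throw new IllegalStateException(errorCount + " vector validation error(s)");
    }
  }
}
```

Constructed with captureErrors set to false, the sketch prints each message to stderr, standing in for the logger declared in the diff; a test would typically enable capture so it can inspect the messages before failing.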


> Vector validator to diagnose offset vector issues
> -------------------------------------------------
>
>                 Key: DRILL-5504
>                 URL: https://issues.apache.org/jira/browse/DRILL-5504
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become corrupted, yielding a bogus field-length value that is orders of magnitude larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator validator batch iterator" to optionally allow vector validation on each batch.
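The kind of offset-vector check described in the issue can be illustrated without Drill's vector classes. This minimal sketch assumes a variable-width column is modeled as an int[] of offsets with valueCount + 1 entries, where value i occupies data bytes from offsets[i] up to (but excluding) offsets[i + 1], and the first offset is 0. OffsetCheck and check() are hypothetical names; a real validator would read these values from the vector's offset and data buffers instead.

```java
/**
 * Hypothetical, self-contained offset-vector check (not Drill's API).
 * Assumes offsets has valueCount + 1 entries and starts at 0.
 */
final class OffsetCheck {

  /** Returns a description of the first problem found, or null if the offsets look valid. */
  static String check(int[] offsets, int valueCount, int dataSize) {
    if (offsets.length < valueCount + 1) {
      return "offset vector has " + offsets.length
          + " entries, expected at least " + (valueCount + 1);
    }
    if (offsets[0] != 0) {
      return "offset[0] = " + offsets[0] + ", expected 0";
    }
    for (int i = 1; i <= valueCount; i++) {
      int prev = offsets[i - 1];
      int cur = offsets[i];
      // Offsets must never decrease; a decrease implies a negative field length.
      if (cur < prev) {
        return "offset[" + i + "] = " + cur
            + " is less than offset[" + (i - 1) + "] = " + prev;
      }
      // The DRILL-5470 symptom: an offset pointing past the end of the data.
      if (cur > dataSize) {
        return "offset[" + i + "] = " + cur + " exceeds data size " + dataSize;
      }
    }
    return null;
  }
}
```

Returning only the first problem keeps the sketch short; a validator that reports up to a maximum number of errors, as the diff's MAX_ERRORS constant suggests, would keep scanning and collect each message instead.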



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
