Date: Thu, 11 May 2017 19:47:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

    [ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007072#comment-16007072 ]

ASF GitHub Bot commented on DRILL-5504:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/832

    DRILL-5504: Vector validator to diagnose offset vector issues

    Validates offset vectors in VarChar and repeated vectors. Validates the special case of repeated VarChar vectors (two layers of offsets).

    Provides two new session variables to turn on validation. One enables the existing operator (iterator) validation; the other adds vector validation. This allows validation to occur in a “production” Drill (without restarting Drill with assertions, as previously required).

    Unit tests validate the validator. Another test validates the integration but requires manual steps, so it is ignored by default.

    This version is a first cut: all work is done within a single class.
    This allows back-porting to an earlier version to solve a specific issue. A revision should move some of the work into generated code (or refactor vectors to allow outside access), since offset vectors are defined on each subclass, not on a base class that would allow generic operations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5504

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #832

----
commit 175e592419ca6bda1fd0259cc42b033616facc3d
Author: Paul Rogers
Date:   2017-05-11T19:46:15Z

    DRILL-5504: Vector validator to diagnose offset vector issues
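
For readers unfamiliar with offset vectors, the kind of check the pull request summary describes might look roughly like the sketch below. It is not the code from the pull request: plain `int[]` arrays stand in for Drill's offset vectors, and the class and method names (`OffsetVectorCheck`, `check`, `checkRepeatedVarChar`) are hypothetical. It assumes the usual layout in which a vector of N values carries N + 1 offsets, starting at 0 and never decreasing, with no offset pointing past the end of whatever the layer indexes into.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain int arrays stand in for Drill's offset vectors.
final class OffsetVectorCheck {

  /**
   * Checks one offset layer. A layer describing valueCount values needs
   * valueCount + 1 offsets: entry i is the start of value i, entry i + 1 its end.
   * targetLength is the size of whatever the offsets index into.
   * Returns the problems found (empty if the layer looks sane).
   */
  static List<String> check(String name, int[] offsets, int valueCount, int targetLength) {
    List<String> errors = new ArrayList<>();
    if (valueCount < 0 || offsets.length < valueCount + 1) {
      errors.add(name + ": value count " + valueCount
          + " does not fit " + offsets.length + " offsets");
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add(name + ": offset[0] is " + offsets[0] + ", expected 0");
    }
    for (int i = 1; i <= valueCount; i++) {
      if (offsets[i] < offsets[i - 1]) {
        errors.add(name + ": offset[" + i + "] = " + offsets[i]
            + " is below offset[" + (i - 1) + "] = " + offsets[i - 1]);
      }
      if (offsets[i] > targetLength) {
        errors.add(name + ": offset[" + i + "] = " + offsets[i]
            + " exceeds target length " + targetLength);
      }
    }
    return errors;
  }

  /**
   * Two layers, as in a repeated VarChar: row offsets index into the value
   * offsets, and value offsets index into the data bytes.
   */
  static List<String> checkRepeatedVarChar(int[] rowOffsets, int rowCount,
                                           int[] valueOffsets, int dataLength) {
    int availableValues = Math.max(0, valueOffsets.length - 1);
    List<String> errors = new ArrayList<>(check("rows", rowOffsets, rowCount, availableValues));
    if (!errors.isEmpty()) {
      return errors; // the inner value count cannot be trusted
    }
    int valueCount = rowOffsets[rowCount];
    errors.addAll(check("values", valueOffsets, valueCount, dataLength));
    return errors;
  }

  public static void main(String[] args) {
    // Three strings "ab", "" and "cde" packed into 5 data bytes.
    System.out.println(check("varchar", new int[] {0, 2, 2, 5}, 3, 5));
    // A corrupt final offset, far larger than the 5-byte data buffer.
    System.out.println(check("varchar", new int[] {0, 2, 2, 5_000_000}, 3, 5));
  }
}
```

The second call in `main` mimics the symptom reported in DRILL-5470, where an offset implies a field length orders of magnitude larger than the data vector.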
----

> Vector validator to diagnose offset vector issues
> -------------------------------------------------
>
>                 Key: DRILL-5504
>                 URL: https://issues.apache.org/jira/browse/DRILL-5504
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become corrupted, yielding a bogus field-length value that is orders of magnitude larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator validator batch iterator" to optionally allow vector validation on each batch.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)