Return-Path: X-Original-To: apmail-drill-dev-archive@www.apache.org Delivered-To: apmail-drill-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4EA4418E49 for ; Tue, 25 Aug 2015 22:42:03 +0000 (UTC) Received: (qmail 35536 invoked by uid 500); 25 Aug 2015 22:42:03 -0000 Delivered-To: apmail-drill-dev-archive@drill.apache.org Received: (qmail 35472 invoked by uid 500); 25 Aug 2015 22:42:03 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 35461 invoked by uid 99); 25 Aug 2015 22:42:02 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Aug 2015 22:42:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 0DA52182343 for ; Tue, 25 Aug 2015 22:42:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.001 X-Spam-Level: *** X-Spam-Status: No, score=3.001 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=3, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id Pz_Utvwzf2mi for ; Tue, 25 Aug 2015 22:41:49 +0000 (UTC) Received: from mail-lb0-f172.google.com (mail-lb0-f172.google.com [209.85.217.172]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 68E9E20864 for ; Tue, 25 Aug 2015 22:41:48 +0000 (UTC) Received: by lbcbn3 with SMTP id bn3so108927457lbc.2 for ; Tue, 25 Aug 2015 15:41:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=81dy7F3BZ1SiTByukkWCxRMEHOqkOiPpvs/T/BmUcWU=; b=ZwskiCmtTP/yvdQiw7mYbJe60pqrQgb3/igPxCFHl+0PZ6R+CSnNoj5TwFhrULMOMP 6DVuRpEMakmDtrBmhknx7A3gJe6OyQ0u2R0QI+gP3ViVC1jQrG6XWp0k9r5BilEwScZb iL6jbnwZgm2foMk97+xSd6at3bmne2BbKVhMfwO3B8CFizjcgHw3m/Nmny2xwGCIr3/j n+7pFlPkr/AItVmeR7vTaa1IhIiexAYeSb2ff3TnuoRzxmytxp76XRq6L98DHgIdZC3m H1kQrEjx7bcHUjBpYKfDHmyGnbnoVOxFWRhCSEoVfCvGYzMZTPwoQWs1UMSr2u/RycrU Aq7w== X-Gm-Message-State: ALoCoQmRAmCFO2urW78+UszDHjpAPA15zbZ4l1HAIM3ClCHR7spz5fcjWHyqioXYW4T5FVxFjenC MIME-Version: 1.0 X-Received: by 10.152.219.172 with SMTP id pp12mr27867255lac.17.1440542506818; Tue, 25 Aug 2015 15:41:46 -0700 (PDT) Received: by 10.25.161.11 with HTTP; Tue, 25 Aug 2015 15:41:46 -0700 (PDT) X-Originating-IP: [50.197.189.221] In-Reply-To: References: Date: Tue, 25 Aug 2015 15:41:46 -0700 Message-ID: Subject: Re: zeroVectors() interface for value vectors From: Jacques Nadeau To: dev@drill.apache.org Content-Type: multipart/alternative; boundary=001a1134328a673b8e051e2a7034 --001a1134328a673b8e051e2a7034 Content-Type: text/plain; charset=UTF-8 I think it would be worthwhile to research when this was added and why. I took a quick glance but hadn't figured out why this was introduced. At first glance, it seems like it was done to fix what is most likely a bug elsewhere. @Chris, agreed. The first goal was to at least get them documented in this thread. -- Jacques Nadeau CTO and Co-Founder, Dremio On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche wrote: > One more question about the transition from allocate -> mutate. For Fixed > width vectors and BitVector you can actually call setSafe() without calling > allocateNew() first and it will work. Should it throw an exception instead > ? > not calling allocateNew() has side effects that could cause setSafe() to > throw an OversizedAllocationException if you call setSafe() then clear() > multiple times. > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin > wrote: > > > Maybe we should start by putting these rules in a comment in the value > > vector base interfaces? The lack of such information is why there are > > deviations and other expectations. > > > > On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau > > wrote: > > > > > There are a few unspoken "rules" around vectors: > > > > > > - values need to be written in order (e.g. index 0, 1, 2, 5) > > > - null vectors start with all values as null before writing anything > > > - for variable width types, the offset vector should be all zeros > before > > > writing > > > - you must call setValueCount before a vector can be read > > > - you should never write to a vector once it has been read. > > > > > > The ultimate goal we should get to the point where you the interfaces > > > guarantee this order of operation: > > > > > > allocate > mutate > setvaluecount > access > clear (or allocate to > start > > > the process over, xxx). Any deviation from this pattern should result > in > > > exception. We should do this only in debug mode as this code is > > extremely > > > performance sensitive. Operations like transfer should be built on top > > of > > > this state model. (In that case, it would mean src moves to clear > state > > > and target moves to access state. It also means that transfer should > > only > > > work in access state.) > > > > > > If we need special purpose data structures that don't operate in these > > > ways, we should make sure to keep them separate rather than trying to > > > accommodate a deviation from this pattern in the core vector code. > > > > > > I wrote xxx above because I see the purpose of zeroVectors as being a > > reset > > > on the vector state back to the original state. Maybe we should > actually > > > call it 'reset' rather than 'zeroVectors'. This would basically pick > up > > at > > > mutate mode again. > > > > > > Since these rules were never formalized, I'm sure there are a few > places > > > where we currently deviate. We should enforce these rules and then get > > > those issues fixed. > > > > > > > > > > > > -- > > > Jacques Nadeau > > > CTO and Co-Founder, Dremio > > > > > > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > > adeneche@maprtech.com> > > > wrote: > > > > > > > Another important point to keep in mind here: > > ValueVectorWriteExpression > > > > operates under the "undocumented" assumption that the destination > > vector > > > is > > > > empty, this way it can safely skip writing null values. In the case > of > > > > window functions I am using a value vector as an internal buffer to > > hold > > > > values between batches which voids the assumption. > > > > If this assumption is indeed correct then adding zeroVector to value > > > > vectors is indeed the way to go. > > > > > > > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau > > > > wrote: > > > > > > > > > In general, let's try to avoid extending the core structures like > > value > > > > > vector read and write expressions for a single operator. Zerovector > > is > > > > > trivial to implement so let's resolve that way (trivial since the > > > > > underlying vector already has it and we just need to delegate > down). > > > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" > wrote: > > > > > > > > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window > > > > function > > > > > > incorrect result). His code uses ValueVectorWriteExpression to > set > > > > > values > > > > > > in an internal batch which get re-used across different > partitions > > of > > > > the > > > > > > window function. Ideally, we just want to zero out the vector > > rather > > > > > than > > > > > > calling clear() since clear() will release the buffer. > > > > > > > > > > > > However, currently zeroVectors() is only supported by > > > FixedWidthVector, > > > > > not > > > > > > VariableWidthVector. * Should there be such an interface for > > > variable > > > > > > width ? * The implementation could zero out just the offset > vector. > > > > > > > > > > > > In the absence of such an interface, Hakim has added a boolean > flag > > > > > > witeNulls to ValueVectorWriteExpression (see > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > > > ) > > > > > > and is conditionally doing the null-ing out in the generated > code. > > > It > > > > > > won't affect the normal code path, it would get used for specific > > > > window > > > > > > functions. > > > > > > > > > > > > I am thinking of committing his patch and tracking the > > zeroVectors() > > > > > > enhancement separately (if people agree that it would be useful). > > > Let > > > > me > > > > > > know... > > > > > > > > > > > > Aman > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Abdelhakim Deneche > > > > > > > > Software Engineer > > > > > > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > < > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > -- > > Abdelhakim Deneche > > Software Engineer > > > > > Now Available - Free Hadoop On-Demand Training > < > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > --001a1134328a673b8e051e2a7034--