From: Chris Nuernberger
Date: Sun, 26 Jul 2020 11:00:10 -0600
Subject: Re: Bulk copy methods to/from Java vectors
To: user@arrow.apache.org
Reply-To: user@arrow.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000302a7d05ab5b2454"

--000000000000302a7d05ab5b2454
Content-Type: text/plain; charset="UTF-8"

Makes sense also, thank you for your help. Calling setValueCount after
allocateNew apparently solved the problem, at least to the point of letting
me round-trip some data. I never called setLastSet, so perhaps some
duplicate work was done.

On Sun, Jul 26, 2020 at 10:30 AM Jacques Nadeau wrote:

> On Sun, Jul 26, 2020 at 8:02 AM Chris Nuernberger wrote:
>
>> It appears that those methods do not allocate the validity buffer *and*
>> the function `allocateValidityBuffer` is private.
>
> It allocates both of them at once. To reduce heap usage we colocate them
> since they are never resized independently.
>
>> Also it appears that allocateNew fails to set the value count for
>> BaseVariableWidthVectors. And if you set the value count after you have
>> assigned data then it clears *only* the offset buffer but not the validity
>> or the data buffers.
>
> For direct operations on variable-width vectors, you'll need to do the
> following steps:
> 1) allocateNew,
> 2) copy in data via memory operations,
> 3) call setLastSet()
> 4) call setValueCount()
>
> I'm guessing you skipped #3 and then setValueCount sees that you never set
> any values so it propagates the last offset to the value count. This is
> done so you can do something like:
> set(1,...)
> set(3,...)
> setValueCount(7)
> and then 4-6 ordinal positions will be offset filled even though you
> didn't set them explicitly. If you do your own work, you have to help the
> state model in the variable vector understand what you've done.

--000000000000302a7d05ab5b2454
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000302a7d05ab5b2454--
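
Editor's sketch of the behaviour discussed in this thread. This is a toy model in plain Java, not Arrow's code: the class `ToyVarWidthVector` and its `bulkLoad` and `fillHoles` helpers are hypothetical names made up for illustration, and only the method names `set`, `setLastSet` and `setValueCount` mirror the ones mentioned above. It assumes that the vector remembers the highest index written through its own API and, when the value count is set, gives every later slot an empty range by copying the previous end offset forward. Under that assumption, skipping step 3 after a raw copy makes `setValueCount` rewrite the offsets while leaving the data bytes untouched, which matches what was reported.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of variable-width offset bookkeeping; an assumption-laden sketch, not Arrow's implementation. */
final class ToyVarWidthVector {
    final byte[] data;        // concatenated value bytes
    final int[] offsets;      // value i occupies data[offsets[i] .. offsets[i + 1])
    private int lastSet = -1; // highest index the vector believes has been written

    ToyVarWidthVector(int dataBytes, int maxValues) {
        data = new byte[dataBytes];
        offsets = new int[maxValues + 1];
    }

    /** Regular write path: fills skipped slots, appends the bytes, records lastSet. */
    void set(int index, byte[] value) {
        fillHoles(index);
        int start = offsets[index];
        System.arraycopy(value, 0, data, start, value.length);
        offsets[index + 1] = start + value.length;
        lastSet = index;
    }

    /** Tells the vector that slots 0..index were already written by other means. */
    void setLastSet(int index) {
        lastSet = index;
    }

    /** Fixes the logical length; slots after lastSet become empty values. */
    void setValueCount(int count) {
        fillHoles(count);
        lastSet = count - 1;
    }

    // Every slot in (lastSet, upTo) gets an empty range by copying the previous end offset.
    private void fillHoles(int upTo) {
        for (int i = lastSet + 1; i < upTo; i++) {
            offsets[i + 1] = offsets[i];
        }
    }

    String get(int index) {
        int start = offsets[index];
        return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }

    /** The four steps from the thread, with step 3 optional so its effect can be observed. */
    static ToyVarWidthVector bulkLoad(String[] values, boolean callSetLastSet) {
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        ToyVarWidthVector v = new ToyVarWidthVector(total, values.length); // 1) allocate
        int pos = 0;
        for (int i = 0; i < values.length; i++) {                          // 2) raw copy of data and offsets
            System.arraycopy(encoded[i], 0, v.data, pos, encoded[i].length);
            pos += encoded[i].length;
            v.offsets[i + 1] = pos;
        }
        if (callSetLastSet) {
            v.setLastSet(values.length - 1);                               // 3) declare what was written
        }
        v.setValueCount(values.length);                                    // 4) fix the logical length
        return v;
    }

    public static void main(String[] args) {
        String[] values = {"a", "bb", "ccc"};
        // With step 3 the offsets survive; without it they are all reset to the first offset.
        System.out.println("with setLastSet:    [" + bulkLoad(values, true).get(2) + "]");
        System.out.println("without setLastSet: [" + bulkLoad(values, false).get(2) + "]");
    }
}
```

In this model the `set(1,...)`, `set(3,...)`, `setValueCount(7)` sequence quoted above leaves slots 0, 2 and 4-6 as empty values, because each skipped slot inherits the end offset of the slot before it.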