From: Micah Kornfield
Reply-To: emkornfield@gmail.com
Date: Wed, 7 Oct 2020 21:18:10 -0700
Subject: Re: Access Schema from ArrowReader in Java
To: user@arrow.apache.org

I think this is probably because the schema is transformed for dictionary-encoded fields [1]. Something could probably be done to expose the schema separately, but the library and readers are mostly designed around populating and repopulating VectorSchemaRoots, so I don't think the extra cost was considered. What type of bottlenecks is this causing for you?

If you would like to open a PR or discuss this further, dev@ might be a better place to discuss the use case and design of the feature you are looking for.

Thanks,
Micah

[1] https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java#L183

On Wed, Oct 7, 2020 at 12:34 PM Michael Mior <mmior@apache.org> wrote:
> Why is the Schema object not exposed in ArrowReader? (e.g. readSchema
> is protected). Instead, I need to call
> getVectorSchemaRoot().getSchema() which unnecessarily allocates a
> VectorSchemaRoot that I don't immediately need.
>
> --
> Michael Mior
> mmior@apache.org
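
For illustration only, the idea of exposing the schema separately from the root can be sketched with a toy model in plain Java. Everything below (`LazySchemaSketch`, `Field`, `ToyReader`, `allocationCount`) is hypothetical and is not Arrow's API. The sketch assumes, as a simplification of the transformation referenced at [1], that a dictionary-encoded field is represented in memory by its index type, and it uses a plain array as a stand-in for an allocated VectorSchemaRoot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: these classes are hypothetical and are not part of Arrow. */
public class LazySchemaSketch {

    /** A field as described in the stream; indexType is null unless dictionary-encoded. */
    static final class Field {
        final String name;
        final String type;      // value type, e.g. "Utf8"
        final String indexType; // dictionary index type, e.g. "Int32", or null

        Field(String name, String type, String indexType) {
            this.name = name;
            this.type = type;
            this.indexType = indexType;
        }
    }

    /** Hypothetical reader that keeps schema access separate from buffer allocation. */
    static final class ToyReader {
        private final List<Field> wireSchema;
        private List<Field> memorySchema; // computed on first request
        private long[][] root;            // stand-in for allocated vectors
        private int allocations = 0;

        ToyReader(List<Field> wireSchema) {
            this.wireSchema = wireSchema;
        }

        /** Returns the transformed schema without allocating the stand-in root. */
        List<Field> getSchema() {
            if (memorySchema == null) {
                List<Field> out = new ArrayList<>();
                for (Field f : wireSchema) {
                    // Assumption: an encoded column is held in memory as its index type.
                    out.add(f.indexType == null ? f : new Field(f.name, f.indexType, null));
                }
                memorySchema = out;
            }
            return memorySchema;
        }

        /** Allocates the stand-in root only when a caller actually asks for data. */
        long[][] getRoot() {
            if (root == null) {
                root = new long[getSchema().size()][1024];
                allocations++;
            }
            return root;
        }

        int allocationCount() {
            return allocations;
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList(
                new Field("id", "Int64", null),
                new Field("city", "Utf8", "Int32")));
        for (Field f : reader.getSchema()) {
            System.out.println(f.name + ": " + f.type);
        }
        System.out.println("allocations after getSchema(): " + reader.allocationCount());
        reader.getRoot();
        System.out.println("allocations after getRoot(): " + reader.allocationCount());
    }
}
```

In this sketch, reading the schema never touches the stand-in root, which is the behaviour asked for in the quoted question. Whether the real reader can do the same depends on how the dictionary transformation at [1] is implemented, so that design question is better settled on dev@.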