From: Dain Sundstrom
Subject: Type length, scale, and precision?
Date: Tue, 19 Mar 2019 13:30:54 -0700
To: dev@orc.apache.org

For the types in the ORC footer, we have the following:

  // the maximum length of the type for varchar or char in UTF-8 characters
  optional uint32 maximumLength = 4;
  // the precision and scale for decimal
  optional uint32 precision = 5;
  optional uint32 scale = 6;

If maximumLength is set to N, can I be confident that no value for that column in the file will contain more than N UTF-8 characters? Is this still true for concatenated ORC files?

I have a similar question about DECIMAL. Decimal encoding currently uses the SECONDARY stream to encode the "scale". Is this scale guaranteed to be the same as the type scale in the footer?

Thanks,

-dain

----
Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)
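
To make the two questions concrete, the property being asked about could be written as the check below. This is only a sketch under stated assumptions: the function names are hypothetical and not part of any ORC library, the values are assumed to be already decoded into Python strings and Decimal objects, and "UTF-8 characters" is assumed to mean Unicode code points rather than bytes.

```python
from decimal import Decimal


def exceeds_maximum_length(values, maximum_length):
    """Return the strings that have more code points than maximum_length.

    Assumption: the footer's maximumLength counts Unicode code points,
    which is what len() reports for a Python str.
    """
    return [v for v in values if v is not None and len(v) > maximum_length]


def mismatched_scales(values, footer_scale):
    """Return the decimals whose own scale differs from footer_scale.

    The scale of a finite Decimal is the negated exponent of its tuple
    form, e.g. Decimal("1.50") has exponent -2 and therefore scale 2.
    """
    bad = []
    for v in values:
        if v is None:
            continue
        exponent = v.as_tuple().exponent
        # Non-finite values (NaN, Infinity) carry a string exponent;
        # treat them as mismatches rather than guessing a scale.
        if not isinstance(exponent, int) or -exponent != footer_scale:
            bad.append(v)
    return bad


if __name__ == "__main__":
    # "h\u00e9llo" is 5 code points but 6 UTF-8 bytes.
    print(exceeds_maximum_length(["abc", "h\u00e9llo", None, "toolong"], 5))
    print(mismatched_scales([Decimal("1.50"), Decimal("2.5"), None], 2))
```

If the answer to both questions is yes, both functions would return an empty list for every column in every file, including concatenated ones.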