From: Dongjoon Hyun
Date: Wed, 10 Jan 2018 11:14:37 -0800
Subject: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.
To: dev, User, dev@orc.apache.org, user@orc.apache.org

Hi, All.

Vectorized ORC Reader is now supported in Apache Spark 2.3.

    https://issues.apache.org/jira/browse/SPARK-16060

It has been a long journey. From now on, Spark can read ORC files faster without a feature penalty.

Thank you for all your support, especially Wenchen Fan.

It was done in two commits.

    [SPARK-16060][SQL] Support Vectorized ORC Reader
    https://github.com/apache/spark/commit/f44ba910f58083458e1133502e193a9d6f2bf766

    [SPARK-16060][SQL][FOLLOW-UP] add a wrapper solution for vectorized orc reader
    https://github.com/apache/spark/commit/eaac60a1e20e29084b7151ffca964cfaa5ba99d1

Please check OrcReadBenchmark for the final speed-up from `Hive built-in ORC` to `Native ORC Vectorized`.

    https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala

Thank you.

Bests,
Dongjoon.