From: "Owen O'Malley"
Date: Mon, 29 Jan 2018 13:55:54 -0800
Subject: Re: Questions regarding hive --orcfiledump or exporting orcfiles
To: user@orc.apache.org

My guess is that you should be able to save a fair amount of time by doing a byte copy rather than rewriting the ORC file. To get a distributed copy, you'd probably want to use distcp and then create the necessary tables and partitions for your Hive metastore.

.. Owen

On Mon, Jan 29, 2018 at 1:16 PM, Colin Williams <colin.williams.seattle@gmail.com> wrote:

> Hello,
>
> I wasn't sure if I should ask here or on the Hive mailing list. We're
> creating external tables from an S3 bucket that contains some textfile
> records. Then we import these tables with STORED AS ORC.
>
> We have about 20 tables, and it takes a couple of hours to create them.
> However, currently we are just using a static data set.
>
> I'm wondering: can I reduce the load time by exporting the tables using
> hive --orcfiledump, or by just copying the files from HDFS into an S3
> bucket and then loading them into HDFS again? Will this likely save me
> some load time?
>
> Best,
>
> Colin Williams
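The byte-copy approach Owen describes might look roughly like the following sketch. The bucket name, HDFS paths, table name, and columns are all made up for illustration; it assumes an `s3a://` filesystem is configured and requires a live Hadoop/Hive cluster to actually run.

```shell
# Hypothetical sketch of the byte-copy approach: copy the already-written
# ORC files as-is instead of re-running the conversion. All paths, bucket
# names, and table/column names below are placeholders.

# 1. Archive the ORC files from HDFS to S3 without rewriting them:
hadoop distcp hdfs:///warehouse/mydb.db/events \
    s3a://my-archive-bucket/orc/events

# 2. On a fresh cluster, copy them back into place:
hadoop distcp s3a://my-archive-bucket/orc/events \
    hdfs:///warehouse/mydb.db/events

# 3. Recreate the metastore entries over the copied files, and let
#    MSCK REPAIR TABLE discover any existing partition directories:
hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS mydb.events (id BIGINT, payload STRING)
  PARTITIONED BY (dt STRING)
  STORED AS ORC
  LOCATION 'hdfs:///warehouse/mydb.db/events';
  MSCK REPAIR TABLE mydb.events;
"
```

Since the table is EXTERNAL, step 2 is optional: the table's LOCATION could instead point directly at the S3 path, trading HDFS locality for skipping the copy-back entirely.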