From: "Owen O'Malley"
Date: Mon, 25 Sep 2017 10:36:50 -0700
Subject: Re: Orc writer - continous memory flush out
To: user@orc.apache.org
Reply-To: user@orc.apache.org

ORC has to buffer the entire stripe in memory so that it can write the data
in column order rather than row order. If you have large blobs that you
can't buffer, I'd suggest writing them to a side file and storing the
offsets and lengths in the ORC file. That way you can write the large blobs
without spending all of your memory caching them (on either read or write).

.. Owen

On Mon, Aug 21, 2017 at 6:44 AM, Ozsvath, Tamas (GE Corporate, consultant)
<tamas.ozsvath@ge.com> wrote:

> Dear Apache users,
>
> We would like to create ORC files with org.apache.orc.Writer. Our tests
> were okay until we tried creating an ORC file from a database table that
> contained blobs. We have tried changing the following settings, but none
> of them helped:
>
> org.apache.orc.OrcFile.WriterOptions:
>
> bufferSize()
> stripeSize()
> blockSize()
> enforceBufferSize()
>
> Is there a way to continuously populate the ORC file (flushing data out of
> memory continuously), instead of flushing data out of memory upon closing
> the file writer? What is the best practice for creating an ORC file from a
> data source that contains blobs and can't be handled purely in memory?
>
> Any information is appreciated!
>
> Thanks,
> Tamas
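A minimal sketch of the side-file idea from the reply above, using only the Java standard library. The names BlobSideFile and BlobRef are hypothetical and not part of ORC. The assumption is that each blob is streamed into a separate file in small chunks, and that the returned offset and length are then stored in the ORC row (for example as two long columns) in place of the blob itself. The ORC writing step is left out here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: appends blobs to a side file and reports where each one landed. */
public class BlobSideFile implements AutoCloseable {

    /** Location of one blob inside the side file; these two values would go into the ORC row. */
    public static final class BlobRef {
        public final long offset;
        public final long length;

        BlobRef(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final OutputStream out;
    private long position = 0; // bytes written so far; valid because the file starts empty

    /** Assumption: the side file is created fresh (truncated) for each run. */
    public BlobSideFile(Path path) throws IOException {
        this.out = Files.newOutputStream(path,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** Streams one blob to the side file in 8 KiB chunks, so the whole blob is never held in memory. */
    public BlobRef append(InputStream blob) throws IOException {
        long start = position;
        byte[] chunk = new byte[8192];
        int n;
        while ((n = blob.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            position += n;
        }
        return new BlobRef(start, position - start);
    }

    /** Reads one blob back by seeking to its recorded offset. */
    public static byte[] read(Path path, BlobRef ref) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] data = new byte[Math.toIntExact(ref.length)];
            raf.seek(ref.offset);
            raf.readFully(data);
            return data;
        }
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

On the write side, each database row would call append() with the blob's stream and put the resulting offset and length into the row batch. On the read side, a reader would load those two values from the ORC file and call read() only for the blobs it actually needs.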