From: Andrey Elenskiy
Date: Fri, 22 Jan 2021 10:32:06 -0800
Subject: Re: [Java] PhysicalWriter to DataOutputStream implementation?
To: user@orc.apache.org

Thanks to both of you. I actually went ahead and implemented the FileSystem
API, following this util:
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/util/StreamWrapperFileSystem.java

I think it would be awesome to have ORC separated from the Hadoop classes
eventually, as I have to pull in those jars as dependencies and, of course,
there are multiple layers of indirection here.

On Fri, Jan 22, 2021 at 10:21 AM Owen O'Malley wrote:

> Ok, a couple of things:
>
> - The PhysicalWriter was intended so that LLAP could implement a
>   write-through cache, where the new file is put into the cache as well
>   as written to long-term storage.
> - The Hadoop FileSystem API, which is what ORC currently uses, is
>   extensible and has a lot of bindings other than HDFS. For your use
>   case, you probably want to use "file:///my-dir/my.orc"
> - Somewhere in the unit tests there is an implementation of Hadoop
>   FileSystem that uses ByteBuffers in memory.
> - Finally, over the years there has been an ask for using ORC core
>   without having Hadoop on the class path. Let me take a pass at that
>   today to see if I can make that work. See
>   https://issues.apache.org/jira/browse/ORC-508 .
>
> .. Owen
>
> On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy
> <andrey.elenskiy@arista.com> wrote:
>
>> Hello, currently there's only a single implementation of PhysicalWriter
>> that I was able to find: PhysicalFSWriter, which only gives the option
>> to write to HDFS.
>>
>> I'd like to reuse the ORC file format for my own purposes without the
>> destination being HDFS, writing instead to some byte buffer so that I
>> can decide myself where the bytes end up being saved.
>>
>> I've started implementing PhysicalWriter, but a lot of it just ends up
>> being copied over from PhysicalFSWriter, which seems redundant. So I'm
>> wondering if something already exists to achieve my goal of just
>> writing the resulting columns to a DataOutputStream (maybe there's some
>> unofficial Java library, or I'm missing some obvious official API).
>>
>> Thanks,
>> Andrey
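For readers following the thread, the write-through idea Owen describes (bytes going into a cache as well as to long-term storage) can be pictured with plain java.io, ending in the DataOutputStream that Andrey asked about. This is only a sketch of the pattern, not ORC's PhysicalWriter interface and not the Hadoop FileSystem API; the class names TeeOutputStream and WriteThroughDemo are hypothetical, and the two in-memory buffers merely stand in for a real cache and a real storage sink.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: forwards every byte to two sinks. */
final class TeeOutputStream extends OutputStream {
    private final OutputStream primary;
    private final OutputStream secondary;

    TeeOutputStream(OutputStream primary, OutputStream secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public void write(int b) throws IOException {
        primary.write(b);
        secondary.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        primary.write(buf, off, len);
        secondary.write(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        primary.flush();
        secondary.flush();
    }

    @Override
    public void close() throws IOException {
        try {
            primary.close();
        } finally {
            secondary.close(); // close the second sink even if the first one fails
        }
    }
}

/** Hypothetical demo class; not part of ORC or Hadoop. */
final class WriteThroughDemo {
    /** Writes a length-prefixed payload through the tee and returns both copies. */
    static byte[][] writeThrough(byte[] payload) {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();   // stands in for the cache
        ByteArrayOutputStream storage = new ByteArrayOutputStream(); // stands in for long-term storage
        try (DataOutputStream out = new DataOutputStream(new TeeOutputStream(cache, storage))) {
            out.writeInt(payload.length); // 4-byte big-endian length prefix
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new byte[][] {cache.toByteArray(), storage.toByteArray()};
    }

    public static void main(String[] args) {
        byte[][] copies = writeThrough("ORC".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Arrays.equals(copies[0], copies[1]));
    }
}
```

Both sinks receive identical bytes because the caller only ever writes to the single DataOutputStream. With ORC itself, the route discussed in the thread is still to supply a Hadoop FileSystem implementation rather than a raw stream.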