From: Owen O'Malley
Date: Wed, 26 Apr 2017 09:38:19 -0700
References: <89b9833d-0c1a-45be-a3ff-4f755b3b1d3f.gang.w@alibaba-inc.com>
Subject: Re: ORC contribution from Alibaba
To: dev@orc.apache.org

Gang and Xiening,

This is exciting stuff and I'm looking forward to working with you. If you can separate out the bug fixes from the refactoring, that would make things much easier. (In particular, we should figure out which of them we should backport to previous versions.)

Thanks,
   Owen

On Wed, Apr 26, 2017 at 8:08 AM, Deepak Majeti wrote:

> Hi Gang and Xiening,
>
> We at Vertica have been actively contributing to and using the ORC C++ project as well.
> A C++ writer will be a great addition to this project, and we look forward to working with you on merging your contributions.
> Thanks.
>
> On Wed, Apr 26, 2017 at 2:13 AM, Gang Wu wrote:
>
> > Hi,
> > This is Gang from Alibaba, working on Alibaba's big data platform - MaxCompute. We have developed our own columnar storage format within MaxCompute to support MapReduce and other batch processing workloads. But as Apache ORC is getting popular in the industry, we are actively looking at integrating the ORC format into MaxCompute.
> > In the past few months, Xiening (cc'ed) and I have been working on enhancing ORC C++ to provide a full-featured C++ reader and writer. Our work mainly involves adding a C++ writer that supports all data types and stats, and supporting indexes for both reader and writer. As of today, we have finished development and testing and plan to contribute this work back to the Apache ORC project. We have communicated with Owen via email and have created an umbrella JIRA, ORC-179, for the plan. In brief, we plan to do the following:
> > 1. Refactor common classes for writer and reader
> >    -- extract common classes and functions for writer and reader to share
> > 2. OutputStream interface for writer
> >    -- implement several output streams for writing to memory, file, etc.
> >    -- implement ByteRleEncoder, RleEncoder, BooleanRleEncoder, etc.
> >    -- support zlib compression
> > 3. ORC Writer
> >    -- write ORC file header, file footer, postscript, etc.
> >    -- write columns of all types
> >    -- write column statistics
> >    -- write the index stream in the writer, and have the reader seek to a row based on index information
> > 4. Other
> >    -- some minor bug fixes to the current code base.
> >
> > Should you have any questions, please feel free to contact us. Any feedback and suggestions are welcome. Thanks!
> >
> > Gang Wu
> > Senior Engineer
> > Alibaba Group
>
> --
> regards,
> Deepak Majeti,
> Software Engineer at Vertica