Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 3724A200C82 for ; Sat, 13 May 2017 00:16:46 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 35BFD160BC8; Fri, 12 May 2017 22:16:46 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7D2F5160BB8 for ; Sat, 13 May 2017 00:16:45 +0200 (CEST) Received: (qmail 22173 invoked by uid 500); 12 May 2017 22:16:44 -0000 Mailing-List: contact dev-help@orc.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@orc.apache.org Delivered-To: mailing list dev@orc.apache.org Received: (qmail 22162 invoked by uid 99); 12 May 2017 22:16:44 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 May 2017 22:16:44 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 5B581DFD43; Fri, 12 May 2017 22:16:44 +0000 (UTC) From: wgtmac To: dev@orc.apache.org Reply-To: dev@orc.apache.org References: In-Reply-To: Subject: [GitHub] orc issue #120: ORC-184: Refactor ColumnStatistics classes for writer Content-Type: text/plain Message-Id: <20170512221644.5B581DFD43@git1-us-west.apache.org> Date: Fri, 12 May 2017 22:16:44 +0000 (UTC) archived-at: Fri, 12 May 2017 22:16:46 -0000 Github user wgtmac commented on the issue: https://github.com/apache/orc/pull/120 Yes, our current ColumnWriter design follows the approach of ColumnReader. One base class ColumnWriter takes care of all common work, and specific inherited ColumnWriters like IntegerColumnWriter, StructColumnWriter, BooleanColumnWriters, etc. manage their own logic. To integrate ColumnStatistics, we have two options (correct me if I'm wrong): 1. Put a based ColumnStatistics class in the base ColumnWriter like I said previously, and in each TypeColumnWriter, use dynamic type of ColumnStatistics to do its type-specific logic. In this way, we can save many duplicate code by polymorphism in the base ColumnWriter class; (our current code is written in this fashion) 2. There is no ColumnStatistic variable in the base class ColumnWriter. Instead, each inherited TypeColumnWriter class uses one TypeColumnStatisticsImpl (not TypeColumnStatistics), e.g. IntegerColumnWriter uses IntegerColumnStatisticsImpl. This is feasible except for a little bit more duplicate stuff compared with the 1st approach. Your InternalColumnStatisticsImpl design is more inclined to the 2nd approach as it is still not a base class to use in the base class ColumnWriter. We can discuss ColumnWriter later as it is difficult without actual code. To save both of our time and effort and avoid significant change of our current ColumnWriter implementation, I will update this PR accordingly based on the approach 2 after your PR is merged. @majetideepak --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastructure@apache.org or file a JIRA ticket with INFRA. ---