From: hornn
To: dev@hawq.incubator.apache.org
Reply-To: dev@hawq.incubator.apache.org
Subject: [GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...
Message-Id: <20160201181634.F3C38DFD7D@git1-us-west.apache.org>
Date: Mon, 1 Feb 2016 18:16:34 +0000 (UTC)

Github user hornn commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/302#issuecomment-178107477

    It sounds very similar to CSV with quoted data, which is not splittable. Today we handle that by ensuring each file is processed by a single accessor, even if the file actually consists of multiple splits (see the HdfsTextMulti profile and [QuotedLineBreakAccessor](https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/QuotedLineBreakAccessor.java)). The problem, of course, is that we lose parallelism and performance.

    @tzolov, if it requires too much rewriting, I agree that we can do it in two stages: first the splittable case (one record per line), and then the more complex cases.

    @adamjshook, good to hear from you.
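
To make the non-splittable case concrete, here is a minimal Java sketch of why quoted data with embedded line breaks cannot simply be cut at newlines. It is not the QuotedLineBreakAccessor code: the class `QuotedRecordSketch` and both method names are hypothetical, and the quote handling is deliberately simplified (double quotes only, no configurable delimiters). The naive reader treats every newline as a record boundary, as a per-split line reader would, while the quote-aware reader has to scan the data in order from the start, which is roughly why a single reader per file is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; names are hypothetical and not part of PXF. */
final class QuotedRecordSketch {

    /** Naive reader: every '\n' ends a record, as a per-split line reader would assume. */
    static List<String> splitOnNewlines(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            records.add(line);
        }
        return records;
    }

    /** Quote-aware reader: a '\n' ends a record only when it is outside double quotes. */
    static List<String> splitQuoteAware(String data) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without trailing newline
        }
        return records;
    }

    public static void main(String[] args) {
        // Two logical records; the first has a line break inside a quoted field.
        String data = "1,\"first line\nsecond line\"\n2,plain\n";
        System.out.println(splitOnNewlines(data).size()); // prints 3
        System.out.println(splitQuoteAware(data).size()); // prints 2
    }
}
```

The quote-aware reader depends on the `inQuotes` state accumulated from the beginning of the input. A reader that starts at an arbitrary split offset cannot know that state, so it cannot tell a record boundary from a newline inside a field. The one-record-per-line case mentioned above avoids this because every newline is a boundary.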