From: Julian Hyde
To: dev@drill.apache.org
Subject: Re: Apache Drill
Date: Sat, 17 Oct 2015 13:51:15 -0700
In-Reply-To: <53EC4E26-4E2E-4FEC-A6E4-18418D170A52@gmail.com>

Yes, frankly, performance is a concern. But there are also many concerns about how to fit a deep XML document into Drill's very JSON-centric model. Building a good XML adapter is a very big task.
My hunch is that we should not let the perfect be the enemy of the good. Build a version 1 XML adapter based on an XML-to-JSON converter and it will give us plenty of ideas for what the "perfect" adapter in version 2 should look like.

On Sat, Oct 17, 2015 at 1:43 PM, Matt Burgess wrote:
> If the converter is clean and performant then I'm sure the community (including me) is interested :)
>
> However, I wonder if Drill can afford to add a translation layer between data formats. Could we be better served with similar parsing in Drill for XML as we do for JSON, or can it be pushed down far enough (to the parser) to not make a noticeable difference (which is what I think Julian is implying)?
>
> Sent from my iPhone
>
>> On Oct 17, 2015, at 1:41 PM, Magnus Pierre wrote:
>>
>> Hello,
>>
>> Just wrote a simple SAX implementation that converts XML to JSON and that
>> is able to deal with decently complex XML documents, which I currently use in Storm.
>> Takes attributes, and everything.
>>
>> I can share it with the community if interesting.
>>
>> /Magnus
>>
>> On 17 Oct 2015 at 7:02 PM, "Julian Hyde" wrote:
>>
>>> Seems to me the biggest problem is to make Drill understand the nested
>>> structure of an XML document. That work has been done for JSON, so let's
>>> build on it. Suppose there was a translator that converted XML to JSON
>>> (adding attributes for things that JSON lacks, such as namespaces, text,
>>> element tags). Drill knows how to handle JSON, even if it is a bit verbose.
>>> The translator could be applied on the fly.
>>>
>>> Julian
>>>
>>> Sent from my iPad
>>>
>>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter wrote:
>>>>
>>>> Hi,
>>>>
>>>> It's not possible but there has been some talk here about supporting it.
>>>> If I remember correctly it's rather complicated and not really feasible.
>>>> (I'm just a newbie so don't take my word for it)
>>>>
>>>> Regards,
>>>> -Stefan
>>>>
>>>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo wrote:
>>>>
>>>>> Hey there,
>>>>>
>>>>> I was wondering if it is possible to query XML files using Apache Drill?
>>>>>
>>>>> I see there are several formats, and maybe it would work using an XPath
>>>>> query of some sort, but just wondering if it would work to directly query
>>>>> it using some sort of plug-in.
>>>>>
>>>>> Well, let me know,
>>>>>
>>>>> Daniel Ajo
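
For reference, the on-the-fly XML-to-JSON translator idea discussed in this thread could look roughly like the sketch below. It is only an illustration in Python using the standard xml.sax module. It is not Magnus's implementation and not anything that exists in Drill. The key names "#tag", "@name", "#text" and "#children" are hypothetical conventions chosen here to carry the things JSON lacks (element tags, attributes, text). Namespace prefixes are simply left as part of the tag and attribute names.

```python
import json
import xml.sax


class XmlToJsonHandler(xml.sax.ContentHandler):
    """Builds a nested dict from SAX events.

    The "#tag", "@attr", "#text" and "#children" keys are assumptions
    made for this sketch, not a Drill or community convention.
    """

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # chain of currently open elements

    def startElement(self, name, attrs):
        node = {"#tag": name}
        for key in attrs.getNames():
            node["@" + key] = attrs.getValue(key)
        if self._stack:
            # Attach to the parent; a list preserves document order.
            self._stack[-1].setdefault("#children", []).append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it.
        if self._stack:
            node = self._stack[-1]
            node["#text"] = node.get("#text", "") + content

    def endElement(self, name):
        node = self._stack.pop()
        # Drop whitespace-only text such as indentation between elements.
        text = node.pop("#text", "").strip()
        if text:
            node["#text"] = text


def xml_to_json(xml_text):
    """Translate an XML string into a JSON string."""
    handler = XmlToJsonHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return json.dumps(handler.root)
```

Calling xml_to_json('<book id="1"><title>Drill</title></book>') yields a JSON object whose "#tag" is "book", with "@id" set to "1" and a one-element "#children" list holding the title element and its "#text". The output is verbose, as noted above, but it is plain nested JSON. Mixed content (text interleaved with child elements) is flattened into a single "#text" value here, which a real adapter would need to handle more carefully.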