From: Kasper Sørensen
To: dev@drill.apache.org
Date: Sun, 18 Oct 2015 17:41:12 +0200
Subject: Re: Apache Drill

Hi there,

Sorry for barging in, but maybe this is a place where Drill and MetaModel
could benefit from each other? We've considered that before at least ...

MetaModel already has support for both DOM and SAX based XML querying. They
basically inherit some characteristics from DOM and SAX respectively:

- In the DOM variant we can infer a schema and all the user has to do is
  select an XML file/resource anywhere.
- In the SAX variant the user has to specify which paths in the XML document
  should represent logical "tables" and which paths represent their columns.

See [1] for more info.

Hope this might be of interest to integrate into Drill?

Best regards,
Kasper Sørensen (from the MetaModel project)

[1] http://wiki.apache.org/metamodel/examples/XmlTableMapping

2015-10-18 0:35 GMT+02:00 Magnus Pierre:

> Well, very few lines of code imho. And simple. Been able to parse pretty
> deep structures with no issues so far. Performance? 10-15 5mb xml's in less
> than a second on my laptop but then I run it using Storm with some
> parallelism in place. Don't know if it's good or bad. I'll share the code
> next time I use computer. You don't need to use it, but it works at least.
>
> /M
> On 17 Oct 2015 at 10:43 PM, "Matt Burgess" wrote:
>
> > If the converter is clean and performant then I'm sure the community
> > (including me) is interested :)
> >
> > However I wonder if Drill can afford to add a translation layer between
> > data formats, could we be better served with similar parsing in Drill for
> > XML as we do for JSON, or can it be pushed down far enough (to the parser)
> > to not make a noticeable difference (which is what I think Julian is
> > implying)?
> >
> > Sent from my iPhone
> >
> > > On Oct 17, 2015, at 1:41 PM, Magnus Pierre wrote:
> > >
> > > Hello,
> > >
> > > Just wrote a simple sax implementation that converts xml to json and that
> > > is able to deal with decently complex xml's, that I currently use in Storm.
> > > Takes attributes, and everything.
> > >
> > > I can share it with the community if interesting.
> > >
> > > /Magnus
> > > On 17 Oct 2015 at 7:02 PM, "Julian Hyde" wrote:
> > >
> > >> Seems to me the biggest problem is to make drill understand the nested
> > >> structure of an xml document. That work has been done for json, so let's
> > >> build on it. Suppose there was a translator that converted xml to json
> > >> (adding attributes for things that json lacks, such as namespaces, text,
> > >> element tags). Drill knows how to handle json, even if it is a bit verbose.
> > >> The translator could be applied on the fly.
> > >>
> > >> Julian
> > >>
> > >> Sent from my iPad
> > >>>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter <stefan@activitystream.com>
> > >>> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> It's not possible but there has been some talk here about supporting it.
> > >>> If I remember correctly it's rather complicated and not really feasible.
> > >>> (I'm just a newbie so don't take my words for it)
> > >>>
> > >>> Regards,
> > >>> -Stefan
> > >>>
> > >>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo <Daniel.Ajo@abarcahealth.com>
> > >>> wrote:
> > >>>
> > >>>> Hey there,
> > >>>>
> > >>>> I was wondering if it is possible to query XML files using Apache Drill?
> > >>>>
> > >>>> I see there are several formats, and maybe it would work using an xpath
> > >>>> query of some sorts, but just wondering if it would work to directly query
> > >>>> it using some sort of plug-in.
> > >>>>
> > >>>> Well, let me know,
> > >>>>
> > >>>> Daniel Ajo
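
A rough illustration of the XML-to-JSON translation discussed in this thread
(Julian's translator idea and Magnus's SAX converter). This is not Magnus's
code, which does not appear in the message; it is only a minimal sketch in
Python using the standard xml.sax module. The conventions are assumptions made
for illustration: attributes go under "@"-prefixed keys, element text under
"#text", repeated sibling tags are collected into a list, and namespace
prefixes are left in element names as-is. The names XmlToJsonHandler and
xml_to_json are hypothetical.

```python
import json
import xml.sax


class XmlToJsonHandler(xml.sax.ContentHandler):
    """Builds a nested dict from SAX events (illustrative conventions only)."""

    def __init__(self):
        super().__init__()
        self.root = {}
        # Stack of (element name, dict for that element, collected text chunks).
        self.stack = [("", self.root, [])]

    def startElement(self, name, attrs):
        # Attributes become "@name" keys so they cannot clash with child tags.
        node = {"@" + key: value for key, value in attrs.items()}
        self.stack.append((name, node, []))

    def characters(self, content):
        # SAX may deliver one text run in several chunks; collect them all.
        self.stack[-1][2].append(content)

    def endElement(self, name):
        _, node, chunks = self.stack.pop()
        text = "".join(chunks).strip()
        if text:
            node["#text"] = text
        # An element that holds only text collapses to a plain string.
        value = text if list(node) == ["#text"] else node
        parent = self.stack[-1][1]
        if name not in parent:
            parent[name] = value
        elif isinstance(parent[name], list):
            parent[name].append(value)
        else:
            # Second occurrence of the same tag: promote the value to a list.
            parent[name] = [parent[name], value]


def xml_to_json(xml_text):
    """Parses an XML string with SAX and returns the result as a JSON string."""
    handler = XmlToJsonHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return json.dumps(handler.root)
```

Calling xml_to_json on a small document such as
'<orders><order id="1"><item>apple</item><item>pear</item></order></orders>'
gives a JSON object in which "order" carries an "@id" key and an "item" list,
i.e. the kind of nested structure that Drill's JSON support is said above to
handle. Whether such a translation step is cheap enough to run on the fly is
the open question Matt raises.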