From: Wail Alkowaileet
To: dev@asterixdb.incubator.apache.org
Subject: Re: Asterix Schema Provider Framework
Date: Wed, 20 Jan 2016 15:33:00 -0500
References: <5B7DC174-B088-4EFB-8B5F-2138CE889E3B@apache.org> <1A4DF5F8-4E4E-4D09-8C56-0BFE423F9B47@apache.org>
Hi Chen,

On Asterix, I think the type UNION is an Algebricks type only. The user
cannot define a UNION type in their schema (except for optional fields, which
are the equivalent of UNION(null,OTHER_TYPE)).

To represent "name" on Asterix, the user should leave it open. At the
storage level, "name" will be a "tagged" field, and the Algebricks type
computer will infer it as type ANY.

What I'm suggesting is that if the types get inferred on ingestion, "name"
can have the type UNION(record,list[record]) instead of ANY.

The question is probably: how is that useful? Well, we can help Asterix
fail at compile time instead of failing at run time.

For example, let's assume the following:

    "x": [1, 2, 3, 4]
    "x": ["hello", "world"]

The inferred type of "x" will be UNION(list[int32],list[string]), which
implies that we can apply the function count() without a problem. However,
for "name" in the previous example, count() will throw an exception.

Also, I believe knowing the schema will reduce the "code size" needed to
handle corner cases of the open type. For example, a bug I forgot to file:

    use dataverse wosDataverse
    let $c := (for $x in dataset wos
               let $id := $x.id
               group by $country := $x.country with $id
               return {"country":$country, "id" : $id})
    return count($c.id)

Throws:

    Unsupported type UNION(NULL, [ null: open { id: [ ANY ] } ]) for field
    access expression: function-call: asterix:field-access-by-name,
    Args:[%0->$4, AString: {id}] [AlgebricksException]

I know it's a stupid query :-)) .. but I had a case which forced me to do
it that way.

As for Spark, String is their "open" type, with the limitation that you
cannot apply an operation such as count to it because it's just a string :-)).

Thank you, Chen, for your feedback and engagement.

P.S. @Till: I'm removing the modifications on Algebricks so that the schema
framework lives on Asterix.
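
To make the UNION idea above concrete, here is a minimal sketch in Python (not Asterix or Algebricks code). It infers a type for a field from the values seen at ingestion and checks whether count() would be safe on every alternative. All function names and the sample "name" values are hypothetical and only for illustration; the type names mirror the ones used in this thread.

```python
def infer_type(value):
    """Map one JSON-like value to a type name (naming is an assumption)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int32"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "record"
    if isinstance(value, list):
        # The item type of a list is itself inferred from all of its items.
        return "list[" + render(infer_field(value)) + "]"
    raise TypeError("unsupported value: %r" % (value,))


def infer_field(values):
    """Collect the distinct types observed for one field, sorted by name."""
    return tuple(sorted({infer_type(v) for v in values}))


def render(alternatives):
    """Print a single type as-is and several alternatives as a UNION."""
    if not alternatives:
        return "ANY"
    if len(alternatives) == 1:
        return alternatives[0]
    return "UNION(" + ",".join(alternatives) + ")"


def count_is_safe(alternatives):
    """count() can be accepted at compile time only if every alternative is a list."""
    return bool(alternatives) and all(a.startswith("list[") for a in alternatives)


# "x" as in the example above; the "name" values are made up for illustration.
x_type = infer_field([[1, 2, 3, 4], ["hello", "world"]])
name_type = infer_field([{"first": "Ann"}, [{"first": "Bob"}]])

print(render(x_type), count_is_safe(x_type))
print(render(name_type), count_is_safe(name_type))
```

The sketch sorts the alternatives by name, so the order inside UNION(...) may differ from how it is written in this thread. The point is only that a union of list types lets count() be accepted up front, while a union that contains a non-list alternative can be rejected before the query runs.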

On Mon, Jan 18, 2016 at 1:47 PM, Chen Li wrote:

> Wail, Thanks for the detailed examples and documentation. They are
> very helpful. Just curious: for the provided example, we infer this
> "name" type as UNION of both record and a list of records. Is it just
> a heuristic? Is there any "principle" behind this approach, compared
> with Spark's approach of inferring it as a String?
>
> Chen
>

--
*Regards,*
Wail Alkowaileet