Subject: Re: Indexing non-ADM data.
From: Mike Carey
To: dev@asterixdb.apache.org
Date: Sun, 4 Sep 2016 10:22:09 +0530

Wail,

Great inputs/requirements! We should definitely think about how to address these.

One thing that could help with the second item would be "functional indexes", i.e., support for indexing on an expression rather than just on base data. Some systems (e.g., PostgreSQL) support that, and it's not rocket science. It would make data that can be converted to spatial data via a function call spatially indexable.

As for the first point, I'm not sure I "get it": are external indexes not good enough? Oh, wait: is the issue that we should offer per-object transformations during load? (E.g., the ability to put a UDF on the load pipeline, like we do on the feed pipeline?)

Thx!

Mike

On 9/2/16 12:50 PM, Wail Alkowaileet wrote:
> Hi Dev,
>
> In the last year or so I have been more involved in AsterixDB. However, I'm
> 90% user and 10% developer (due to the nature of my work). I want to share
> some of my (and my colleagues') experience with ADM, though some of it may
> be obvious.
>
> One of the challenges we face most often is indexing non-ADM data. Most of
> the data is in either JSON or CSV format, which means none of ADM's
> richness is usable.
>
> For instance, for loading, I usually create an External (or Temporary)
> Dataset, query/transform it, and then insert the result into my Internal
> Dataset. This takes more time than a load because of the flush/merge
> operations.
>
> Another challenging case is the TwitterFeed example: the *longitude* and
> *latitude* fields are not indexable, and I need to ETL the data into
> another dataset to transform (lon, lat) into a point type.
>
> It would be awesome if we could bridge non-ADM types to ADM types.
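To make the "functional index" idea concrete, here is a minimal Python sketch (not AsterixDB code or its API). It assumes raw JSON records with the `longitude`/`latitude` fields from the TwitterFeed case. The index stores entries keyed on `make_point(record)`, an expression evaluated at insert time, instead of on a stored point field. The names `make_point` and `FunctionalGridIndex` are hypothetical, and a uniform grid stands in for an R-tree to keep the sketch short.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def make_point(rec: dict) -> Point:
    # Hypothetical "function" of the functional index: derive a point
    # from raw (JSON/CSV-style) fields without rewriting the record.
    return (float(rec["longitude"]), float(rec["latitude"]))


class FunctionalGridIndex:
    """Spatial index over expr(record) rather than over a stored field.

    A uniform grid is used here as a simple stand-in for an R-tree.
    """

    def __init__(self, expr: Callable[[dict], Point], cell: float = 1.0):
        self.expr = expr
        self.cell = cell
        self.cells: Dict[Cell, List[Tuple[Point, int]]] = defaultdict(list)

    def _cell_of(self, p: Point) -> Cell:
        return (math.floor(p[0] / self.cell), math.floor(p[1] / self.cell))

    def insert(self, rid: int, rec: dict) -> None:
        # Evaluate the indexed expression at insert time; only the derived
        # key and the record id go into the index.
        p = self.expr(rec)
        self.cells[self._cell_of(p)].append((p, rid))

    def range_query(self, lo: Point, hi: Point) -> List[int]:
        # Visit only the grid cells overlapping the query box,
        # then filter candidates exactly against the box.
        (cx0, cy0), (cx1, cy1) = self._cell_of(lo), self._cell_of(hi)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (x, y), rid in self.cells.get((cx, cy), []):
                    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
                        hits.append(rid)
        return sorted(hits)


if __name__ == "__main__":
    tweets = [
        {"id": 1, "longitude": -117.84, "latitude": 33.64},
        {"id": 2, "longitude": 77.59, "latitude": 12.97},
        {"id": 3, "longitude": -118.24, "latitude": 34.05},
    ]
    idx = FunctionalGridIndex(make_point, cell=1.0)
    for t in tweets:
        idx.insert(t["id"], t)
    print(idx.range_query((-119.0, 33.0), (-117.0, 35.0)))  # [1, 3]
```

The same `make_point` function could instead be applied once per object on the load pipeline (the UDF-on-load option), which would store a real point so that an ordinary spatial index applies. The trade-off is that a functional index leaves the raw data untouched and recomputes the expression only on writes.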