asterixdb-dev mailing list archives

From Mike Carey <>
Subject Re: Indexing non-ADM data.
Date Sun, 04 Sep 2016 04:52:09 GMT

Great inputs/requirements!  We should definitely think about how to 
address these.  One thing that could help with the second item would be 
"functional indexes" - supporting indexing on an expression rather than 
just base data - some systems (e.g., PostgreSQL) support that - not 
rocket science - and that could make data that's convertible to spatial 
data via a function call indexable spatially.  As for the first point - 
I'm not sure I "get it" - are external indexes not good enough?  Oh - 
wait - is the issue that we should offer per-object transformations 
during load?  (E.g., the ability to put a UDF on the load pipeline, like 
we do on the feed pipeline?)
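
A minimal sketch of the "functional index" idea above, written in plain Python rather than AsterixDB or PostgreSQL syntax. The index stores the result of an expression over each record instead of a base field, so records that carry only numeric longitude/latitude fields can still be looked up spatially. Every name here (`to_point`, `grid_cell`, `ExpressionIndex`, the sample records) is hypothetical, and the coarse grid is only a stand-in for a real spatial index structure.

```python
import math
from collections import defaultdict


def to_point(record):
    """Hypothetical conversion: build a (lon, lat) point from two plain numeric fields."""
    return (float(record["longitude"]), float(record["latitude"]))


def grid_cell(point, cell_size=1.0):
    """Map a point to a coarse grid cell (assumed stand-in for a real spatial key)."""
    lon, lat = point
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


class ExpressionIndex:
    """Toy secondary index keyed on expr(record) rather than on a stored field."""

    def __init__(self, expr):
        self.expr = expr
        self.buckets = defaultdict(list)

    def insert(self, record_id, record):
        # The expression is evaluated once, at insert time, and its result is indexed.
        self.buckets[self.expr(record)].append(record_id)

    def lookup(self, key):
        # Return the ids of all records whose computed key equals `key`.
        return list(self.buckets.get(key, []))


# Sample records with no point-typed field, only raw numbers (made-up data).
tweets = {
    1: {"longitude": -117.84, "latitude": 33.64},
    2: {"longitude": -117.20, "latitude": 33.10},
    3: {"longitude": 46.67, "latitude": 24.71},
}

index = ExpressionIndex(lambda r: grid_cell(to_point(r)))
for rid, rec in tweets.items():
    index.insert(rid, rec)

# Find records in the same grid cell as a query point.
same_cell = index.lookup(grid_cell((-117.5, 33.5)))
```

The base records are never rewritten; only the index sees the converted value. That is what would remove the need for a separate ETL pass whose sole purpose is to materialize a point field.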



On 9/2/16 12:50 PM, Wail Alkowaileet wrote:
> Hi Dev,
> In the last year or so I have been more involved in AsterixDB. However, I'm
> 90% user and 10% developer (due to the nature of my work). I want to share
> some of my (and my colleagues') experience with ADM, though I might be stating
> the obvious.
> One of the challenges we face most often is indexing non-ADM data.
> Most of our data is in either JSON or CSV format, which means the richness of
> ADM is not usable.
> For instance, when loading, I usually create an External (or Temporary)
> Dataset, query/transform it, and then insert the result into my Internal
> Dataset, which takes more time than a load because of the flush/merge
> operations.
> Another challenging case: in the TwitterFeed example
> <>, the
> *longitude* and *latitude* fields are not indexable, and I need to ETL to
> another dataset to transform (lon, lat) into a point type.
> It would be awesome if we could bridge non-ADM types to ADM types.
