asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Feeds UDF
Date Wed, 09 Dec 2015 07:45:22 GMT
I'm confused:  Why is "self-join" (or any join) an issue?  I think the 
alleged case (:-)) against self-join is equivalent to a case against 
ever doing any queries against any dataset under any circumstances 
where data is being inserted....  I don't think we want to restrict the 
system to only querying read-only datasets...

A feed lets you run a query against the system based on the contents of 
a current incoming record R.  Unless I am missing something (which is 
not unlikely because it's been a long day and I just got home from 
traveling :-)), this is equivalent to:

     let $r := ... (picture a constant constructor that yields the same content as R) ...
     return feed_processor($r)

Right?  I.e., the new record R is not yet in the dataset - so - what's 
the issue?  What's special about this?
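
To make that equivalence concrete, here is a minimal sketch in Python rather than AQL. It is a toy model, not AsterixDB code: the in-memory dict standing in for the Tweets dataset, the `ingest` loop, the `prior_count` field, and the join predicate (same username) are all assumptions made for illustration. The point it shows is only that the function sees the dataset as it was before R is stored, so applying it in the feed gives the same answer as calling it on a constant copy of R.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def feed_processor(r: Record, tweets: List[Record]) -> Record:
    """Hypothetical UDF: self-join the incoming record against stored tweets.

    The predicate (same username) is an assumption; the thread leaves it open.
    """
    prior = sum(1 for y in tweets if y["username"] == r["username"])
    return {**r, "prior_count": prior}


def ingest(incoming: List[Record],
           dataset: Dict[str, Record],
           udf: Callable[[Record, List[Record]], Record]) -> None:
    """Toy feed loop: apply the UDF to each record, then store the result."""
    for r in incoming:
        # Snapshot taken before r is stored, so r is not yet visible to the UDF.
        snapshot = list(dataset.values())
        out = udf(r, snapshot)
        dataset[str(out["id"])] = out


tweets: Dict[str, Record] = {}
ingest(
    [
        {"id": "1", "username": "a"},
        {"id": "2", "username": "a"},
        {"id": "3", "username": "b"},
    ],
    tweets,
    feed_processor,
)
```

In this model the second record from user "a" sees exactly one earlier tweet, which is what an ordinary query with a constant `$r` would return if it ran at the same moment. It says nothing about how the real insert pipeline materializes or locks the dataset.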

Cheers,
Mike

PS - Again, apologies if a long day has led to extra cluelessness on my 
part...


On 12/8/15 9:52 PM, abdullah alamoudi wrote:
> I think that we probably should restrict feed applied functions somehow
> (needs further thought and discussion) and I know for sure that we don't.
> As for the case you present, I would imagine that it could be allowed
> theoretically but I think everyone sees why it should be disallowed.
>
> One thing to keep in mind is that we introduce a materialize if the dataset
> was part of an insert pipeline. Now think about how this would work with a
> continuous feed. One choice would be that the feed will materialize all
> records to be inserted and once the feed stops, it would start inserting
> them but I still think we should not allow it.
>
> My 2c,
> Any opposing argument?
>
>
> Amoudi, Abdullah.
>
> On Tue, Dec 8, 2015 at 6:28 PM, Ildar Absalyamov <ildar.absalyamov@gmail.com> wrote:
>> Hi All,
>>
>> As a part of feed ingestion we do allow preprocessing incoming data with
>> AQL UDFs.
>> I was wondering if we somehow restrict the kind of UDFs that could be
>> used? Do we allow joins in these UDFs? Especially joins with the same
>> dataset, which is used for intake. Ex:
>>
>> create type TweetType as open {
>>    id: string,
>>    username : string,
>>    location : string,
>>    text : string,
>>    timestamp : string
>> }
>> create dataset Tweets(TweetType)
>> primary key id;
>> create function feed_processor($x) {
>> for $y in dataset Tweets
>> // self-join with Tweets dataset on some predicate($x, $y)
>> return $y
>> }
>> create feed TweetFeed
>> apply function feed_processor;
>>
>> The query above fails at runtime, but I was wondering if that
>> theoretically could work at all.
>>
>> Best regards,
>> Ildar
>>
>>

