asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: Feeds UDF
Date Wed, 09 Dec 2015 17:48:35 GMT
But if the function actually takes a single record and performs a join,
effectively producing a collection of records that feeds into the same
dataset, wouldn't that create the chance of an infinite loop that
eventually fills up the storage and explodes the dataset?

One thing to note is that in the current implementation, feed connections
are translated into insert statements that go through the query compiler,
meaning that a materialize operator will be introduced.
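To make the feedback-loop risk concrete, here is a toy sketch in plain Python (not AsterixDB code; the record shape, the `topic` join key, and the queue model are all illustrative assumptions) of how a UDF that self-joins against its own target dataset keeps feeding itself:

```python
# Toy model: a feed whose UDF self-joins against its own target dataset.
# Every emitted record is inserted and re-enters the pipeline, so the
# dataset doubles on each step. All names here are illustrative, not
# AsterixDB APIs.

def feed_processor(record, dataset):
    """UDF: join the incoming record against the dataset, emit matches."""
    return [y for y in dataset if y["topic"] == record["topic"]]

dataset = [{"id": 1, "topic": "a"}]   # one pre-existing record
queue = [{"id": 2, "topic": "a"}]     # one incoming feed record

# Cap at 10 steps so the sketch terminates; a real continuous feed
# would keep going until storage fills up.
for _ in range(10):
    rec = queue.pop(0)
    for match in feed_processor(rec, dataset):
        new = {"id": len(dataset) + 1, "topic": match["topic"]}
        dataset.append(new)   # storage keeps growing...
        queue.append(new)     # ...and each insert triggers more joins

print(len(dataset))  # 1024 -- exponential growth from a single input record
```

A single ingested record blows the dataset up by a factor of 2 per step, which is exactly why a materialization barrier (or an outright restriction on such UDFs) matters here.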

Cheers,
Abdullah.

Amoudi, Abdullah.

On Wed, Dec 9, 2015 at 9:40 AM, Mike Carey <dtabass@gmail.com> wrote:

> Hmmm....  I'm not sure where the Halloween problem is in this case - for a
> given record being ingested, it's not in the dataset yet, and won't get to
> move further through the pipeline to the point where it IS in the dataset
> until after the query evaluation is over, the result has been computed, and
> the new object (the one to be inserted) has been determined.  At least
> that's how it should work.  There should thus be no way for the ingestion
> pipeline query to see a record twice in a self-join scenario, because it
> won't be in play in the dataset yet (it's not part of "self") - right?  (Or
> is there a subtlety that I'm missing?)
>
> Cheers,
> Mike
>
>
> On 12/9/15 6:59 AM, abdullah alamoudi wrote:
>
>> The only problem I see is the Halloween problem in the case of a self-join,
>> hence the need for materialization (not sure if it is possible in this case,
>> but definitely possible in general). Other than that, I don't think there
>> is any problem.
>>
>> Cheers,
>> Abdullah
>> On Dec 8, 2015 11:51 PM, "Mike Carey" <dtabass@gmail.com> wrote:
>>
>>> (I am still completely not seeing a problem here.)
>>>
>>> On 12/8/15 10:20 PM, abdullah alamoudi wrote:
>>>
>>> The plan is to mostly use Upsert in the future, since we can do some
>>>> optimizations with it that we can't do with an insert.
>>>> We should support deletes as well, and probably allow a mix of the
>>>> three operations within the same feed. This is a work in progress right
>>>> now, but before I go far, I am stabilizing some other parts of the feeds.
>>>>
>>>> Cheers,
>>>> Abdullah.
>>>>
>>>>
>>>> Amoudi, Abdullah.
>>>>
>>>> On Tue, Dec 8, 2015 at 10:11 PM, Ildar Absalyamov <
>>>> ildar.absalyamov@gmail.com> wrote:
>>>>
>>>>> Abdullah,
>>>>>
>>>>> OK, now I see what problems it will cause.
>>>>> Kinda related question: could the feed implement "upsert" semantics,
>>>>> which you've been working on, instead of "insert" semantics?
>>>>>
>>>>> On Dec 8, 2015, at 21:52, abdullah alamoudi <bamousaa@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think that we probably should restrict feed-applied functions
>>>>>> somehow (needs further thought and discussion), and I know for sure
>>>>>> that we don't.
>>>>>>
>>>>>> As for the case you present, I would imagine that it could be allowed
>>>>>> theoretically, but I think everyone sees why it should be disallowed.
>>>>>>
>>>>>> One thing to keep in mind is that we introduce a materialize if the
>>>>>> dataset was part of an insert pipeline. Now think about how this would
>>>>>> work with a continuous feed. One choice would be that the feed will
>>>>>> materialize all records to be inserted and, once the feed stops, start
>>>>>> inserting them, but I still think we should not allow it.
>>>>>>
>>>>>> My 2c,
>>>>>> Any opposing argument?
>>>>>>
>>>>>>
>>>>>> Amoudi, Abdullah.
>>>>>>
>>>>>> On Tue, Dec 8, 2015 at 6:28 PM, Ildar Absalyamov <
>>>>>> ildar.absalyamov@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> As a part of feed ingestion we do allow preprocessing incoming data
>>>>>>> with AQL UDFs.
>>>>>>> I was wondering if we somehow restrict the kind of UDFs that could be
>>>>>>> used? Do we allow joins in these UDFs? Especially joins with the same
>>>>>>> dataset, which is used for intake. Ex:
>>>>>>>
>>>>>>> create type TweetType as open {
>>>>>>>    id: string,
>>>>>>>    username : string,
>>>>>>>    location : string,
>>>>>>>    text : string,
>>>>>>>    timestamp : string
>>>>>>> }
>>>>>>> create dataset Tweets(TweetType) primary key id;
>>>>>>>
>>>>>>> create function feed_processor($x) {
>>>>>>>    for $y in dataset Tweets
>>>>>>>    // self-join with Tweets dataset on some predicate($x, $y)
>>>>>>>    return $y
>>>>>>> }
>>>>>>>
>>>>>>> create feed TweetFeed
>>>>>>> apply function feed_processor;
>>>>>>>
>>>>>>> The query above fails at runtime, but I was wondering whether it
>>>>>>> could theoretically work at all.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ildar
>>>>>>>
>
