asterixdb-users mailing list archives

From Chen Li <che...@gmail.com>
Subject Re: Socket feed questions
Date Thu, 29 Oct 2015 22:36:15 GMT
I think Raman knows where to look for the test case(s) for AQL UDFs?  (The
answer to question 2 is presumably Yes.)

Chen

On Thu, Oct 29, 2015 at 12:22 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
wrote:

> Hi Devs,
>
> I have two related questions:
> 1. Is there any example code for using a UDF in a feed adapter?
> 2. Can we use AQL functions in that kind of feed UDF?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mjcarey@ics.uci.edu>
> wrote:
>
>> Thanks!
>>
>> On 10/27/15 9:48 AM, Raman Grover wrote:
>>
>>> Hi,
>>>
>>>
>>> In the case when data is being received from an external source (e.g.
>>> during feed ingestion), a slow rate of arrival of data may result in
>>> excessive delays until the data is deposited into the target dataset and
>>> made accessible to queries. Data moves along a data ingestion pipeline
>>> between operators as packed fixed size frames. The default behavior is to
>>> wait for the frame to be full before dispatching the contained data to the
>>> downstream operator. However, as noted, this may not suit all scenarios,
>>> particularly when the data source is sending data at a low rate. To cater to
>>> different scenarios, AsterixDB allows configuring the behavior. The
>>> different options are described next.
>>>
>>> *Push data downstream when*
>>> (a) Frame is full (default)
>>> (b) At least N records (data items) have been collected into a partially
>>> filled frame
>>> (c) At least T seconds have elapsed since the last record was put into
>>> the frame
>>>
>>> *How to configure the behavior?*
>>> At the time of defining a feed, an end-user may specify configuration
>>> parameters that determine the runtime behavior (options (a), (b) or (c)
>>> from above).
>>>
>>> The parameters are described below:
>>>
>>> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
>>> values -
>>>  (i) /"frame_full"/
>>>  This is the default value. As the name suggests, this choice causes
>>> frames to be pushed by the feed adaptor only when there isn't sufficient
>>> space for an additional record to fit in. This corresponds to option (a).
>>>
>>>  (ii) /"counter_timer_expired"/
>>>  Use this as the value if you wish to set either option (b) or (c) or a
>>> combination of both.
>>>
>>> *Some Examples*
>>>
>>> 1) Pack a maximum of 100 records into a data frame and push it
>>> downstream.
>>>
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
>>> other parameters);
>>>
>>> 2) Wait until 2 seconds have elapsed, then send however many records
>>> have been collected in the frame downstream.
>>>
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), ...
>>> other parameters);
>>>
>>> 3) Wait until 100 records have been collected into a data frame or 2
>>> seconds have elapsed since the last record was put into the current data
>>> frame.
>>>
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
>>> ("batch-size"="100"), ... other parameters);
>>>
>>>
>>> *Note*
>>> The above config parameters are not specific to using a particular
>>> implementation of an adaptor but are available for use with any feed
>>> adaptor. Some adaptors that ship with AsterixDB use different default
>>> values for the above to suit their specific scenario. E.g., the pull-based
>>> Twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
>>> sets the parameter "batch-interval".
>>>
>>>
>>> Regards,
>>> Raman
>>> PS: The names of the parameters described above are not as intuitive as
>>> one would like them to be. The names need to be changed.
>>>
>>> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>     I think we need to have tuning parameters - like batch size and
>>>     maximum tolerable latency (in case there's a lull and you still
>>>     want to push stuff with some worst-case delay). @Raman Grover -
>>>     remind me (us) what's available in this regard?
>>>
>>>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>>
>>>>
>>>>     Hi,
>>>>
>>>>     Yes, you are right. I tried sending a larger amount of data, and
>>>>     data is now stored to the database.
>>>>
>>>>     Does it make sense to configure a smaller batch size in order to
>>>>     get more frequent writes?
>>>>
>>>>     Or would it significantly impact performance?
>>>>
>>>>     -Pekka
>>>>
>>>>     Data moves through the pipeline in frame-sized batches, so one
>>>>     (uninformed :-)) guess is that you aren't running very long, and
>>>>     you're only seeing the data flow when you close because only then
>>>>     do you have a batch's worth.  Is that possible?  You can test this
>>>>     by running longer (more data) and seeing if you start to see the
>>>>     expected incremental flow/inserts. (And we need tunability in this
>>>>     area, e.g., parameters on how much batching and/or how much latency
>>>>     to tolerate on each feed.)
>>>>
>>>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>>>
>>>>     > Hi,
>>>>     >
>>>>     > Thanks, now I am able to create a socket feed, and save items to the
>>>>     > dataset from the feed.
>>>>     >
>>>>     > It seems that data items are written to the dataset after I close the
>>>>     > socket at the client.
>>>>     >
>>>>     > Is there some way to indicate to AsterixDB feed (with a newline or
>>>>     > other indicator) that data can be written to the database, when the
>>>>     > connection is open?
>>>>     >
>>>>     > After I close the socket at the client, the feed seems to close down.
>>>>     > Or is it only paused, until it is resumed?
>>>>     >
>>>>     > -Pekka
>>>>     >
>>>>
>>>>     > Hi Pekka,
>>>>     >
>>>>     > That's interesting, I'm not sure why the CC would appear as being down
>>>>     > to Managix. However, if you can access the web console, then that
>>>>     > evidently isn't the case.
>>>>     >
>>>>     > As for data ingestion via sockets, yes it is possible, but it kind of
>>>>     > depends on what's meant by sockets. There's no tutorial for it, but
>>>>     > take a look at SocketBasedFeedAdapter in the source, as well as
>>>>     > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>>>     > for some examples of how it works.
>>>>     >
>>>>     > Hope that helps!
>>>>     >
>>>>     > Thanks,
>>>>     > -Ian
>>>>     >
>>>>
>>>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>>>     > <Pekka.Paakkonen@vtt.fi> wrote:
>>>>     > > Hi Ian,
>>>>     > >
>>>>     > > Thanks for the reply.
>>>>     > >
>>>>     > > I compiled AsterixDB v0.87 and started it.
>>>>     > >
>>>>     > > However, I get the following warnings:
>>>>     > >
>>>>     > > INFO: Name:my_asterix
>>>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
>>>>     > > Web-Url:http://192.168.101.144:19001
>>>>     > > State:UNUSABLE
>>>>     > >
>>>>     > > WARNING!:Cluster Controller not running at master
>>>>     > >
>>>>     > > Also, I see the following warnings in my_asterixdb1.log. There are no
>>>>     > > warnings or errors in cc.log.
>>>>     > >
>>>>     > > “
>>>>     > > Oct 19, 2015 8:37:39 AM org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>>>     > > SEVERE: LifecycleComponentManager configured org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>>>     > > ..
>>>>     > > INFO: Completed sharp checkpoint.
>>>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
>>>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
>>>>     > > Oct 19, 2015 8:38:38 AM org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>>>     > > INFO: Result state cleanup instance successfully completed.”
>>>>     > >
>>>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
>>>>     > >
>>>>     > > The documentation shows ingestion of tweets, but I would be interested in
>>>>     > > using sockets.
>>>>     > >
>>>>     > > Is it possible to ingest data from sockets?
>>>>     > >
>>>>     > > Regards,
>>>>     > > -Pekka
>>>>     > >
>>>>
>>>>     > > Hey there Pekka,
>>>>     > >
>>>>     > > Your intuition is correct, most of the newer feed features are in the
>>>>     > > current master branch and not in the (very) old 0.8.6 release. If you'd
>>>>     > > like to experiment with them you'll have to build from source. The details
>>>>     > > about that are here:
>>>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>>>     > > but they're probably a bit overkill for just trying to get the compiled
>>>>     > > binaries. For that all you really need to do is:
>>>>     > >
>>>>     > > - Clone Hyracks from git
>>>>     > > - 'mvn clean install -DskipTests'
>>>>     > > - Clone AsterixDB
>>>>     > > - 'mvn clean package -DskipTests'
>>>>     > >
>>>>     > > Then, the binaries will sit in asterix-installer/target
>>>>     > >
>>>>     > > For an example, the documentation shows how to set up a feed that's
>>>>     > > ingesting Tweets:
>>>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>>>     > >
>>>>     > > Thanks,
>>>>     > > -Ian
>>>>     > >
>>>>
>>>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
>>>>     > > <Pekka.Paakkonen@vtt.fi> wrote:
>>>>     > >
>>>>     > >> Hi,
>>>>     > >>
>>>>     > >> I would like to experiment with a socket-based feed.
>>>>     > >>
>>>>     > >> Can you point me to an example of how to use one?
>>>>     > >>
>>>>     > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in order to
>>>>     > >> experiment with feeds?
>>>>     > >>
>>>>     > >> Regards,
>>>>     > >> -Pekka Pääkkönen
>>>>     > >>
>>>>     >
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Raman
>>>
>>
>>
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>
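
The three push rules in Raman's quoted message (frame full, at least N
records collected, at least T seconds since the last record) can be modelled
in a few lines. What follows is a toy Python sketch, not AsterixDB code: the
class name FrameBatcher, its methods, and measuring frame capacity in records
rather than bytes are all assumptions made for illustration. Only the ideas
behind "batch-size" and "batch-interval" come from the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameBatcher:
    """Toy model of the push rules (a), (b), (c); hypothetical, not AsterixDB."""
    frame_capacity: int                      # rule (a): records a frame can hold (assumed unit)
    batch_size: Optional[int] = None         # rule (b): push once N records are collected
    batch_interval: Optional[float] = None   # rule (c): push T seconds after the last record
    _frame: List[str] = field(default_factory=list)
    _last_put: float = 0.0
    flushed: List[List[str]] = field(default_factory=list)

    def _flush(self) -> None:
        # Hand the current (possibly partial) frame downstream.
        if self._frame:
            self.flushed.append(self._frame)
            self._frame = []

    def tick(self, now: float) -> None:
        # Rule (c): the timer counts from the last record put into the frame.
        if (self.batch_interval is not None and self._frame
                and now - self._last_put >= self.batch_interval):
            self._flush()

    def put(self, record: str, now: float) -> None:
        self.tick(now)  # the timer may have expired before this record arrived
        self._frame.append(record)
        self._last_put = now
        limit = self.frame_capacity
        if self.batch_size is not None:
            limit = min(limit, self.batch_size)
        if len(self._frame) >= limit:  # rule (a) or rule (b)
            self._flush()


# Mirrors example 3: push at 100 records or after 2 idle seconds.
batcher = FrameBatcher(frame_capacity=1000, batch_size=100, batch_interval=2.0)
for i in range(250):
    batcher.put(f"record-{i}", now=0.0)
batcher.tick(now=1.0)   # timer not yet expired, 50 records still buffered
batcher.tick(now=2.0)   # timer expired, partial frame is pushed
print([len(b) for b in batcher.flushed])  # [100, 100, 50]
```

With batch_size and batch_interval left unset, the model falls back to rule
(a) alone, which matches the behaviour Pekka saw: a slow source leaves
records sitting in a partially filled frame.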

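Pekka's observation that records only showed up after the client socket was
closed can be checked with a small client that trickles records over a
connection that stays open. A minimal Python sketch, assuming a socket feed
adaptor is already listening on a host and port you configured and that it
accepts one record per line. The thread states neither the wire format nor a
port, so the JSON encoding, the function name send_records, and any address
you pass in are hypothetical.

```python
import json
import socket
import time
from typing import Iterable


def send_records(host: str, port: int, records: Iterable[dict],
                 delay_s: float = 0.0) -> int:
    """Send each record as one newline-terminated line; return bytes sent.

    The line-per-record JSON format is an assumption for illustration only.
    """
    sent = 0
    with socket.create_connection((host, port)) as conn:
        for rec in records:
            line = (json.dumps(rec) + "\n").encode("utf-8")
            conn.sendall(line)      # blocks until the whole line is written
            sent += len(line)
            if delay_s:
                time.sleep(delay_s)  # keep the connection open between records
    return sent
```

Calling send_records with a non-zero delay_s imitates a slow source. Under
the default "frame_full" policy described in the quoted message, such a
trickle may stay buffered until a frame fills, so comparing it against a
feed created with "counter_timer_expired" is one way to see the difference.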