asterixdb-users mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Socket feed questions
Date Thu, 29 Oct 2015 22:46:48 GMT
Yes and yes, I believe!  The AQL UDF case is less tested, I believe, but it
should work...
On Oct 29, 2015 12:22 PM, "Jianfeng Jia" <jianfeng.jia@gmail.com> wrote:

> Hi Devs,
>
> I have two related questions:
> 1. Is there any example code for using a UDF in a feed adapter?
> 2. Can we use AQL functions in those kinds of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mjcarey@ics.uci.edu>
> wrote:
>
> > Thanks!
> >
> > On 10/27/15 9:48 AM, Raman Grover wrote:
> >
> >> Hi,
> >>
> >>
> >> In the case when data is being received from an external source (e.g.
> >> during feed ingestion), a slow rate of arrival of data may result in
> >> excessive delays until the data is deposited into the target dataset and
> >> made accessible to queries. Data moves along a data ingestion pipeline
> >> between operators as packed fixed-size frames. The default behavior is to
> >> wait for the frame to be full before dispatching the contained data to the
> >> downstream operator. However, as noted, this may not suit all scenarios,
> >> particularly when the data source is sending data at a low rate. To cater to
> >> different scenarios, AsterixDB allows configuring the behavior. The
> >> different options are described next.
> >>
> >> *Push data downstream when*
> >> (a) Frame is full (default)
> >> (b) At least N records (data items) have been collected into a partially
> >> filled frame
> >> (c) At least T seconds have elapsed since the last record was put into
> >> the frame
> >>
> >> *How to configure the behavior?*
> >> At the time of defining a feed, an end-user may specify configuration
> >> parameters that determine the runtime behavior (options (a), (b) or (c)
> >> from above).
> >>
> >> The parameters are described below:
> >>
> >> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
> >> values -
> >>   (i) / "frame_full"/
> >>  This is the default value. As the name suggests, this choice causes
> >> frames to be pushed by the feed adaptor only when there isn't sufficient
> >> space for an additional record to fit in. This corresponds to option (a).
> >>
> >>  (ii) / "counter_timer_expired" /
> >>  Use this as the value if you wish to set either option (b) or (c), or a
> >> combination of both.
> >>
> >> *Some Examples*
> >> 1) Pack a maximum of 100 records into a data frame and push it downstream.
> >>
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
> >> other parameters);
> >>
> >> 2) Wait 2 seconds, then send however many records have been collected in
> >> the frame downstream.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
> >> other parameters);
> >>
> >> 3) Wait till 100 records have been collected into a data frame or 2
> >> seconds have elapsed since the last record was put into the current data
> >> frame.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
> >> ("batch-size"="100"),... other parameters);
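The count and timer options described above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not AsterixDB code: the class name `FrameBatcher`, its `tick()` method and the injectable clock are hypothetical, and the frame-full policy (a) is not modeled.

```python
import time


class FrameBatcher:
    """Toy model of the "counter_timer_expired" policy described above.

    Hypothetical helper, not part of AsterixDB. Records are flushed when
    batch_size records have been collected (option b) or when
    batch_interval seconds have passed since the last record was added
    (option c). The frame-full policy (option a) is not modeled.
    """

    def __init__(self, batch_size=None, batch_interval=None, clock=time.monotonic):
        self.batch_size = batch_size
        self.batch_interval = batch_interval
        self._clock = clock      # injectable so the timer can be tested
        self._pending = []       # records waiting in the current "frame"
        self._last_add = None    # time the most recent record arrived
        self.flushed = []        # batches already pushed downstream

    def add(self, record):
        """Collect one record; flush at once if the count limit is reached."""
        self._pending.append(record)
        self._last_add = self._clock()
        if self.batch_size is not None and len(self._pending) >= self.batch_size:
            self._flush()

    def tick(self):
        """Call periodically; flushes a partial batch once the timer expires."""
        if not self._pending or self.batch_interval is None:
            return
        if self._clock() - self._last_add >= self.batch_interval:
            self._flush()

    def _flush(self):
        self.flushed.append(self._pending)
        self._pending = []
```

Constructing `FrameBatcher(batch_size=100, batch_interval=2)` roughly mirrors example 3: whichever limit is hit first pushes the batch. Smaller values would lower the delay before records become visible, at the cost of more, smaller pushes downstream.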
> >>
> >>
> >> *Note*
> >> The above config parameters are not specific to using a particular
> >> implementation of an adaptor but are available for use with any feed
> >> adaptor. Some adaptors that ship with AsterixDB use different default
> >> values for above to suit their specific scenario. E.g. the pull-based
> >> Twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
> >> sets the parameter "batch-interval".
> >>
> >>
> >> Regards,
> >> Raman
> >> PS: The names of the parameters described above are not as intuitive as
> >> one would like them to be. The names need to be changed.
> >>
> >>
> >> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com> wrote:
> >>
> >>     I think we need to have tuning parameters - like batch size and
> >>     maximum tolerable latency (in case there's a lull and you still
> >>     want to push stuff with some worst-case delay). @Raman Grover -
> >>     remind me (us) what's available in this regard?
> >>
> >>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
> >>
> >>>
> >>>     Hi,
> >>>
> >>>     Yes, you are right. I tried sending a larger amount of data, and
> >>>     data is now stored to the database.
> >>>
> >>>     Does it make sense to configure a smaller batch size in order to
> >>>     get more frequent writes?
> >>>
> >>>     Or would it significantly impact performance?
> >>>
> >>>     -Pekka
> >>>
> >>>     Data moves through the pipeline in frame-sized batches, so one
> >>>     (uninformed :-)) guess is that you aren't running very long, and
> >>>     you're only seeing the data flow when you close because only then
> >>>     do you have a batch's worth.  Is that possible?  You can test this
> >>>     by running longer (more data) and seeing if you start to see the
> >>>     expected incremental flow/inserts. (And we need tunability in this
> >>>     area, e.g., parameters on how much batching and/or how much latency
> >>>     to tolerate on each feed.)
> >>>
> >>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
> >>>
> >>>     > Hi,
> >>>     >
> >>>     > Thanks, now I am able to create a socket feed, and save items to the
> >>>     > dataset from the feed.
> >>>     >
> >>>     > It seems that data items are written to the dataset after I close the
> >>>     > socket at the client.
> >>>     >
> >>>     > Is there some way to indicate to AsterixDB feed (with a newline or
> >>>     > other indicator) that data can be written to the database, when the
> >>>     > connection is open?
> >>>     >
> >>>     > After I close the socket at the client, the feed seems to close down.
> >>>     > Or is it only paused, until it is resumed?
> >>>     >
> >>>     > -Pekka
> >>>     >
> >>>
> >>>     > Hi Pekka,
> >>>     >
> >>>     > That's interesting, I'm not sure why the CC would appear as being down
> >>>     > to Managix. However, if you can access the web console, then that
> >>>     > evidently isn't the case.
> >>>     >
> >>>     > As for data ingestion via sockets, yes it is possible, but it kind of
> >>>     > depends on what's meant by sockets. There's no tutorial for it, but
> >>>     > take a look at SocketBasedFeedAdapter in the source, as well as
> >>>     > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
> >>>     > for some examples of how it works.
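A client for a socket feed might look like the following Python sketch. It is not taken from SocketTweetGenerator or SocketBasedFeedAdapter: the function name `send_records`, the newline-delimited JSON framing, and the host and port are all assumptions, so check the adapter source for the record format your feed actually expects.

```python
import json
import socket


def send_records(host, port, records):
    """Send each record as one JSON line over a single TCP connection.

    Hypothetical helper for illustration. The newline-delimited JSON
    framing is an assumption, not a documented AsterixDB wire format.
    Returns the number of records sent.
    """
    with socket.create_connection((host, port)) as sock:
        for record in records:
            line = json.dumps(record) + "\n"
            sock.sendall(line.encode("utf-8"))
    return len(records)
```

Note that sending a newline after each record does not by itself make the record visible to queries; as discussed in this thread, that depends on how the feed batches records into frames.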
> >>>     >
> >>>     > Hope that helps!
> >>>     >
> >>>     > Thanks,
> >>>     > -Ian
> >>>     >
> >>>
> >>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
> >>>     > <Pekka.Paakkonen@vtt.fi> wrote:
> >>>     > > Hi Ian,
> >>>     > >
> >>>     > > Thanks for the reply.
> >>>     > >
> >>>     > > I compiled AsterixDB v0.87 and started it.
> >>>     > >
> >>>     > > However, I get the following warnings:
> >>>     > >
> >>>     > > INFO: Name:my_asterix
> >>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
> >>>     > > Web-Url:http://192.168.101.144:19001
> >>>     > > State:UNUSABLE
> >>>     > >
> >>>     > > WARNING!:Cluster Controller not running at master
> >>>     > >
> >>>     > > Also, I see the following warnings in my_asterixdb1.log. There are no
> >>>     > > warnings or errors in cc.log.
> >>>     > >
> >>>     > > “
> >>>     > > Oct 19, 2015 8:37:39 AM org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
> >>>     > > SEVERE: LifecycleComponentManager configured org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
> >>>     > > ..
> >>>     > > INFO: Completed sharp checkpoint.
> >>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
> >>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
> >>>     > > Oct 19, 2015 8:38:38 AM org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
> >>>     > > INFO: Result state cleanup instance successfully completed.”
> >>>     > >
> >>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
> >>>     > >
> >>>     > > The documentation shows ingestion of tweets, but I would be interested in
> >>>     > > using sockets.
> >>>     > >
> >>>     > > Is it possible to ingest data from sockets?
> >>>     > >
> >>>     > > Regards,
> >>>     > > -Pekka
> >>>     > >
> >>>
> >>>     > > Hey there Pekka,
> >>>     > >
> >>>     > > Your intuition is correct, most of the newer feeds features are in the
> >>>     > > current master branch and not in the (very) old 0.8.6 release. If you'd
> >>>     > > like to experiment with them you'll have to build from source. The details
> >>>     > > about that are here:
> >>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
> >>>     > > , but they're probably a bit overkill for just trying to get the compiled
> >>>     > > binaries. For that all you really need to do is:
> >>>     > >
> >>>     > > - Clone Hyracks from git
> >>>     > > - 'mvn clean install -DskipTests'
> >>>     > > - Clone AsterixDB
> >>>     > > - 'mvn clean package -DskipTests'
> >>>     > >
> >>>     > > Then, the binaries will sit in asterix-installer/target
> >>>     > >
> >>>     > > For an example, the documentation shows how to set up a feed that's
> >>>     > > ingesting Tweets:
> >>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
> >>>     > >
> >>>     > > Thanks,
> >>>     > > -Ian
> >>>     > >
> >>>
> >>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
> >>>     > > <Pekka.Paakkonen@vtt.fi> wrote:
> >>>     > >
> >>>     > >> Hi,
> >>>     > >>
> >>>     > >> I would like to experiment with a socket-based feed.
> >>>     > >>
> >>>     > >> Can you point me to an example on how to utilize them?
> >>>     > >>
> >>>     > >> Do I need to install 0.8.7-snapshot version of AsterixDB in order to
> >>>     > >> experiment with feeds?
> >>>     > >>
> >>>     > >> Regards,
> >>>     > >>
> >>>     > >> -Pekka Pääkkönen
> >>>     > >>
> >>>     > >
> >>>     >
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Raman
> >>
> >
> >
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>
