asterixdb-dev mailing list archives

From Michael Carey <mjca...@ics.uci.edu>
Subject Re: Socket feed questions
Date Wed, 28 Oct 2015 04:54:03 GMT
Thanks!

On 10/27/15 9:48 AM, Raman Grover wrote:
> Hi,
>
>
> In the case when data is being received from an external source (e.g. 
> during feed ingestion), a slow rate of arrival of data may result in 
> excessive delays until the data is deposited into the target dataset 
> and made accessible to queries. Data moves along a data ingestion 
> pipeline between operators as packed fixed size frames. The default 
> behavior is to wait for the frame to be full before dispatching the 
> contained data to the downstream operator. However, as noted, this may
> not suit all scenarios, particularly when the data source is sending data
> at a low rate. To cater to different scenarios, AsterixDB allows
> configuring this behavior. The different options are described next.
>
> *Push data downstream when*
> (a) Frame is full (default)
> (b) At least N records (data items) have been collected into a 
> partially filled frame
> (c) At least T seconds have elapsed since the last record was put into 
> the frame
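
The three dispatch conditions above can be sketched as a toy model. This is only an illustration under stated assumptions, not AsterixDB code: the class `FrameBatcher`, its `frame_bytes` field, and the `tick` method are hypothetical names, and the frame capacity is modeled as a simple byte count. `batch_size` and `batch_interval` mirror the parameter names used later in this message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameBatcher:
    """Toy model of a frame dispatch policy (hypothetical, not AsterixDB code)."""
    frame_bytes: int                         # fixed frame size, option (a)
    batch_size: Optional[int] = None         # N records, option (b)
    batch_interval: Optional[float] = None   # T seconds, option (c)
    _frame: List[bytes] = field(default_factory=list)
    _used: int = 0
    _last_put: Optional[float] = None

    def _flush(self) -> List[bytes]:
        # Hand the current frame downstream and start an empty one.
        out, self._frame, self._used, self._last_put = self._frame, [], 0, None
        return out

    def add(self, record: bytes, now: float) -> List[List[bytes]]:
        """Append a record; return any frames pushed downstream as a result."""
        pushed = []
        # (a) no room for one more record: push the current frame first.
        if self._frame and self._used + len(record) > self.frame_bytes:
            pushed.append(self._flush())
        self._frame.append(record)
        self._used += len(record)
        self._last_put = now
        # (b) at least N records sit in the partially filled frame.
        if self.batch_size is not None and len(self._frame) >= self.batch_size:
            pushed.append(self._flush())
        return pushed

    def tick(self, now: float) -> List[List[bytes]]:
        """Periodic check for (c): T seconds elapsed since the last put."""
        if (self.batch_interval is not None and self._frame
                and now - self._last_put >= self.batch_interval):
            return [self._flush()]
        return []
```

With neither `batch_size` nor `batch_interval` set, the model only pushes when a frame fills up, which matches the default described in (a).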
>
> *How to configure the behavior?*
> At the time of defining a feed, an end-user may specify configuration 
> parameters that determine the runtime behavior (options (a), (b) or 
> (c) from above).
>
> The parameters are described below:
>
> "parser-policy": A specific strategy chosen from a set of
> pre-defined values -
>  (i) "frame_full"
>  This is the default value. As the name suggests, this choice causes
> frames to be pushed by the feed adaptor only when there isn't
> sufficient space for an additional record to fit in. This corresponds
> to option (a).
>
>  (ii) "counter_timer_expired"
>  Use this as the value if you wish to set either option (b) or (c) or
> a combination of both.
>
> *Some Examples*
>
> 1) Pack a maximum of 100 records into a data frame and push it 
> downstream.
>
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ... 
> other parameters);
>
> 2) Wait up to 2 seconds, then send however many records have been
> collected in a frame downstream.
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), ...
> other parameters);
>
> 3) Wait till 100 records have been collected into a data frame or 2 
> seconds have elapsed since the last record was put into the current 
> data frame.
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), 
> ("batch-size"="100"),... other parameters);
>
>
> *Note*
> The above config parameters are not specific to using a particular 
> implementation of an adaptor but are available for use with any feed 
> adaptor. Some adaptors that ship with AsterixDB use different default
> values for the above parameters to suit their specific scenario. E.g. the
> pull-based Twitter adaptor uses "counter_timer_expired" as the
> "parser-policy" and sets the parameter "batch-interval".
>
>
> Regards,
> Raman
> PS: The names of the parameters described above are not as intuitive 
> as one would like them to be. The names need to be changed.
>
>
>
>
>
>
>
>
> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com 
> <mailto:dtabass@gmail.com>> wrote:
>
>     I think we need to have tuning parameters - like batch size and
>     maximum tolerable latency (in case there's a lull and you still
>     want to push stuff with some worst-case delay). @Raman Grover -
>     remind me (us) what's available in this regard?
>
>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>
>>     Hi,
>>
>>     Yes, you are right. I tried sending a larger amount of data, and
>>     data is now stored to the database.
>>
>>     Does it make sense to configure a smaller batch size in order to
>>     get more frequent writes?
>>
>>     Or would it significantly impact performance?
>>
>>     -Pekka
>>
>>     Data moves through the pipeline in frame-sized batches, so one
>>     (uninformed :-)) guess is that you aren't running very long, and
>>     you're only seeing the data flow when you close because only then
>>     do you have a batch's worth.  Is that possible?  You can test this
>>     by running longer (more data) and seeing if you start to see the
>>     expected incremental flow/inserts. (And we need tunability in this
>>     area, e.g., parameters on how much batching and/or how much latency
>>     to tolerate on each feed.)
>>
>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>
>>     >
>>
>>     > Hi,
>>
>>     >
>>
>>     > Thanks, now I am able to create a socket feed, and save items to the
>>
>>     > dataset from the feed.
>>
>>     >
>>
>>     > It seems that data items are written to the dataset after I close the
>>
>>     > socket at the client.
>>
>>     >
>>
>>     > Is there some way to indicate to the AsterixDB feed (with a newline or
>>
>>     > other indicator) that data can be written to the database while the
>>
>>     > connection is open?
>>
>>     >
>>
>>     > After I close the socket at the client, the feed seems to close down.
>>
>>     > Or is it only paused, until it is resumed?
>>
>>     >
>>
>>     > -Pekka
>>
>>     >
>>
>>     > Hi Pekka,
>>
>>     >
>>
>>     > That's interesting, I'm not sure why the CC would appear as being down
>>     > to Managix. However, if you can access the web console, that
>>     > evidently isn't the case.
>>
>>     >
>>
>>     > As for data ingestion via sockets, yes it is possible, but it kind of
>>
>>     >
>>
>>     > depends on what's meant by sockets. There's no tutorial for it, but
>>
>>     >
>>
>>     > take a look at SocketBasedFeedAdapter in the source, as well as
>>
>>     >
>>
>>     > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>
>>     >
>>
>>     > for some examples of how it works.
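
As a rough companion to the pointers above, here is a minimal sketch of a client that keeps one TCP connection open and writes records to it one per line. It is not taken from SocketTweetGenerator or SocketBasedFeedAdapter. The function name `stream_records`, the host and port, and the newline-terminated record format are all assumptions made for illustration; the thread does not state the wire format the socket adaptor expects.

```python
import socket
import time
from typing import Iterable


def stream_records(host: str, port: int, records: Iterable[str],
                   delay_s: float = 0.0) -> int:
    """Send each record as one newline-terminated line over a single TCP
    connection and return the number of records sent.

    Hypothetical helper: the framing (UTF-8 text, one record per line) is an
    assumption, not a documented AsterixDB protocol.
    """
    sent = 0
    # The connection stays open for the whole loop and closes on exit.
    with socket.create_connection((host, port)) as conn:
        for rec in records:
            conn.sendall(rec.encode("utf-8") + b"\n")
            sent += 1
            if delay_s:
                time.sleep(delay_s)  # simulate a slow data source
    return sent
```

Calling it with a nonzero `delay_s` mimics a source that trickles data in slowly, which is the situation where frame-based batching delays visibility of the records.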
>>
>>     >
>>
>>     > Hope that helps!
>>
>>     >
>>
>>     > Thanks,
>>
>>     >
>>
>>     > -Ian
>>
>>     >
>>
>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>
>>     ><Pekka.Paakkonen@vtt.fi> <mailto:Pekka.Paakkonen@vtt.fi> wrote:
>>
>>     > > Hi Ian,
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > Thanks for the reply.
>>
>>     > >
>>
>>     > > I compiled AsterixDB v0.87 and started it.
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > However, I get the following warnings:
>>
>>     > >
>>
>>     > > INFO: Name:my_asterix
>>
>>     > >
>>
>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
>>
>>     > >
>>
>>     > > Web-Url:http://192.168.101.144:19001
>>
>>     > >
>>
>>     > > State:UNUSABLE
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > WARNING!:Cluster Controller not running at master
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > Also, I see the following warnings in my_asterixdb1.log. There are no
>>     > > warnings or errors in cc.log.
>>
>>     > >
>>
>>     > > “
>>
>>     > >
>>
>>     > > Oct 19, 2015 8:37:39 AM org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>     > > SEVERE: LifecycleComponentManager configured org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>     > > ..
>>     > > INFO: Completed sharp checkpoint.
>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
>>     > > Oct 19, 2015 8:38:38 AM org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>     > > INFO: Result state cleanup instance successfully completed.”
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > The documentation shows ingestion of tweets, but I would be interested in
>>     > > using sockets.
>>
>>     > >
>>
>>     > > Is it possible to ingest data from sockets?
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > Regards,
>>
>>     > >
>>
>>     > > -Pekka
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > Hey there Pekka,
>>
>>     > >
>>
>>     > > Your intuition is correct: most of the newer feed features are in the
>>     > > current master branch and not in the (very) old 0.8.6 release. If you'd
>>     > > like to experiment with them, you'll have to build from source. The
>>     > > details about that are here:
>>     > >
>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>     > >
>>     > > but they're probably a bit overkill for just trying to get the compiled
>>     > > binaries. For that, all you really need to do is:
>>
>>     > >
>>
>>     > > - Clone Hyracks from git
>>
>>     > >
>>
>>     > > - 'mvn clean install -DskipTests'
>>
>>     > >
>>
>>     > > - Clone AsterixDB
>>
>>     > >
>>
>>     > > - 'mvn clean package -DskipTests'
>>
>>     > >
>>
>>     > > Then, the binaries will sit in asterix-installer/target
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > For an example, the documentation shows how to set up a feed that's
>>
>>     > >
>>
>>     > > ingesting Tweets:
>>
>>     > >
>>
>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > Thanks,
>>
>>     > >
>>
>>     > > -Ian
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka <Pekka.Paakkonen@vtt.fi> <mailto:Pekka.Paakkonen@vtt.fi> wrote:
>>
>>     > >
>>
>>     > >
>>
>>     > >
>>
>>     > >> Hi,
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >> I would like to experiment with a socket-based feed.
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >> Can you point me to an example on how to utilize them?
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in order to experiment with feeds?
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >> Regards,
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >> -Pekka Pääkkönen
>>
>>     > >
>>
>>     > >>
>>
>>     > >
>>
>>     > >
>>
>>     >
>>
>
>
>
>
> -- 
> Raman

