fluo-dev mailing list archives

From "Meier, Caleb" <Caleb.Me...@parsons.com>
Subject RE: third party service to poll Fluo for absence of event
Date Fri, 17 Feb 2017 13:40:47 GMT
Thanks a lot Keith.  That was really helpful.  I'll try tinkering with the fluo.impl.ScanTask.maxSleep
property to see if I can liven things up a bit.

Caleb A. Meier, Ph.D.
Software Engineer II ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
Caleb.Meier@Parsons.com ♦ www.parsons.com

-----Original Message-----
From: Keith Turner [mailto:keith@deenlo.com] 
Sent: Thursday, February 16, 2017 6:02 PM
To: dev@fluo.incubator.apache.org
Subject: Re: third party service to poll Fluo for absence of event

On Thu, Feb 16, 2017 at 12:08 PM, Meier, Caleb <Caleb.Meier@parsons.com> wrote:
> Quick question.  How timely is Fluo when it comes to processing notifications?  If there
> are enough workers, will a notification be processed in a timely manner after writing to the
> observed column?  Earlier we had a discussion about a periodic query service.
> If I write a notification to issue a periodic query to Fluo, can I expect that my Observer
> will process that query fairly quickly (provided there are enough workers)?

Fluo keeps track of the last time it saw a notification in a tablet and exponentially increases
the scan period for that tablet when it keeps seeing nothing.  The increase is up to a configurable
maximum.

The following code does the backoff.

  https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/TabletData.java
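
As a rough illustration of the backoff described above, here is a minimal Java sketch. It is not the TabletData code linked above: the class name, the doubling factor, and the reset to the minimum once a notification is seen are all assumptions made for the example.

```java
/** Hypothetical sketch of per-tablet scan backoff; not Fluo's actual code. */
final class ScanBackoffSketch {
  private final long minSleepMs;
  private final long maxSleepMs;
  private long sleepMs;

  ScanBackoffSketch(long minSleepMs, long maxSleepMs) {
    if (minSleepMs <= 0 || maxSleepMs < minSleepMs) {
      throw new IllegalArgumentException("require 0 < minSleepMs <= maxSleepMs");
    }
    this.minSleepMs = minSleepMs;
    this.maxSleepMs = maxSleepMs;
    this.sleepMs = minSleepMs;
  }

  /** Records the outcome of one scan and returns the delay before the next scan. */
  long updateAfterScan(boolean foundNotifications) {
    if (foundNotifications) {
      // Assumption: seeing work resets the delay to the minimum.
      sleepMs = minSleepMs;
    } else if (sleepMs > maxSleepMs / 2) {
      // Doubling would pass the cap (or overflow), so clamp to the maximum.
      sleepMs = maxSleepMs;
    } else {
      // Assumption: "exponentially increases" is modeled as doubling.
      sleepMs = sleepMs * 2;
    }
    return sleepMs;
  }
}
```

With a 5 second minimum and a 5 minute maximum (both illustrative values), an idle tablet in this sketch would be scanned after 10 s, 20 s, 40 s, and so on, until the delay settles at the maximum.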


The following is where it gets the max sleep time.

  https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/ScanTask.java#L72


Looking at the following, the default max sleep time for a tablet is 5
minutes.   If I expanded the constant correctly, this can be changed
by setting fluo.impl.ScanTask.maxSleep.  Note, impl properties are not part of the public
API and are subject to change when the implementation changes.

  https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/impl/FluoConfigurationImpl.java#L37
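
For illustration only, overriding the property could look roughly like the Java sketch below, which builds a plain java.util.Properties object. The property name comes from the message above; the helper class and method names are hypothetical, the value is assumed to be in milliseconds (check the linked source before relying on that), and how the properties are handed to Fluo is not shown.

```java
import java.util.Properties;

/** Hypothetical helper for overriding the scan max sleep; not part of Fluo. */
final class MaxSleepConfigSketch {
  // Property name as given in the thread. Impl properties are not public API
  // and may change between releases.
  static final String MAX_SLEEP_PROP = "fluo.impl.ScanTask.maxSleep";

  /**
   * Returns a copy of the given properties with the max sleep overridden.
   * Assumption: the value is interpreted as milliseconds.
   */
  static Properties withMaxSleep(Properties base, long maxSleepMs) {
    if (maxSleepMs <= 0) {
      throw new IllegalArgumentException("maxSleepMs must be positive");
    }
    Properties copy = new Properties();
    copy.putAll(base); // leave the caller's properties untouched
    copy.setProperty(MAX_SLEEP_PROP, Long.toString(maxSleepMs));
    return copy;
  }
}
```

For example, withMaxSleep(props, 30_000L) would request a 30 second cap instead of the 5 minute default, at the cost of more frequent scans of idle tablets.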


Also, when Fluo has notifications queued, it will wait till the queue size halves before scanning
again for notifications.  So if 10,000 notifications were queued, then scanning for notifications
would not happen until the queue size was 5,000 or less.  The following code shows where that
happens.  I noticed it while looking for the max scan sleep code.

  https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/ScanTask.java#L85
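
The queue-halving rule can be sketched in a few lines of Java. This is a simplified model rather than the ScanTask code linked above; the class and method names are hypothetical, and the integer division used for "half" is an assumption.

```java
/** Hypothetical model of the "wait until the queue halves" rule; not Fluo's code. */
final class HalfQueueGateSketch {
  // Number of notifications that were queued as of the most recent scan.
  private int queuedAtLastScan;

  /** Call after a scan to record how many notifications are now queued. */
  void recordScan(int queued) {
    if (queued < 0) {
      throw new IllegalArgumentException("queued must be non-negative");
    }
    queuedAtLastScan = queued;
  }

  /** Returns true once the queue has drained to half, or less, of its recorded size. */
  boolean shouldScan(int currentQueueSize) {
    return currentQueueSize <= queuedAtLastScan / 2;
  }
}
```

After recordScan(10_000), shouldScan(5_001) is false and shouldScan(5_000) is true, which matches the 10,000 and 5,000 figures in the example above.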


>
> -----Original Message-----
> From: Keith Turner [mailto:keith@deenlo.com]
> Sent: Wednesday, February 01, 2017 11:03 PM
> To: dev@fluo.incubator.apache.org
> Subject: Re: third party service to poll Fluo for absence of event
>
> On Wed, Feb 1, 2017 at 9:54 PM, Christopher <ctubbsii@apache.org> wrote:
>> On Wed, Feb 1, 2017 at 10:04 AM Meier, Caleb 
>> <Caleb.Meier@parsons.com>
>> wrote:
>>
>>> Yeah, this seems pretty reasonable to me.  I guess it then boils 
>>> down to the nitty gritty of do I store results in Fluo and have my 
>>> service query Fluo (I think you guys actually advise against that in 
>>> your documentation), or export results and then have the service 
>>> query some external index that I am exporting to.
>>>
>>>
>> I'm not sure we advise against it, so much as recognize that it may 
>> not be suitable for certain use cases and may not meet query 
>> performance expectations
>> (http://fluo.apache.org/docs/fluo-recipes/1.0.0-incubating/export-queue/).
>>
>
> I would advise against querying Fluo for low latency queries.
> However, this external service that's checking a few stats within Fluo and injecting new
> notifications probably does not care about latency.
>
> The reason Fluo is not geared towards low latency is that it does lazy
> recovery of failed transactions.   Failed transactions are not cleaned
> up until something tries to read the data, which could significantly delay reads.
>
>> In any case, your observer need not write the final "last occurrence"
>> entries into a Fluo table. It could write them anywhere.
>>
>>
>>> Regarding timestamps, does the oracle server provide actual 
>>> timestamps or just logical timestamps?  That is, could I use the 
>>> timestamps that the server provides to define some sort of now() 
>>> function to obtain the current time to compare with the times of incoming events?
>>>
>>
>> Just logical time, and it delivers batches to limit locking, so it 
>> can appear to jump ahead spontaneously. I'm not sure the OracleServer 
>> is suitable for this purpose. What level of precision are you going for?
>> It might be enough to just run NTP, if you don't need more precision 
>> than "within seconds".
>>
>>
>>> ________________________________________
>>> From: Christopher <ctubbsii@apache.org>
>>> Sent: Tuesday, January 31, 2017 5:08 PM
>>> To: dev@fluo.incubator.apache.org
>>> Subject: Re: third party service to poll Fluo for absence of event
>>>
>>> You could write an observer which rolls up timestamps from all the 
>>> events you are concerned about, and puts the most recent event 
>>> timestamp into a centralized place, which you could poll. If there 
>>> is no ingest of these events, then the last timestamp in this 
>>> central place will exceed some threshold and the poller could detect that and
>>> trigger additional actions.
>>>
>>> On Tue, Jan 31, 2017 at 3:51 PM Meier, Caleb 
>>> <Caleb.Meier@parsons.com>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> > I’m looking into using Fluo to develop an event based notification 
>>> > system that incrementally generates events of increasing 
>>> > complexity.  The one issue that I’m running into is how to handle 
>>> > the non-event event.  That is,
>>> > Fluo (as I understand it) is not well-suited to handle the following
>>> > request: “generate a notification if no events of a given type 
>>> > have occurred within the last 24 hours”.  This is because it is a 
>>> > push based notification framework that only generates 
>>> > notifications when things actually happen.  So the question is, 
>>> > has anyone looked into developing a service for generating 
>>> > notifications at regular intervals (even if something doesn’t happen)
>>> > that works with Fluo?
>>> > I’m toying with the idea of creating some sort of Twill 
>>> > application that tells Fluo to wake up at regular intervals to 
>>> > generate a notification about the set of events falling within the 
>>> > given time window. Before doing this I just wanted to make sure 
>>> > that something like this does not already exist, and I also want
>>> > to get a sense of how bad an idea it is to delegate some of the logic of
>>> > this periodic notification service to Fluo.   Would it be better to
>>> > separate out the temporal portion of my notification request to be 
>>> > processed entirely outside of Fluo to avoid transactional overhead?
>>> >
>>> > --
>>> Christopher
>>>
>> --
>> Christopher
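
For reference, the approach Christopher suggests earlier in the thread (an observer records the most recent event timestamp in a central place, and a poller checks its age) might be modeled roughly as below. This is a standalone Java sketch with hypothetical names. It uses no Fluo APIs, and it assumes wall-clock timestamps in milliseconds (for example from NTP-synced hosts), since the oracle only provides logical time.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical "last occurrence" tracker; stands in for the central place that is polled. */
final class LastEventTrackerSketch {
  private final AtomicLong lastEventMs;

  /** Assumption: tracking starts at startMs, so a silent source becomes overdue relative to it. */
  LastEventTrackerSketch(long startMs) {
    this.lastEventMs = new AtomicLong(startMs);
  }

  /** What the observer would do: keep only the most recent event time seen. */
  void recordEvent(long eventTimeMs) {
    lastEventMs.accumulateAndGet(eventTimeMs, Math::max);
  }

  /** What the poller would do: report whether no event arrived within thresholdMs. */
  boolean isOverdue(long nowMs, long thresholdMs) {
    return nowMs - lastEventMs.get() > thresholdMs;
  }
}
```

A poller running on a fixed schedule could call isOverdue(System.currentTimeMillis(), 24L * 60 * 60 * 1000) and trigger the "no events in the last 24 hours" notification when it returns true.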