fluo-dev mailing list archives

From "Meier, Caleb" <Caleb.Me...@parsons.com>
Subject RE: fluo accumulo table tablet servers not keeping up with application
Date Fri, 27 Oct 2017 15:03:27 GMT
Hey Keith,

Our benchmark consists of a single query that is a join of two statement patterns (essentially
patterns that incoming data matches, where a unit of data is a statement).  We are ingesting
50 pairs of statements a minute (100 total), where each statement in the pair matches one
of the statement patterns.  Because the data is being ingested at a constant rate, the statement
pattern Observers and Join Observers are constantly working.  One thing that is worth mentioning
is that we changed the property fluo.implScanTask.maxSleep from 5 min to 10 seconds.  Based
on the constant ingest rate, your comments below, and our low maxSleep, it seems like the
workers would constantly be scanning for new notifications.  

> Once a worker scans all tablets and finds a list of notifications, it does not scan
> again until half of those notifications are processed.

How does the maxSleep property work in conjunction with this?  If the max sleep time elapses
before a worker processes half of the notifications, will it scan?   

Caleb A. Meier, Ph.D.
Senior Software Engineer ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
Caleb.Meier@Parsons.com ♦ www.parsons.com

-----Original Message-----
From: Keith Turner [mailto:keith@deenlo.com] 
Sent: Thursday, October 26, 2017 6:20 PM
To: fluo-dev <dev@fluo.apache.org>
Subject: Re: fluo accumulo table tablet servers not keeping up with application

On Thu, Oct 26, 2017 at 5:47 PM, Meier, Caleb <Caleb.Meier@parsons.com> wrote:
> Hey Keith,
>
> We'll rerun the benchmarks tomorrow and track the outstanding notifications.  We'll also
> see if compacting at some point during ingest helps with the scan rate.  Have you observed
> such high scan rates for such a small amount of data in any of your benchmarking?  What
> would account for the huge disparity in results read vs. results returned?  It seems like
> our scans are extremely inefficient for some reason.  Our tablet servers are becoming
> overwhelmed even before data gets flushed to disk.

Oh, I never saw your attachment.  It may not be possible to attach files on the mailing list.

It's possible that what you are seeing is the workers scanning for notifications.  If you look
in the workers' logs, do you see messages about scanning for notifications?  If so, what do
they look like?

In 1.0.0 each worker scans all tablets in random order.  When it scans, it uses an iterator
that applies hash+mod to select a subset of notifications.  The iterator also suppresses
deleted notifications, so the selection and suppression done by that iterator could explain
the read vs. returned disparity.  The worker does exponential back off on tablets where it
does not find data.  Once a worker scans all tablets and finds a list of notifications, it
does not scan again until half of those notifications are processed.
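
As a rough illustration of why hash+mod selection inflates read vs. returned, here is a
minimal Python model.  It is not Fluo's iterator code: the function names, the use of crc32
as the hash, and the sample data are all assumptions made for the sketch.  Every worker reads
every entry, but only keeps live notifications whose hash maps to its own id.

```python
import zlib


def owns_notification(row: bytes, worker_id: int, num_workers: int) -> bool:
    """True if this worker is the one selected for the notification's row."""
    # crc32 is only a stand-in; the hash used by the real iterator is not shown here.
    return zlib.crc32(row) % num_workers == worker_id


def scan_for_notifications(entries, worker_id, num_workers):
    """Model one worker's scan over (row, deleted) pairs.

    Returns (entries_read, notifications_returned).
    """
    returned = [
        row
        for row, deleted in entries
        if not deleted and owns_notification(row, worker_id, num_workers)
    ]
    return len(entries), returned


# Hypothetical data: 1000 notification entries, half of them already deleted.
entries = [(f"row{i:04d}".encode(), i % 2 == 0) for i in range(1000)]
num_workers = 10

total_read = 0
total_returned = 0
for worker_id in range(num_workers):
    read, returned = scan_for_notifications(entries, worker_id, num_workers)
    total_read += read
    total_returned += len(returned)

print(f"read={total_read} returned={total_returned}")
```

In this model the tablet servers read every entry once per worker, while each live
notification is returned to exactly one worker, so the gap grows with both the number of
workers and the number of deleted notifications that have not been compacted away.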

In the beginning, would you have a lot of notifications?  If so I would expect a lot of scanning
and then it should slow down once the workers get a list of notifications to process.
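
The back off and the half-processed rule mentioned above can be modeled as follows.  This is
only a sketch under stated assumptions: the doubling factor, the reset to a minimum delay
when data is found, the cap, and all names are hypothetical and are not taken from Fluo's
source.

```python
def next_sleep(current_sleep, found_data, min_sleep=1.0, max_sleep=10.0):
    """Return the delay (seconds) before a tablet is scanned again.

    Assumed policy: reset to min_sleep when data was found, otherwise double
    the delay up to max_sleep.
    """
    if found_data:
        return min_sleep
    return min(current_sleep * 2, max_sleep)


def should_rescan(found, processed):
    """Rescan only once at least half of the found notifications are processed."""
    return processed * 2 >= found


# A tablet that keeps coming back empty is scanned less and less often.
sleep = 1.0
sleeps = []
for _ in range(5):
    sleep = next_sleep(sleep, found_data=False)
    sleeps.append(sleep)

print(sleeps)
print(should_rescan(found=100, processed=49), should_rescan(found=100, processed=50))
```

Under these assumptions, a lower cap means empty tablets are revisited more often, which is
one reason to check the workers' debug messages before and after changing such a setting.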

In 1.1.0 the workers divide up the tablets (so workers no longer scan all tablets; groups
of workers share groups of tablets).  If the table is split after the workers start, it may
take them a bit to execute the distributed algorithm that divvies tablets among workers.

Anyway the debug messages about scanning for notifications in the workers should provide some
insight into this.

If it's not notification scanning, then it could be that the application is scanning over
lots of data that was deleted or something like that.

>
> -----Original Message-----
> From: Keith Turner [mailto:keith@deenlo.com]
> Sent: Thursday, October 26, 2017 5:36 PM
> To: fluo-dev <dev@fluo.apache.org>
> Subject: Re: fluo accumulo table tablet servers not keeping up with 
> application
>
> On Thu, Oct 26, 2017 at 2:50 PM, Meier, Caleb <Caleb.Meier@parsons.com> wrote:
>> Hey Keith,
>>
>> Thanks for the reply.  Regarding our benchmark, I've attached some screenshots of our
>> Accumulo UI that were taken while the benchmark was running.  Basically, our ingest rate
>> is pretty low (about 150 entries/s), but our scan rate is off the charts, approaching
>> 6 million entries/s!  Also, notice the disparity between reads and returned in the Scan
>> chart.  That disparity would suggest that we're possibly doing full table scans somewhere,
>> which is strange given that all of our scans are RowColumn constrained.  Perhaps we are
>> building our Scanner incorrectly.  In an effort to maximize the number of TabletServers,
>> we split the Fluo table into 5MB tablets.  Also, the data is not well balanced -- the
>> tablet servers do take turns being maxed out while others are idle.  We're considering
>> possible sharding strategies.
>>
>> Given that our TabletServers are getting saturated so quickly for such a low ingest
>> rate, it seems like we definitely need to cut down on the number of scans as a first line
>> of attack to see what that buys us.  Then we'll look into tuning Accumulo and Fluo.  Does
>> this seem like a reasonable approach to you?  Does the scan rate of our application strike
>> you as extremely high?  When you look at the Rya Observers, can you pay attention to how
>> we are building our scans to make sure that we're not inadvertently doing full table scans?
>> Also, what exactly do you mean by "are the 6 lookups in the transaction done sequentially"?
>
> Regarding the scan rate, there are a few things I am curious about.
>
> Fluo workers scan for notifications in addition to the scanning done
> by your apps.  I made some changes in 1.1.0 to reduce the amount of
> scanning needed to find notifications, but this should not make much
> of a difference on a small number of nodes.  Details about this are in
> the 1.1.0 release notes.  I am not sure what the best way is to determine
> how much of the scanning you are seeing is app vs. notification finding.
> Can you run the fluo wait command to see how many outstanding
> notifications there are?
>
> Transactions leave a paper trail behind and compactions clean this up (Fluo has a garbage
> collection iterator).  This is why I asked what effect compacting the table had.
> Compactions will also clean up deleted notifications.
>
>
>>
>> Thanks,
>> Caleb
>>
>> -----Original Message-----
>> From: Keith Turner [mailto:keith@deenlo.com]
>> Sent: Thursday, October 26, 2017 1:39 PM
>> To: fluo-dev <dev@fluo.apache.org>
>> Subject: Re: fluo accumulo table tablet servers not keeping up with 
>> application
>>
>> Caleb
>>
>> What tuning, if any, have you done?  The following tunable Accumulo parameters impact
>> performance.
>>
>>  * Write ahead log sync settings (this can have huge performance
>> implications)
>>  * Files per tablet
>>  * Tablet server cache sizes
>>  * Accumulo data block sizes
>>  * Tablet server client thread pool size
>>
>> For Fluo, the following tunable parameters are important.
>>
>>  * Commit memory (this determines how many transactions are held in 
>> memory while committing)
>>  * Threads running transactions
>>
>> What does the load (CPU and memory) on the cluster look like?  I'm curious how even
>> it is.  For example, is one tserver at 100% CPU while others are idle?  This could be
>> caused by uneven data access patterns.
>>
>> Would it be possible for me to see or run the benchmark?  I am going to take a look
>> at the Rya observers, let me know if there is anything in particular I should look at.
>>
>> Are the 6 lookups in the transaction done sequentially?
>>
>> Keith
>>
>> On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <Caleb.Meier@parsons.com> wrote:
>>> Hello Fluo Devs,
>>>
>>> We have implemented an incremental query evaluation service for Apache Rya that
>>> leverages Apache Fluo.  We’ve been doing some benchmarking and we’ve found that the
>>> Accumulo Tablet Servers for the Fluo table are falling behind pretty quickly for our
>>> application.  We’ve tried splitting the Accumulo table so that we have more Tablet
>>> Servers, but that doesn’t really buy us too much.  Our application is fairly scan
>>> intensive: we have a metadata framework in place that allows us to pass query results
>>> through the query tree, and each observer needs to look up metadata to determine which
>>> observer to route its data to after processing.  To give you some indication of our scan
>>> rates, our Join Observer does about 6 lookups, builds a scanner to do one RowColumn
>>> restricted scan, and then does many writes.  So an obvious way to alleviate the burden
>>> on the Tablet Servers is to cut down on the number of scans.
>>>
>>> One approach that we are considering is to import all of our metadata into memory.
>>> Essentially, each Observer would need access to an in-memory metadata cache.  We’re
>>> considering using the Observer context, but this cache needs to be mutable because a
>>> user needs to be able to register new queries.  Is it possible to update the context, or
>>> would we need to restart the application to do that?  I guess other options would be to
>>> create a static cache for each Observer that stores the metadata, or to store it in
>>> Zookeeper.  Have any of you devs ever had to create a solution to share state between
>>> Observers that doesn’t rely on the Fluo table?
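
A minimal Python sketch of the mutable in-memory cache idea, to make the trade-off concrete.
Everything here is hypothetical: the class, the loader callback standing in for a metadata
lookup against the table, and the time-to-live policy are assumptions for illustration, not
part of Fluo or Rya.

```python
import threading
import time


class MetadataCache:
    """In-memory metadata cache whose entries are reloaded after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader        # hypothetical callback that reads metadata from the table
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # node_id -> (metadata, load_time)
        self._lock = threading.Lock()
        self.loads = 0               # number of times the loader was called

    def get(self, node_id):
        """Return cached metadata, calling the loader only on a miss or expiry."""
        with self._lock:
            now = self._clock()
            cached = self._entries.get(node_id)
            if cached is not None and now - cached[1] < self._ttl:
                return cached[0]
            value = self._loader(node_id)
            self.loads += 1
            self._entries[node_id] = (value, now)
            return value

    def invalidate(self, node_id):
        """Drop an entry so the next get() reloads it, e.g. after a query is registered."""
        with self._lock:
            self._entries.pop(node_id, None)


# Stand-in for the metadata stored in the table.
table = {"join1": {"parent": "query1"}}
cache = MetadataCache(lambda node_id: table[node_id])

for _ in range(6):
    cache.get("join1")
print(cache.loads)
```

With a cache like this, the six metadata lookups per observation would reach the table only
on the first access or after expiry.  Newly registered queries become visible either when an
entry expires or when something calls invalidate, and how that signal reaches every worker
process is left open in this sketch.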
>>>
>>> In addition to cutting down on the scan rate, are there any other approaches that you
>>> would consider?  I assume that the problem lies primarily with how we’ve implemented
>>> our application, but I’m also wondering if there is anything we can do from a
>>> configuration point of view to reduce the burden on the Tablet servers.  Would reducing
>>> the number of workers/worker threads to cut down on the number of times a single
>>> observation is processed be helpful?  It seems like this approach would cut out some
>>> redundant scans as well, but it might be more of a second order optimization.  In
>>> general, any insight that you might have on this problem would be greatly appreciated.
>>>
>>> Sincerely,
>>> Caleb Meier
>>>