fluo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: fluo accumulo table tablet servers not keeping up with application
Date Thu, 26 Oct 2017 18:07:16 GMT
On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <Caleb.Meier@parsons.com> wrote:
> Hello Fluo Devs,
> We have implemented an incremental query evaluation service for Apache Rya
> that leverages Apache Fluo.  We’ve been doing some benchmarking and we’ve
> found that the Accumulo Tablet servers for the Fluo table are falling
> behind pretty quickly for our application.  We’ve tried splitting the
> Accumulo Table so that we have more Tablet Servers, but that doesn’t really
> buy us too much.  Our application is fairly scan intensive: we have a
> metadata framework in place that allows us to pass query results through
> the query tree, and each observer needs to look up metadata to determine
> which observer to route its data to after processing.  To give you some
> indication of our scan rates, our Join Observer does about 6 lookups,
> builds a scanner to do one RowColumn restricted scan, and then does many
> writes.  So an obvious way to alleviate the burden on the Tablet servers is
> to cut down on the number of scans.
> One approach that we are considering is to import all of our metadata into
> memory.  Essentially, each Observer would need access to an in-memory
> metadata cache.  We’re considering using the Observer context, but this
> cache needs to be mutable because a user needs to be able to register new
> queries.  Is it possible to update the context, or would we need to restart
> the application to do that?  I guess other options would be to create a
> static cache for each Observer that stores the metadata, or to store it in
> ZooKeeper.  Have any of you devs ever had to create a solution to share
> state between Observers that doesn’t rely on the Fluo table?

If you want to share a cache between observers, Fluo 1.0 requires
static fields.  Fluo 1.1.0 introduced a new API for creating
observers called the ObserverProvider.  With this new API, static
state is not required: the cache can be created in the
ObserverProvider and passed to the Observers.  The 1.1.0 release
notes give an overview of the new API.
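
To make the idea concrete, here is a minimal plain-Java sketch of the
pattern: one mutable cache is created by a provider object and handed to
each observer through its constructor, so no static fields are needed.
Everything in it (SketchObserver, MetadataCache, SketchObserverProvider,
the observer names and node ids) is a hypothetical stand-in written for
illustration.  It deliberately does not use the Fluo classes, so treat it
as the shape of the solution rather than as Fluo's API; the real
ObserverProvider interface is described in the 1.1.0 release notes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for an observer; NOT the Fluo Observer interface.
interface SketchObserver {
  // Returns the id of the node that results from nodeId should be routed to.
  String route(String nodeId);
}

// Mutable, thread-safe metadata cache (assumed shape: node id -> next node id).
final class MetadataCache {
  private final Map<String, String> nextNodeByNodeId = new ConcurrentHashMap<>();

  // Called when a user registers a new query.
  void register(String nodeId, String nextNodeId) {
    nextNodeByNodeId.put(nodeId, nextNodeId);
  }

  Optional<String> nextNode(String nodeId) {
    return Optional.ofNullable(nextNodeByNodeId.get(nodeId));
  }
}

final class JoinObserver implements SketchObserver {
  private final MetadataCache cache;

  JoinObserver(MetadataCache cache) {
    this.cache = cache;
  }

  @Override
  public String route(String nodeId) {
    // In-memory lookup instead of a table scan; a real observer would
    // fall back to reading the table on a cache miss.
    return cache.nextNode(nodeId).orElse("unrouted");
  }
}

final class FilterObserver implements SketchObserver {
  private final MetadataCache cache;

  FilterObserver(MetadataCache cache) {
    this.cache = cache;
  }

  @Override
  public String route(String nodeId) {
    return cache.nextNode(nodeId).orElse("unrouted");
  }
}

// Hypothetical stand-in for the provider: it owns the cache and passes the
// same instance to every observer it creates.
final class SketchObserverProvider {
  private final MetadataCache cache = new MetadataCache();

  MetadataCache cache() {
    return cache;
  }

  void provide(BiConsumer<String, SketchObserver> registry) {
    registry.accept("join", new JoinObserver(cache));
    registry.accept("filter", new FilterObserver(cache));
  }
}
```

Because both observers hold a reference to the same MetadataCache, a query
registered after the observers were created is visible to all of them
without a restart.  Note that a cache like this lives in a single JVM, so
each worker process would have its own copy and would still need some way
to learn about newly registered queries.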


> In addition to cutting down on the scan rate, are there any other
> approaches that you would consider?  I assume that the problem lies
> primarily with how we’ve implemented our application, but I’m also
> wondering if there is anything we can do from a configuration point of
> view to reduce the burden on the Tablet servers.  Would reducing the number
> of workers/worker threads to cut down on the number of times a single
> observation is processed be helpful?  It seems like this approach would cut
> out some redundant scans as well, but it might be more of a second order
> optimization.  In general, any insight that you might have on this problem
> would be greatly appreciated.
> Sincerely,
> Caleb Meier
> Caleb A. Meier, Ph.D.
> Senior Software Engineer ♦ Analyst
> Parsons Corporation
> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> Office:  (703)797-3066
> Caleb.Meier@Parsons.com ♦ www.parsons.com
