fluo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Successful Stress Test Run
Date Thu, 11 Jan 2018 15:19:34 GMT
On Wed, Jan 10, 2018 at 2:57 PM, Alan Camillo <alan@blueshift.com.br> wrote:
> Keith, what do you think about the throughput achieved?
> Was it around 15k messages per second, right?

Yeah, around 15K transactions per second.
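As a rough cross-check against the announcement quoted further down (~1.3 billion transactions over a 24-hour run), the average rate does come out near 15K transactions per second. This is a back-of-the-envelope average that assumes the transactions were spread evenly over the full 24 hours:

```latex
\frac{1.3 \times 10^{9}\ \text{transactions}}{24 \times 3600\ \text{s}}
  = \frac{1.3 \times 10^{9}}{86{,}400\ \text{s}}
  \approx 1.5 \times 10^{4}\ \text{transactions/s}
```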

> *Dummy questions:*
> I've noticed that the more processes (threads/applications) I use to
> load data, the better the throughput. (Obviously)
> But I didn't reach Fluo's maximum. Is MapReduce the best way to
> load data into Fluo?

The stress test uses MapReduce as an easy way to run Fluo loaders
that read data from HDFS on many nodes.
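A toy model of that setup, using only the Java standard library: the input is divided into splits (roughly as MapReduce divides HDFS files), and each worker runs a loader over its own split, so adding workers adds parallel loading. The split layout, the worker count, and `loadRecord` are hypothetical stand-ins; a real job would call Fluo's loader API from inside map tasks rather than use a local thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

// Toy model (not Fluo or Hadoop code): each "map task" reads one input split
// and runs its own loader over it, so throughput scales with the number of workers.
public class SplitLoaderSketch {

    // Hypothetical stand-in for a Fluo loader; it only counts records it would load.
    static void loadRecord(int record, LongAdder loaded) {
        loaded.increment();
    }

    // Runs one task per split on a fixed pool of workers and returns the total loaded.
    static long runSplits(List<int[]> splits, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        LongAdder loaded = new LongAdder();
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int[] split : splits) {
                tasks.add(pool.submit(() -> {
                    for (int record : split) {
                        loadRecord(record, loaded);
                    }
                }));
            }
            for (Future<?> t : tasks) {
                t.get(); // wait for each split and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        return loaded.sum();
    }

    public static void main(String[] args) throws Exception {
        List<int[]> splits = new ArrayList<>();
        for (int s = 0; s < 8; s++) {
            int[] split = new int[1000];
            for (int i = 0; i < split.length; i++) {
                split[i] = s * 1000 + i;
            }
            splits.add(split);
        }
        System.out.println("records loaded: " + runSplits(splits, 4));
    }
}
```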

> Is there a difference between using "fluo exec" to start a Fluo client and
> using "java -jar" or some other way?

The "fluo exec" command sets up the classpath with the Fluo and application
jars.  It can also inject the Fluo configuration.

> Are there other clients to load/transact with Fluo? Python, Go...

Not at the moment.  I have wanted to experiment with writing Fluo code
using Kotlin and Jython, but have not had the time.

> What's the difference between using a Loader and a client transaction?

The advantage of the Loader is that it commits many transactions
asynchronously, in batches.  So a single loader could have 10K+
transactions committing at once.
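A minimal sketch of why asynchronous, batched commits raise the number of in-flight transactions. It uses a hypothetical in-memory committer, not Fluo's actual Loader internals: `commitAsync` returns a `CompletableFuture` immediately, so one loader thread can queue thousands of commits, and `flush` then completes them together as one batch. Real commits involve writes and conflict checks; here completion is only simulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model (not the Fluo API): commits are queued and completed in batches,
// so a single thread can have many commits in flight at once.
public class BatchedCommitSketch {

    // Hypothetical committer; not thread-safe, intended for one loader thread.
    static class BatchingCommitter {
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();
        private final AtomicInteger maxInFlight = new AtomicInteger();

        // Starts a simulated asynchronous commit and returns without waiting for it.
        CompletableFuture<Void> commitAsync(String txId) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            pending.add(f);
            maxInFlight.accumulateAndGet(pending.size(), Math::max);
            return f;
        }

        // Completes every queued commit as one batch and returns the batch size.
        int flush() {
            int n = pending.size();
            pending.forEach(f -> f.complete(null));
            pending.clear();
            return n;
        }

        int maxInFlight() {
            return maxInFlight.get();
        }
    }

    public static void main(String[] args) {
        BatchingCommitter committer = new BatchingCommitter();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        // One thread issues 10,000 commits without blocking on any of them.
        for (int i = 0; i < 10_000; i++) {
            futures.add(committer.commitAsync("tx-" + i));
        }
        int batch = committer.flush();
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        System.out.println("committed in one batch: " + batch);
        System.out.println("max in flight from one thread: " + committer.maxInFlight());
    }
}
```

By contrast, a thread that calls a blocking commit can have at most one transaction committing at a time, which is the thread-count limit described next.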

When not using the Loader, the number of committing transactions is
limited by the number of threads.  When using the Loader, the number of
committing transactions is limited by what will fit in memory (there
is a configurable limit on how many transactions will be buffered in
memory; it defaults to 20M).
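One way to picture that memory cap is a byte budget: a loader reserves space before buffering a transaction and returns it when the commit finishes. This sketch uses `java.util.concurrent.Semaphore` with one permit per byte. The per-transaction size is a made-up estimate, and reading the "20M" default as 20 MiB is an assumption; this is not Fluo's implementation.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not Fluo's implementation: caps buffered transactions by bytes.
public class CommitMemoryLimit {
    // Assumption: the "20M" default is interpreted as 20 * 1024 * 1024 bytes.
    static final int DEFAULT_LIMIT_BYTES = 20 * 1024 * 1024;

    private final Semaphore budget;

    CommitMemoryLimit(int limitBytes) {
        this.budget = new Semaphore(limitBytes);
    }

    // Blocks the loader until enough memory is free to buffer this transaction.
    void reserve(int txBytes) throws InterruptedException {
        budget.acquire(txBytes);
    }

    // Non-blocking variant: returns false when the buffer is full.
    boolean tryReserve(int txBytes) {
        return budget.tryAcquire(txBytes);
    }

    // Called once a buffered transaction has finished committing.
    void release(int txBytes) {
        budget.release(txBytes);
    }

    public static void main(String[] args) {
        CommitMemoryLimit limit = new CommitMemoryLimit(DEFAULT_LIMIT_BYTES);
        int txBytes = 1024; // assumed size of one buffered transaction (1 KiB)
        int buffered = 0;
        while (limit.tryReserve(txBytes)) {
            buffered++;
        }
        // 20 MiB / 1 KiB = 20480 transactions fit before the loader would have to wait.
        System.out.println("transactions buffered before blocking: " + buffered);
        limit.release(txBytes); // one commit finishes and frees its memory
        System.out.println("room for one more after a commit: " + limit.tryReserve(txBytes));
    }
}
```

With these assumed sizes, the budget admits 20,480 transactions before the loader blocks; smaller transactions would allow more to be in flight, which is how the Loader can reach the 10K+ figure above.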

> Asynchronous/Synchronous?
> *(sorry for the disconnected questions!)*
> Thanks!
> Alan Camillo
> *BlueShift* | IT Director
> 2018-01-10 13:19 GMT-02:00 Keith Turner <keith@deenlo.com>:
>> I completed a successful 24-hour run of the Fluo stress test on a 10-node
>> EC2 cluster.  For the test, 1 billion random integers were loaded via
>> MapReduce and then 370 million were loaded by Fluo.  This resulted in
>> ~1.3 billion transactions executing and ~13 million collisions.  Fluo
>> commit dbad51d was used for the test.  Below is the final output from
>> the test.
>> *****Verifying Fluo & MapReduce results match*****
>> Success! Fluo & MapReduce both calculated 1369064132 unique integers
>> During the test, CPU utilization was not uniformly high.  Looking at
>> the Accumulo monitor, some nodes had lots of queued scans.
>> Running jstack on those nodes showed lots of threads trying to reserve
>> open files.  However, only a few threads were actually running
>> scans.  This seemed very odd and I plan to investigate further.  I had
>> set the max open files to 1000 and all tablets had only 3 to 4 files.
>> Therefore, if 1000 files were reserved, I would have expected to see
>> lots of scans running; however, that is not what I saw.
>> Below is a gist with information about the configuration used for the test.
>> https://gist.github.com/keith-turner/e28ee6cd4941210f34e5cd0e6a6b3106
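The expectation about queued scans in the quoted message can be made concrete. Assuming each running scan holds one reservation per file in the tablet it reads (an assumption about how the open-file limit is applied, not something the message states), 1000 open-file slots and 3 to 4 files per tablet allow roughly this many concurrent scans:

```latex
\left\lfloor \frac{1000}{4} \right\rfloor = 250
  \;\le\; N_{\text{concurrent scans}} \;\le\;
\left\lfloor \frac{1000}{3} \right\rfloor = 333
```

So if all 1000 files were reserved, on the order of 250 to 333 scans should have been running, far more than the few that were observed.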
