cassandra-user mailing list archives

From: "Laing, Michael" <michael.la...@nytimes.com>
Subject: Re: High latencies for simple queries
Date: Fri, 27 Mar 2015 21:10:50 GMT
Actually I am in the middle of setting up the same sort of thing for
PostgreSQL using psycopg2 and pyev.

I'll be using Cassandra and PostgreSQL in an IoT experiment as the backend
for swarms of MQTT brokers at something in the 10-100M client range.

ml

On Fri, Mar 27, 2015 at 4:59 PM, Laing, Michael <michael.laing@nytimes.com>
wrote:

> I use callback chaining with the python driver and can confirm that it is
> very fast.
>
> You can "chain the chains" together to perform sequential processing. I do
> this when retrieving "metadata" and then the referenced "payload" for
> example, when the metadata has been inverted and the payload is larger than
> we want to invert. And you can be running multiple "chains of chains"
> asynchronously - cascade state by employing the userdata of the future.
>
> We also multiprocess, for more parallelism, and we distribute work to
> multiple multiprocessing instances using a message broker for yet more
> parallel activity, as well as reliability.
>
> ml
>
> On Fri, Mar 27, 2015 at 4:28 PM, Tyler Hobbs <tyler@datastax.com> wrote:
>
>> Since you're executing queries sequentially, you may want to look into
>> using callback chaining to avoid the cross-thread signaling that results in
>> the 1ms latencies.  Basically, just use session.execute_async() and attach
>> a callback to the returned future that will execute your next query.  The
>> callback is executed on the event loop thread.  The main downsides to this
>> are that you need to be careful to avoid blocking the event loop thread
>> (including executing session.execute() or prepare()) and you need to ensure
>> that all exceptions raised in the callback are handled by your application
>> code.
>>
>> On Fri, Mar 27, 2015 at 3:11 PM, Artur Siekielski <artc@vhex.net> wrote:
>>
>>> I think that in your example Postgres spends most of its time waiting for
>>> fsync() to complete. On Linux, with a battery-backed RAID controller, it's
>>> safe to mount an ext4 filesystem with the "barrier=0" option, which improves
>>> fsync() performance a lot. I have partitions mounted with this option, and I
>>> ran a test from Python, using the psycopg2 driver, and got the following
>>> latencies, in milliseconds:
>>> - INSERT without COMMIT: 0.04
>>> - INSERT with COMMIT: 0.12
>>> - SELECT: 0.05
>>> I also repeat each benchmark run multiple times (using Python's
>>> "timeit" module).
>>>
>>>
>>> On 03/27/2015 07:58 PM, Ben Bromhead wrote:
>>>
>>>> Latency can be so variable even when testing things locally. I quickly
>>>> fired up postgres and did the following with psql:
>>>>
>>>> ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
>>>> CREATE TABLE
>>>> ben=# \timing
>>>> Timing is on.
>>>> ben=# INSERT INTO foo VALUES(2, 'yay');
>>>> INSERT 0 1
>>>> Time: 1.162 ms
>>>> ben=# INSERT INTO foo VALUES(3, 'yay');
>>>> INSERT 0 1
>>>> Time: 1.108 ms
>>>>
>>>> I then fired up a local copy of Cassandra (2.0.12)
>>>>
>>>> cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' :
>>>> 'SimpleStrategy', 'replication_factor' : 1 };
>>>> cqlsh> USE foo;
>>>> cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
>>>> cqlsh:foo> TRACING ON;
>>>> Now tracing requests.
>>>> cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');
>>>>
>>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>
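
A minimal sketch of the callback chaining Tyler describes above, assuming a
session that behaves like the DataStax Python driver's Session, where
execute_async() returns a future exposing add_callbacks(). The helper name
run_chain and its on_done/on_error parameters are hypothetical and not part
of the driver. Each callback submits the next query, so the caller never
blocks between queries, and every exception is routed to on_error.

```python
def run_chain(session, statements, on_done, on_error):
    """Run (query, params) pairs one after another via callback chaining.

    `session` is assumed to behave like cassandra.cluster.Session:
    execute_async() returns a future with add_callbacks().
    `on_done(results)` and `on_error(exc)` are hypothetical hooks supplied
    by the application. With the real driver they run on the event loop
    thread, so they must not block (no session.execute() or prepare()).
    """
    results = []

    def submit(index):
        try:
            if index == len(statements):
                # Every statement has completed; hand over all results.
                on_done(results)
                return
            query, params = statements[index]
            future = session.execute_async(query, params)
            # The callback receives the rows first, then callback_args.
            future.add_callbacks(
                callback=on_result, callback_args=(index,),
                errback=on_error,
            )
        except Exception as exc:
            # Nothing may escape from the callback, so report it ourselves.
            on_error(exc)

    def on_result(rows, index):
        results.append(rows)
        submit(index + 1)  # chain the next query from inside the callback

    submit(0)
```

With a real cluster, the calling thread would typically wait on a
threading.Event that both on_done and on_error set. The statements could be
pairs such as ("INSERT INTO foo (i, j) VALUES (%s, %s)", (1, 'yay')) followed
by ("SELECT j FROM foo WHERE i = %s", (1,)), reusing the foo table from the
cqlsh session quoted above.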

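A sketch of how the repeated latency measurements Artur mentions could be
taken with Python's timeit module. The helper name latency_ms is
hypothetical, and reporting the minimum over the repeats is an assumption:
the original message does not say how the runs were aggregated.

```python
import timeit


def latency_ms(operation, number=1000, repeat=5):
    """Return the best per-call latency of `operation` in milliseconds.

    `operation` is any zero-argument callable, for example a function that
    runs one INSERT or SELECT through an already open database cursor
    (the cursor itself is not created here).
    """
    # timeit.repeat returns one total (in seconds) per repeat, each total
    # covering `number` calls of `operation`.
    totals = timeit.repeat(operation, number=number, repeat=repeat)
    return min(totals) / number * 1000.0
```

Taking the minimum rather than the mean reduces the influence of unrelated
system noise on very short operations like the ones measured in this thread.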