cassandra-user mailing list archives

From Alprema <alpr...@alprema.com>
Subject Re: Read performance
Date Fri, 08 May 2015 12:55:07 GMT
I was already planning to use a more "server-friendly" strategy (by
parallelizing my workload across multiple metrics), but my concern here is
more about the raw numbers.

According to the trace and my estimate of the data size, the read from
disk ran at about 30 MB/s and the transfer between the responsible node
and the coordinator ran at about 120 Mbit/s, which doesn't seem right
given that the cluster was not busy and the network is gigabit-capable.
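A quick way to sanity-check these figures is to divide the estimated
payload by the durations from the trace. The sketch below assumes ~3 MB of
payload (the rough estimate from the original message quoted further
down) and the 110 ms / 224 ms trace timings, and uses decimal units
throughout.

```python
# Back-of-the-envelope throughput check using the figures from this thread.
# Assumption: ~3 MB of payload, the rough estimate from the original message.
PAYLOAD_BYTES = 3_000_000   # ~3 MB, decimal megabytes
DISK_READ_S = 0.110         # ~110 ms reading from disk (per the trace)
NODE_TO_COORD_S = 0.224     # ~224 ms replica -> coordinator (per the trace)


def throughput(num_bytes: int, seconds: float) -> tuple:
    """Return (megabytes per second, megabits per second), decimal units."""
    bytes_per_s = num_bytes / seconds
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e6


disk_mb_s, _ = throughput(PAYLOAD_BYTES, DISK_READ_S)
_, net_mbit_s = throughput(PAYLOAD_BYTES, NODE_TO_COORD_S)

print(f"disk read:   {disk_mb_s:.1f} MB/s")
print(f"node->coord: {net_mbit_s:.1f} Mbit/s (vs. a ~1000 Mbit/s link)")
```

With these inputs it gives roughly 27 MB/s for the disk read and roughly
107 Mbit/s for the node-to-coordinator hop, in the same ballpark as the
figures above and about a tenth of a 1 Gbit/s link.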

I know there is some overhead, but these numbers seem odd to me. Do they
seem normal to you?

On Fri, May 8, 2015 at 2:34 PM, Bryan Holladay <holladay@longsight.com>
wrote:

> Try breaking it up into smaller chunks using multiple threads and token
> ranges. 86400 rows is pretty large; I found ~1000 results per query works
> well. This will spread the load across all servers a little more evenly.
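One way to apply this suggestion to this particular query: it reads a
single partition (one "MetricId"/"Day" pair), which maps to a single
token, so splitting by token range alone won't subdivide it. The sketch
below slices the day by clustering key instead. It assumes one row per
second (as the "Metric_OneSec" name and the 86400-point count suggest) and
~1000 rows per slice; day_slices and QUERY are hypothetical names, and the
driver calls that would bind and run each slice in parallel are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

# Hypothetical CQL for one slice: the partition key is fixed and the
# clustering key ("UtcDate") is restricted to a half-open range.
QUERY = (
    'SELECT "UtcDate", "Value" FROM "Metric_OneSec" '
    'WHERE "MetricId" = ? AND "Day" = ? '
    'AND "UtcDate" >= ? AND "UtcDate" < ?'
)


def day_slices(day: datetime, rows_per_slice: int = 1000) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [start, stop) ranges that together cover one day.

    Assumes one row per second, so a slice of N seconds holds about N rows.
    """
    end = day + timedelta(days=1)
    step = timedelta(seconds=rows_per_slice)
    start = day
    while start < end:
        stop = min(start + step, end)  # last slice may be shorter
        yield start, stop
        start = stop


day = datetime(2015, 5, 5, tzinfo=timezone.utc)
slices = list(day_slices(day))
print(len(slices), "slices; first:", slices[0], "last:", slices[-1])
```

For 2015-05-05 this yields 87 contiguous slices: 86 of 1000 seconds and a
final one of 400 seconds. Each (start, stop) pair could then be bound to a
prepared statement built from QUERY and issued from a small thread pool.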
>
> On Thu, May 7, 2015 at 4:27 AM, Alprema <alprema@alprema.com> wrote:
>
>> Hi,
>>
>> I am writing an application that will periodically read large amounts of
>> data from Cassandra, and I am seeing odd performance.
>>
>> My column family is a classic time series one, with series ID and Day as
>> the partition key and a timestamp as the clustering key; the value is a
>> double.
>>
>> The query I run gets all the values for a given time series for a given
>> day (so about 86400 points):
>>
>> SELECT "UtcDate", "Value"
>> FROM "Metric_OneSec"
>> WHERE "MetricId" = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
>>   AND "Day" = '2015-05-05 00:00:00+0000'
>> LIMIT 86400;
>>
>>
>> This takes about 450 ms to run, and when I trace the query I see that it
>> takes about 110 ms to read the data from disk and 224 ms to send the data
>> from the responsible node to the coordinator (full trace attached).
>>
>> I did a quick estimate of the requested data (correct me if I'm wrong):
>> 86400 * (column name + column value + timestamp + ttl)
>> = 86400 * (8 + 8 + 8 + 8?) bytes
>> = 2,764,800 bytes, i.e. about 2.6 MiB
>>
>> Let's say about 3 MB with miscellaneous overhead, so these timings seem
>> pretty slow to me for a modern SSD and a 1 Gbit/s NIC.
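The estimate above can be reproduced, and compared with what an idle
gigabit link would need, with a few lines of arithmetic. The per-cell sizes
are the guesses from the message (not Cassandra's actual on-disk or wire
format), and the 1 Gbit/s line rate is nominal, ignoring protocol overhead.

```python
# Per-cell byte counts guessed in the message above; these are assumptions,
# not Cassandra's real storage or wire format.
ROWS = 86_400                    # one sample per second for a day
BYTES_PER_ROW = 8 + 8 + 8 + 8    # column name + value + timestamp + ttl
LINK_BITS_PER_S = 1e9            # nominal 1 Gbit/s NIC, no protocol overhead

total_bytes = ROWS * BYTES_PER_ROW
total_mib = total_bytes / 2**20

padded_bytes = 3_000_000         # "about 3 MB with misc. overhead"
ideal_wire_ms = padded_bytes * 8 / LINK_BITS_PER_S * 1000

print(f"{total_bytes} bytes = {total_mib:.2f} MiB")
print(f"ideal transfer of ~3 MB at 1 Gbit/s: {ideal_wire_ms:.0f} ms")
```

That gives 2,764,800 bytes (about 2.64 MiB) and roughly 24 ms to push
~3 MB over an uncontended gigabit link, versus the 224 ms seen in the trace.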
>>
>> Do those timings seem normal? Am I missing something?
>>
>> Thank you,
>>
>> Kévin
>>
>>
>>
>
