cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alprema <alpr...@alprema.com>
Subject Re: Read performance
Date Mon, 11 May 2015 11:52:00 GMT
According to the trace log, only one was read, the compaction strategy is
size tiered.

I attached a more readable version of my trace for details.

On Mon, May 11, 2015 at 11:35 AM, Anishek Agarwal <anishek@gmail.com> wrote:

> how many sst tables were there?   what compaction are you using ? These
> properties define how many possible disk reads cassandra has to do to get
> all the data you need depending on which SST Tables have data for your
> partition key.
>
> On Fri, May 8, 2015 at 6:25 PM, Alprema <alprema@alprema.com> wrote:
>
>> I was planning on using a more "server-friendly" strategy anyway (by
>> parallelizing my workload on multiple metrics) but my concern here is more
>> about the raw numbers.
>>
>> According to the trace and my estimation of the data size, the read from
>> disk was done at about 30MByte/s and the transfer between the responsible
>> node and the coordinator was done at 120Mbits/s which doesn't seem right
>> given that the cluster was not busy and the network is Gbit capable.
>>
>> I know that there is some overhead, but these numbers seem odd to me, do
>> they seem normal to you ?
>>
>> On Fri, May 8, 2015 at 2:34 PM, Bryan Holladay <holladay@longsight.com>
>> wrote:
>>
>>> Try breaking it up into smaller chunks using multiple threads and token
>>> ranges. 86400 is pretty large. I found ~1000 results per query is good.
>>> This will spread the burden across all servers a little more evenly.
>>>
>>> On Thu, May 7, 2015 at 4:27 AM, Alprema <alprema@alprema.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am writing an application that will periodically read big amounts of
>>>> data from Cassandra and I am experiencing odd performances.
>>>>
>>>> My column family is a classic time series one, with series ID and Day
>>>> as partition key and a timestamp as clustering key, the value being a
>>>> double.
>>>>
>>>> The query I run gets all the values for a given time series for a given
>>>> day (so about 86400 points):
>>>>
>>>> SELECT "UtcDate", "Value"FROM "Metric_OneSec"WHERE "MetricId" = 12215ece-6544-4fcf-a15d-4f9e9ce1567eAND
"Day" = '2015-05-05 00:00:00+0000'LIMIT 86400;
>>>>
>>>>
>>>> This takes about 450ms to run and when I trace the query I see that it
>>>> takes about 110ms to read the data from disk and 224ms to send the data
>>>> from the responsible node to the coordinator (full trace in attachment).
>>>>
>>>> I did a quick estimation of the requested data (correct me if I'm
>>>> wrong):
>>>> 86400 * (column name + column value + timestamp + ttl)
>>>> = 86400 * (8 + 8 + 8 + 8?)
>>>> = 2.6Mb
>>>>
>>>> Let's say about 3Mb with misc. overhead, so these timings seem pretty
>>>> slow to me for a modern SSD and a 1Gb/s NIC.
>>>>
>>>> Do those timings seem normal? Am I missing something?
>>>>
>>>> Thank you,
>>>>
>>>> Kévin
>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message