incubator-cassandra-user mailing list archives

From i...@4friends.od.ua
Subject Re: High disk I/O during reads
Date Sat, 23 Mar 2013 07:18:29 GMT
You can try disabling readahead on the Cassandra data disk.
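
To see what readahead is currently set to, something like the sketch below may help. It assumes a Linux node where each block device exposes `queue/read_ahead_kb` under `/sys/block`; the function name `read_ahead_kb` and its `sys_block` parameter are my own and only there for illustration.

```python
from pathlib import Path


def read_ahead_kb(sys_block="/sys/block"):
    """Return {device name: readahead in KiB} for each block device found.

    Assumes the Linux sysfs layout <sys_block>/<device>/queue/read_ahead_kb.
    """
    settings = {}
    for queue_file in sorted(Path(sys_block).glob("*/queue/read_ahead_kb")):
        device = queue_file.parent.parent.name  # e.g. "xvdb" or "md0"
        settings[device] = int(queue_file.read_text().strip())
    return settings
```

On a node you would call `read_ahead_kb()` and look at the entries for the data devices (xvdb, xvdc and md0 in the iostat output quoted below). Readahead itself can be changed with `blockdev --setra <sectors> <device>`, where the value is in 512-byte sectors; it is probably worth trying on one node first and comparing iostat before and after.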

Jon Scarborough <jon@fifth-aeon.net> wrote:

>Checked tpstats, there are very few dropped messages.
>
>Checked histograms. Mostly nothing surprising. The vast majority of rows
>are small, and most reads only access one or two SSTables.
>
>What I did discover is that of our 5 nodes, one is performing well, with
>disk I/O in the ballpark that seems reasonable. The other 4 nodes are doing
>roughly 4x the disk I/O per second. Interestingly, the node that is
>performing well also seems to be servicing about twice the number of reads
>that the other nodes are.
>
>I compared configuration between the node performing well to those that
>aren't, and so far haven't found any discrepancies.
>
>On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1975@yahoo.com> wrote:
>
>> According to your cfstats, read latency is over 100 ms, which is really
>> really slow. I am seeing less than 3 ms reads for my cluster, which is on
>> SSD. Can you also check nodetool cfhistograms? It tells you more about
>> the number of SSTables involved and the read/write latency. Sometimes the
>> average doesn't tell you the whole story.
>> Also check your nodetool tpstats: are there a lot of dropped reads?
>>
>> -Wei
>> ----- Original Message -----
>> From: "Jon Scarborough" <jon@fifth-aeon.net>
>> To: user@cassandra.apache.org
>> Sent: Friday, March 22, 2013 9:42:34 AM
>> Subject: Re: High disk I/O during reads
>>
>> Key distribution across SSTables probably varies a lot from row to row in
>> our case. Most reads would probably only need to look at a few SSTables; a
>> few might need to look at more.
>>
>> I don't yet have a deep understanding of C* internals, but I would imagine
>> even the more expensive use cases would involve something like this:
>>
>> 1) Check the index for each SSTable to determine if part of the row is
>> there.
>> 2) Look at the endpoints of the slice to determine if the data in a
>> particular SSTable is relevant to the query.
>> 3) Read the chunks of those SSTables, working backwards from the end of
>> the slice until enough columns have been read to satisfy the limit clause
>> in the query.
>>
>> So I would have guessed that even the more expensive queries on wide rows
>> typically wouldn't need to read more than a few hundred KB from disk to do
>> all that. Seems like I'm missing something major.
>>
>> Here's the complete CF definition, including compression settings:
>>
>> CREATE COLUMNFAMILY conversation_text_message (
>>   conversation_key bigint PRIMARY KEY
>> ) WITH
>>   comment='' AND
>>   comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)' AND
>>   read_repair_chance=0.100000 AND
>>   gc_grace_seconds=864000 AND
>>   default_validation=text AND
>>   min_compaction_threshold=4 AND
>>   max_compaction_threshold=32 AND
>>   replicate_on_write=True AND
>>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>>   compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
>>
>> Much thanks for any additional ideas.
>>
>> -Jon
>>
>>
>>
>> On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
>>
>>
>> Did you mean to ask "are 'all' your keys spread across all SSTables"? I am
>> guessing at your intention.
>>
>> I mean, I would very well hope my keys are spread across all SSTables;
>> otherwise that SSTable should not be there, as it has no keys in it ;).
>>
>> And I know we had HUGE disk size from the duplication in our SSTables on
>> size-tiered compaction. We never ran a major compaction, but after we
>> switched to LCS we went from 300G to some 120G or something like that,
>> which was nice. We only have 300 data point posts / second, so not an
>> extreme write load on 6 nodes, though these posts cause reads to
>> check authorization and such in our system.
>>
>> Dean
>>
>> From: Kanwar Sangha <kanwar@mavenir.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, March 22, 2013 8:38 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: RE: High disk I/O during reads
>>
>>
>> Are your keys spread across all SSTables? That will cause every SSTable
>> to be read, which will increase the I/O.
>>
>> What compaction are you using?
>>
>> From: zodiak@fifth-aeon.net [mailto:zodiak@fifth-aeon.net] On Behalf Of Jon Scarborough
>> Sent: 21 March 2013 23:00
>> To: user@cassandra.apache.org
>> Subject: High disk I/O during reads
>>
>> Hello,
>>
>> We've had a 5-node C* cluster (version 1.1.0) running for several months.
>> Up until now we've mostly been writing data, but now we're starting to
>> service more read traffic. We're seeing far more disk I/O to service these
>> reads than I would have anticipated.
>>
>> The CF being queried consists of chat messages. Each row represents a
>> conversation between two people. Each column represents a message. The
>> column key is composite, consisting of the message date and a few other
>> bits of information. The CF is using compression.
>>
>> The query is looking for a maximum of 50 messages between two dates, in
>> reverse order. Usually the two dates used as endpoints are 30 days ago and
>> the current time. The query in Astyanax looks like this:
>>
>> ColumnList<ConversationTextMessageKey> result =
>>     keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
>>         .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
>>         .getKey(conversationKey)
>>         .withColumnRange(
>>             textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
>>             textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
>>             true,
>>             maxMessages)
>>         .execute()
>>         .getResult();
>>
>> We're currently servicing around 30 of these queries per second.
>>
>> Here's what the cfstats for the CF look like:
>>
>> Column Family: conversation_text_message
>> SSTable count: 15
>> Space used (live): 211762982685
>> Space used (total): 211762982685
>> Number of Keys (estimate): 330118528
>> Memtable Columns Count: 68063
>> Memtable Data Size: 53093938
>> Memtable Switch Count: 9743
>> Read Count: 4313344
>> Read Latency: 118.831 ms.
>> Write Count: 817876950
>> Write Latency: 0.023 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 6055
>> Bloom Filter False Ratio: 0.00260
>> Bloom Filter Space Used: 686266048
>> Compacted row minimum size: 87
>> Compacted row maximum size: 14530764
>> Compacted row mean size: 1186
>>
>> On the C* nodes, iostat output like this is typical, and can spike to be
>> much worse:
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            1.91    0.00    2.08   30.66    0.50   64.84
>>
>> Device:     tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
>> xvdap1     0.13        0.00        1.07         0        16
>> xvdb     474.20    13524.53       25.33    202868       380
>> xvdc     469.87    13455.73       30.40    201836       456
>> md0      972.13    26980.27       55.73    404704       836
>>
>> Any thoughts on what could be causing read I/O to the disk from these
>> queries?
>>
>> Much thanks!
>>
>> -Jon
>>
>>
>>
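
A rough check on the iostat sample quoted above: dividing kB_read/s by tps gives the average amount read per request. This is only a sketch; tps also counts the (few) write requests, so the result is a slight underestimate, and the helper name `avg_read_kb` is mine.

```python
def avg_read_kb(kb_read_per_s, tps):
    """Average kB read per I/O request, from two iostat columns."""
    return kb_read_per_s / tps


# (kB_read/s, tps) copied from the iostat sample quoted above.
devices = {
    "xvdb": (13524.53, 474.20),
    "xvdc": (13455.73, 469.87),
    "md0": (26980.27, 972.13),
}
mean_row_bytes = 1186  # "Compacted row mean size" from the quoted cfstats

for name, (kb_read_per_s, tps) in devices.items():
    per_request_kb = avg_read_kb(kb_read_per_s, tps)
    rows_worth = per_request_kb * 1024 / mean_row_bytes
    print(f"{name}: {per_request_kb:.1f} kB per request, "
          f"about {rows_worth:.0f}x the mean row size")
```

For md0 this comes out at roughly 28 kB per request, against a mean compacted row size of 1186 bytes in cfstats, so each request pulls in far more than a typical row. That does not prove readahead is the cause (the compression chunk size also affects how much is read), but it fits.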

-- 
Sent from K-9 Mail. Please excuse my brevity.