incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: High disk I/O during reads
Date Sun, 24 Mar 2013 17:27:55 GMT
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> xvdap1            0.13         0.00         1.07          0         16
> xvdb            474.20     13524.53        25.33     202868        380
> xvdc            469.87     13455.73        30.40     201836        456
Perchance are you running on m1.large instances ? 
You may be seeing the "moderate" IO performance they offer. 
 
> The query is looking for a maximum of 50 messages between two dates, in reverse order.
 
This is probably the slowest query you can do. Removing the reverse would make it faster,
or using a reversed comparator. 
You mentioned that most queries in cfhistograms were hitting one SSTable. If you are comfortable
with the output from that I would work out the distribution for read latency, the read_latency
in cfstats if the most recent only. 

It have a lot of deletes on a row that can slow down reads as well. Does not sound like the
case but I wanted to mention in. 

Next thing to look at is the proxyhistograms, that will give you the end-to-end request latency
in cassandra. Which will help with identifying if the issue is the disk read or network /
something else. The something else may be DigestMismatch errors if you have been dropping
writes and not running repair / disable hints. 

Cheers

 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 2:59 PM, Matt Kap <matvey1414@gmail.com> wrote:

> Having battled similar issues with read latency recently, here's some
> general things to look out for.
> 
> - At 118ms, something is definitely broken. You should be looking at
> under 10ms or lower, depending on hardware.
> - Do "nodetool info" on all 5 nodes. Is the load distributed evenly?
> Is it reasonable (under 500GB)?
> - Make sure you aren't running low on heap space. You could see that
> from "nodetool info" also. If you are running low, very bad things
> begin to happen (lots of GC, constant flushing of Memtables, reduction
> of Key Cache, etc). Generally, once there, the node doesn't recover,
> and read latency goes to sh*t.
> - Which compaction strategy are you using? Leveled compactions or
> size-tiered? There's different issues with both.
> - Is your Key Cache turned on? What's the Key Cache hit rate?
> - Is the Read Latency the same on all nodes? Or just one in particular?
> - Are pending compactions building up?
> - What's %util on disk? Same on all nodes?
> 
> I would go through "nodetool cfstats, info, compactionstats, tpstats",
> and see if things are roughly the same across all the nodes. You could
> also just be under capacity, but more likely, there's an actual
> problem looming somewhere.
> 
> Cheers!
> -Matt
> 
> On Sat, Mar 23, 2013 at 3:18 AM,  <igor@4friends.od.ua> wrote:
>> You can try to disable readahead on cassandra data disk.
>> 
>> Jon Scarborough <jon@fifth-aeon.net> написал(а):
>>> 
>>> Checked tpstats, there are very few dropped messages.
>>> 
>>> Checked histograms. Mostly nothing surprising. The vast majority of rows
>>> are small, and most reads only access one or two SSTables.
>>> 
>>> What I did discover is that of our 5 nodes, one is performing well, with
>>> disk I/O in the ballprk that seems reasonable. The other 4 nodes are doing
>>> roughly 4x the disk i/O per second.  Interestingly, the node that is
>>> performing well also seems to be servicing about twice the number of reads
>>> that the other nodes are.
>>> 
>>> I compared configuration between the node performing well to those that
>>> aren't, and so far haven't found any discrepancies.
>>> 
>>> On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1975@yahoo.com> wrote:
>>>> 
>>>> According to your cfstats, read latency is over 100 ms which is really
>>>> really slow. I am seeing less than 3ms reads for my cluster which is on SSD.
>>>> Can you also check the nodetool cfhistorgram, it tells you more about the
>>>> number of SSTable involved and read/write latency. Somtimes average doesn't
>>>> tell you the whole storey.
>>>> Also check your nodetool tpstats, are there a lot dropped reads?
>>>> 
>>>> -Wei
>>>> ----- Original Message -----
>>>> From: "Jon Scarborough" <jon@fifth-aeon.net>
>>>> To: user@cassandra.apache.org
>>>> Sent: Friday, March 22, 2013 9:42:34 AM
>>>> Subject: Re: High disk I/O during reads
>>>> 
>>>> Key distribution across probably varies a lot from row to row in our
>>>> case. Most reads would probably only need to look at a few SSTables, a few
>>>> might need to look at more.
>>>> 
>>>> I don't yet have a deep understanding of C* internals, but I would
>>>> imagine even the more expensive use cases would involve something like this:
>>>> 
>>>> 1) Check the index for each SSTable to determine if part of the row is
>>>> there.
>>>> 2) Look at the endpoints of the slice to determine if the data in a
>>>> particular SSTable is relevant to the query.
>>>> 3) Read the chunks of those SSTables, working backwards from the end of
>>>> the slice until enough columns have been read to satisfy the limit clause
in
>>>> the query.
>>>> 
>>>> So I would have guessed that even the more expensive queries on wide rows
>>>> typically wouldn't need to read more than a few hundred KB from disk to do
>>>> all that. Seems like I'm missing something major.
>>>> 
>>>> Here's the complete CF definition, including compression settings:
>>>> 
>>>> CREATE COLUMNFAMILY conversation_text_message (
>>>> conversation_key bigint PRIMARY KEY
>>>> ) WITH
>>>> comment='' AND
>>>> 
>>>> comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)'
>>>> AND
>>>> read_repair_chance=0.100000 AND
>>>> gc_grace_seconds=864000 AND
>>>> default_validation=text AND
>>>> min_compaction_threshold=4 AND
>>>> max_compaction_threshold=32 AND
>>>> replicate_on_write=True AND
>>>> compaction_strategy_class='SizeTieredCompactionStrategy' AND
>>>> 
>>>> compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
>>>> 
>>>> Much thanks for any additional ideas.
>>>> 
>>>> -Jon
>>>> 
>>>> 
>>>> 
>>>> On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean < Dean.Hiller@nrel.gov >
>>>> wrote:
>>>> 
>>>> 
>>>> Did you mean to ask "are 'all' your keys spread across all SSTables"? I
>>>> am guessing at your intention.
>>>> 
>>>> I mean I would very well hope my keys are spread across all sstables or
>>>> otherwise that sstable should not be there as he has no keys in it ;).
>>>> 
>>>> And I know we had HUGE disk size from the duplication in our sstables on
>>>> size-tiered compaction....we never ran a major compaction but after we
>>>> switched to LCS, we went from 300G to some 120G or something like that which
>>>> was nice. We only have 300 data point posts / second so not an extreme write
>>>> load on 6 nodes as well though these posts causes read to check
>>>> authorization and such of our system.
>>>> 
>>>> Dean
>>>> 
>>>> From: Kanwar Sangha < kanwar@mavenir.com <mailto: kanwar@mavenir.com
>>
>>>> Reply-To: " user@cassandra.apache.org <mailto: user@cassandra.apache.org
>>>>> " < user@cassandra.apache.org <mailto: user@cassandra.apache.org
>>
>>>> Date: Friday, March 22, 2013 8:38 AM
>>>> To: " user@cassandra.apache.org <mailto: user@cassandra.apache.org >"
<
>>>> user@cassandra.apache.org <mailto: user@cassandra.apache.org >>
>>>> Subject: RE: High disk I/O during reads
>>>> 
>>>> 
>>>> Are your Keys spread across all SSTables ? That will cause every sstable
>>>> read which will increase the I/O.
>>>> 
>>>> What compaction are you using ?
>>>> 
>>>> From: zodiak@fifth-aeon.net <mailto: zodiak@fifth-aeon.net > [mailto:
>>>> zodiak@fifth-aeon.net ] On Behalf Of Jon Scarborough
>>>> 
>>>> Sent: 21 March 2013 23:00
>>>> To: user@cassandra.apache.org <mailto: user@cassandra.apache.org >
>>>> 
>>>> 
>>>> Subject: High disk I/O during reads
>>>> 
>>>> Hello,
>>>> 
>>>> We've had a 5-node C* cluster (version 1.1.0) running for several months.
>>>> Up until now we've mostly been writing data, but now we're starting to
>>>> service more read traffic. We're seeing far more disk I/O to service these
>>>> reads than I would have anticipated.
>>>> 
>>>> The CF being queried consists of chat messages. Each row represents a
>>>> conversation between two people. Each column represents a message. The
>>>> column key is composite, consisting of the message date and a few other bits
>>>> of information. The CF is using compression.
>>>> 
>>>> The query is looking for a maximum of 50 messages between two dates, in
>>>> reverse order. Usually the two dates used as endpoints are 30 days ago and
>>>> the current time. The query in Astyanax looks like this:
>>>> 
>>>> ColumnList<ConversationTextMessageKey> result =
>>>> keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
>>>> .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
>>>> .getKey(conversationKey)
>>>> .withColumnRange(
>>>> textMessageSerializer.makeEndpoint(endDate,
>>>> Equality.LESS_THAN).toBytes(),
>>>> textMessageSerializer.makeEndpoint(startDate,
>>>> Equality.GREATER_THAN_EQUALS).toBytes(),
>>>> true,
>>>> maxMessages)
>>>> .execute()
>>>> .getResult();
>>>> 
>>>> We're currently servicing around 30 of these queries per second.
>>>> 
>>>> Here's what the cfstats for the CF look like:
>>>> 
>>>> Column Family: conversation_text_message
>>>> SSTable count: 15
>>>> Space used (live): 211762982685
>>>> Space used (total): 211762982685
>>>> Number of Keys (estimate): 330118528
>>>> Memtable Columns Count: 68063
>>>> Memtable Data Size: 53093938
>>>> Memtable Switch Count: 9743
>>>> Read Count: 4313344
>>>> Read Latency: 118.831 ms.
>>>> Write Count: 817876950
>>>> Write Latency: 0.023 ms.
>>>> Pending Tasks: 0
>>>> Bloom Filter False Postives: 6055
>>>> Bloom Filter False Ratio: 0.00260
>>>> Bloom Filter Space Used: 686266048
>>>> Compacted row minimum size: 87
>>>> Compacted row maximum size: 14530764
>>>> Compacted row mean size: 1186
>>>> 
>>>> On the C* nodes, iostat output like this is typical, and can spike to be
>>>> much worse:
>>>> 
>>>> avg-cpu: %user %nice %system %iowait %steal %idle
>>>> 1.91 0.00 2.08 30.66 0.50 64.84
>>>> 
>>>> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
>>>> xvdap1 0.13 0.00 1.07 0 16
>>>> xvdb 474.20 13524.53 25.33 202868 380
>>>> xvdc 469.87 13455.73 30.40 201836 456
>>>> md0 972.13 26980.27 55.73 404704 836
>>>> 
>>>> Any thoughts on what could be causing read I/O to the disk from these
>>>> queries?
>>>> 
>>>> Much thanks!
>>>> 
>>>> -Jon
>>>> 
>>>> 
>>> 
>> 
>> --
>> Отправлено через К-9 Mail. Извините за краткость,
пожалуйста.
> 
> 
> 
> -- 
> www.calcmachine.com - easy online calculator.


Mime
View raw message