incubator-cassandra-user mailing list archives

From Joel Samuelsson <samuelsson.j...@gmail.com>
Subject Re: Cassandra freezes
Date Thu, 04 Apr 2013 11:49:15 GMT
Yes, both of those solutions seem fine.

My question still seems valid though; shouldn't the node recover and
perform as well as it did during the first few tests? If not, what makes
the node run into the same issues at a smaller load, only after a longer
period of time? Having a node's performance drop radically over time seems
unacceptable and not something most people experience.


2013/4/4 aaron morton <aaron@thelastpickle.com>

> INFO [ScheduledTasks:1] 2013-04-03 08:47:40,757 GCInspector.java (line
> 122) GC for ParNew: 40370 ms for 3 collections, 565045688 used; max is
> 2038693888
>
> This is the JVM pausing the application for 40 seconds to complete GC.
>
> You have two choices: use a bigger heap (4GB to 8GB) or run a lower
> workload.
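>
> ( If you go the bigger-heap route, the knobs are in conf/cassandra-env.sh.
> The values below are only an illustration, not a recommendation for your
> hardware; the file's own comments suggest setting both together, with
> HEAP_NEWSIZE in the region of 100MB per core. )
>
>     # conf/cassandra-env.sh -- override the auto-calculated heap sizes
>     MAX_HEAP_SIZE="4G"
>     HEAP_NEWSIZE="400M"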
>
>  cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/04/2013, at 7:54 PM, Joel Samuelsson <samuelsson.joel@gmail.com>
> wrote:
>
> Thanks for your suggestions. I'll get back to you with the tests you
> suggested, probably tomorrow. In the meantime though I have a few
> questions. You say:
> - 2GB of JVM heap to be insufficient to run this workload against
> Cassandra
>
> I realise that the one-node cluster has a maximum workload. It did, however,
> work fine for a few tests and then performance deteriorated. Currently I
> can't even complete a test run since the server won't respond in time -
> even though I haven't run a test since yesterday. Shouldn't the server
> "recover" sooner or later and perform as well as it did during the first
> few tests? If not automatically, what can I do to help it? I tried nodetool
> flush, but with no performance improvement.
>
> And just FYI in case it changes anything: I don't immediately read back
> the written rows. There are 100 000 rows being written and 100 000 rows
> being read in parallel. The rows being read were written to the cluster
> before the tests were run.
>
>
> 2013/4/3 Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com>
>
>>  Wrt the cfhistograms output, you are supposed to read "Offset" against
>> the values of each column as a separate histogram. Also AFAIK these
>> counters are flushed after each invocation, so you are always looking at
>> data from between two invocations of cfhistograms. With that in mind, to
>> me your cfhistograms say:
>>  - you encountered 200k rows with a single column
>>
>>  - most of your write latencies are agreeable, but that's a long tail you
>> have there, into the hundreds of milliseconds, which cannot be OK – and I
>> cannot comment on how much a commit log write ( an append ) would affect
>> this, as I have durable_writes:false on all my data.
>>  Depending on how often your memtables are switched ( emptied and queued
>> for flushing to disk ) and how valuable the updates received between two
>> of these switches are, you may want to disable durable writes on the KS
>> with "durable_writes=false", or at the very least place the commit log
>> folder on its own drive. Again, I'm not absolutely sure this could affect
>> write latency.
>>
>>  - 38162520 reads served with a single sstable read, that’s great
>>
>>  - a big chunk of these reads are served from page cache or memtables (
>> the latter being more likely since, as I understand it, you immediately
>> read back the written column and work with unique row keys ), but again
>> you have a long drop-off
>>
>>  16410 mutations / sec with a 1k payload – let's say that is 20MB/s into
>> memory with overhead; a third of the 2GB heap for memtables = 666MB : a
>> switch every ~30 seconds.
>> I'm not sure if your write performance can be attributed to GC only, so
>> can you sit through one of these tests with:
>>
>>  watch -n2 iostat -xmd devicename ( look for avgqu-sz and what your peak
>> write throughput is )
>> watch -n2 nodetool tpstats ( see if you have flushes blocked )
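>>
>>  ( If it helps, one way to capture both of these for a whole run – the
>> device name and log path below are just placeholders – would be something
>> like: )
>>
>>     # log disk and thread pool stats every ~2 seconds during the test
>>     while true; do
>>       date
>>       iostat -xmd sda 1 1            # replace sda with your data drive
>>       nodetool -h localhost tpstats
>>       sleep 2
>>     done >> /tmp/cassandra-test-stats.log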
>>
>>  In the end I think you'll either find
>>  - 2GB of JVM heap to be insufficient to run this workload against
>> Cassandra
>>  - or your single disk serving your data directory being unable to keep
>> up with having to flush 20MB/s sustained write every 30 seconds ( unlikely
>> unless you are on EC2 EBS drives )
>>
>>  Also, just to be sure: restart cassandra before the test and confirm
>> your benchmark application is doing what you think it's doing in terms of
>> reads/writes with nodetool cfstats
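>>
>>  ( e.g. snapshot the counters before and after a run and compare the
>> read/write counts for your CF: )
>>
>>     nodetool -h localhost cfstats > cfstats-before.txt
>>     # ... run the benchmark ...
>>     nodetool -h localhost cfstats > cfstats-after.txt
>>     diff cfstats-before.txt cfstats-after.txt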
>>
>>  Regards,
>>  Andras
>>
>>   From: Joel Samuelsson <samuelsson.joel@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Wednesday 3 April 2013 11:55
>>
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Cassandra freezes
>>
>>   It seems this problem is back and I am unsure how to solve it. I have
>> a test setup like this:
>> 4 machines run 8 processes each. Each process has 2 threads, one for
>> writing 100 000 rows and one for reading another 100 000 rows. Each machine
>> (and process) reads and writes the exact same rows, so it is essentially
>> the same 200 000 rows being read / written.
>> The cassandra cluster is a one-node cluster.
>> The first 10-20 runs of the test described above go smoothly; after that,
>> tests take increasingly long with GC happening almost all the time.
>>
>>  Here is my CASS-FREEZE-001 form answers:
>>
>>  How big is your JVM heap ?
>> 2GB
>>
>>  How many CPUs ?
>> A virtual environment so I can't be perfectly sure but according to their
>> specification, "8 cores".
>>
>> Garbage collection taking long ? ( look for log lines from GCInspector)
>> Yes, these are a few lines seen during 1 test run:
>> INFO [ScheduledTasks:1] 2013-04-03 08:47:40,757 GCInspector.java (line
>> 122) GC for ParNew: 40370 ms for 3 collections, 565045688 used; max is
>> 2038693888
>>  INFO [ScheduledTasks:1] 2013-04-03 08:48:24,720 GCInspector.java (line
>> 122) GC for ParNew: 39840 ms for 2 collections, 614065528 used; max is
>> 2038693888
>> INFO [ScheduledTasks:1] 2013-04-03 08:49:09,319 GCInspector.java (line
>> 122) GC for ParNew: 37666 ms for 2 collections, 682352952 used; max is
>> 2038693888
>>  INFO [ScheduledTasks:1] 2013-04-03 08:50:02,577 GCInspector.java (line
>> 122) GC for ParNew: 44590 ms for 1 collections, 792861352 used; max is
>> 2038693888
>>
>>
>>  Running out of heap ? ( "heap is .. full" log lines )
>> Yes. Same run as above:
>> WARN [ScheduledTasks:1] 2013-04-03 08:54:35,108 GCInspector.java (line
>> 139) Heap is 0.8596674853032178 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra is now reducing cache sizes to free up
>> memory.  Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you
>> don't want Cassandra to do this automatically
>>  WARN [ScheduledTasks:1] 2013-04-03 08:54:36,831 GCInspector.java (line
>> 145) Heap is 0.8596674853032178 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra will now flush up to the two largest
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>> in cassandra.yaml if you don't want Cassandra to do this automatically
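>>
>>  ( For reference, those emergency thresholds live in cassandra.yaml; if I
>> read the 1.1 defaults right they are roughly: )
>>
>>     flush_largest_memtables_at: 0.75
>>     reduce_cache_sizes_at: 0.85
>>     reduce_cache_capacity_to: 0.6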
>>
>>
>> Any tasks backing up / being dropped ? ( nodetool tpstats and ".. dropped
>> in last .. ms" log lines )
>> Yes. Same run as above:
>> INFO [ScheduledTasks:1] 2013-04-03 08:52:04,943 MessagingService.java
>> (line 673) 31 MUTATION messages dropped in last 5000ms
>>  INFO [ScheduledTasks:1] 2013-04-03 08:52:04,944 MessagingService.java
>> (line 673) 8 READ messages dropped in last 5000ms
>>
>> Are writes really slow? ( nodetool cfhistograms Keyspace ColumnFamily )
>> Not sure how to interpret the output of nodetool cfhistograms, but here
>> it is (I hope it's fairly readable):
>> Offset      SSTables   Write Latency   Read Latency   Row Size   Column Count
>> 1           38162520   0               0              0          200000
>> 2           0          22              0              0          0
>> 3           0          1629            0              0          0
>> 4           0          9990            0              0          0
>> 5           0          40169           0              0          0
>> 6           0          161538          0              0          0
>> 7           0          487266          0              0          0
>> 8           0          1096601         0              0          0
>> 10          0          4842978         0              0          0
>> 12          0          7976003         0              0          0
>> 14          0          8673230         0              0          0
>> 17          0          9805730         0              0          0
>> 20          0          5083707         0              0          0
>> 24          0          2541157         0              0          0
>> 29          0          768916          0              0          0
>> 35          0          220440          0              0          0
>> 42          0          112915          0              0          0
>> 50          0          71469           0              0          0
>> 60          0          48909           0              0          0
>> 72          0          50714           0              0          0
>> 86          0          45390           0              0          0
>> 103         0          41975           0              0          0
>> 124         0          40371           0              0          0
>> 149         0          37103           0              0          0
>> 179         0          44631           0              0          0
>> 215         0          43957           0              0          0
>> 258         0          32499           1              0          0
>> 310         0          18446           23056779       0          0
>> 372         0          13113           12580639       0          0
>> 446         0          9862            1017347        0          0
>> 535         0          7480            784506         0          0
>> 642         0          5473            274274         0          0
>> 770         0          4084            56379          0          0
>> 924         0          3046            27979          0          0
>> 1109        0          2205            20206          200000     0
>> 1331        0          1658            16947          0          0
>> 1597        0          1228            16969          0          0
>> 1916        0          896             15848          0          0
>> 2299        0          542             13928          0          0
>> 2759        0          379             11782          0          0
>> 3311        0          326             9761           0          0
>> 3973        0          540             8997           0          0
>> 4768        0          450             7938           0          0
>> 5722        0          270             6552           0          0
>> 6866        0          170             6022           0          0
>> 8239        0          146             6474           0          0
>> 9887        0          166             7969           0          0
>> 11864       0          176             53725          0          0
>> 14237       0          203             10260          0          0
>> 17084       0          255             6827           0          0
>> 20501       0          312             27462          0          0
>> 24601       0          445             11523          0          0
>> 29521       0          736             9904           0          0
>> 35425       0          909             20539          0          0
>> 42510       0          896             14280          0          0
>> 51012       0          904             12443          0          0
>> 61214       0          715             11956          0          0
>> 73457       0          652             10040          0          0
>> 88148       0          474             7992           0          0
>> 105778      0          256             5043           0          0
>> 126934      0          113             2370           0          0
>> 152321      0          75              1189           0          0
>> 182785      0          39              690            0          0
>> 219342      0          44              550            0          0
>> 263210      0          69              551            0          0
>> 315852      0          35              419            0          0
>> 379022      0          35              564            0          0
>> 454826      0          52              504            0          0
>> 545791      0          79              749            0          0
>> 654949      0          61              737            0          0
>> 785939      0          30              399            0          0
>> 943127      0          57              611            0          0
>> 1131752     0          78              488            0          0
>> 1358102     0          23              302            0          0
>> 1629722     0          28              240            0          0
>> 1955666     0          48              294            0          0
>> 2346799     0          28              306            0          0
>> 2816159     0          19              224            0          0
>> 3379391     0          37              212            0          0
>> 4055269     0          24              237            0          0
>> 4866323     0          13              137            0          0
>> 5839588     0          11              99             0          0
>> 7007506     0          4               115            0          0
>> 8409007     0          16              194            0          0
>> 10090808    0          12              156            0          0
>> 12108970    0          12              54             0          0
>> 14530764    0          24              147            0          0
>> 17436917    0          10              114            0          0
>> 20924300    0          3               66             0          0
>> 25109160    0          22              100            0          0
>>
>>  Some of the write latencies look really bad, but since they have column
>> count 0 for most, I am not sure what to make of it.
>>
>>
>>
>> How much is lots of data?
>> Lots of data might have been an exaggeration, but the test is as described
>> above. Each row read or written is about 1 kB in size, so each test run
>> generates 4 (machines) * 8 (processes) * 2 (read and write) * 100 000
>> (rows) * 1 kB (row size) = 6250 MB read or written (half of each).
>>
>>  Wide or skinny rows?
>> Skinny rows, only a single column is used for each row.
>>
>>  Mutations/sec ?
>> The test, when run on a freshly rebooted cassandra, takes around 390
>> seconds, and 6400000 rows are read / written during this time period, so
>> around 16410 mutations / second (unless I misunderstood the word mutation).
>>
>> Which Compaction Strategy are you using?
>> SizeTieredCompactionStrategy
>>
>>  Output of show schema (
>> cassandra-cli ) for the relevant Keyspace/CF might help as well
>>  create column family rawData
>>   with column_type = 'Standard'
>>   and comparator = 'UTF8Type'
>>   and default_validation_class = 'BytesType'
>>   and key_validation_class = 'BytesType'
>>   and read_repair_chance = 0.1
>>   and dclocal_read_repair_chance = 0.0
>>   and gc_grace = 864000
>>   and min_compaction_threshold = 4
>>   and max_compaction_threshold = 32
>>   and replicate_on_write = true
>>   and compaction_strategy =
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>>   and caching = 'KEYS_ONLY'
>>   and column_metadata = [
>>     {column_name : 'created',
>>     validation_class : LongType},
>>     {column_name : 'socketHash',
>>     validation_class : Int32Type},
>>     {column_name : 'data',
>>     validation_class : UTF8Type},
>>     {column_name : 'guid',
>>     validation_class : UTF8Type},
>>     {column_name : 'evaluated',
>>     validation_class : Int32Type,
>>     index_name : 'rawData_evaluated_idx_1',
>>     index_type : 0}]
>>   and compression_options = {'sstable_compression' :
>> 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>
>>  Only the "data" column is used during the test.
>>
>>
>>
>> What consistency are you doing your writes with ?
>> I am writing with consistency level ONE.
>>
>>
>> What are the values for these settings in cassandra.yaml
>>
>> memtable_total_space_in_mb: No value set in cassandra.yaml, so 1/3 of the
>> heap according to the documentation (2 GB / 3)
>> memtable_flush_writers: No value set in cassandra.yaml, but only one data
>> directory, so I assume it is 1.
>> memtable_flush_queue_size: 4
>> compaction_throughput_mb_per_sec: 16
>> concurrent_writes: 32
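>>
>>  ( If it matters, pinning the memtable budget explicitly would just be one
>> line in cassandra.yaml – the value here is only an example, not what I am
>> running: )
>>
>>     memtable_total_space_in_mb: 512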
>>
>>
>> Which version of Cassandra?
>>  1.1.8
>>
>>
>>  Hope this helps you help me :)
>> Best regards,
>> Joel Samuelsson
>>
>>
>> 2013/3/22 Joel Samuelsson <samuelsson.joel@gmail.com>
>>
>>> Thanks for the GC suggestion. It seems we didn't have enough CPU power
>>> to handle both the data and GC. Increasing the number of CPU cores made
>>> everything run smoothly at the same load.
>>>
>>>
>>> 2013/3/21 Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com>
>>>
>>>>  Neat!
>>>> Thanks.
>>>>
>>>>   From: Sylvain Lebresne <sylvain@datastax.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Thursday 21 March 2013 10:10
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: Re: Cassandra freezes
>>>>
>>>>   Prior to 1.2 the index summaries were not saved on disk, and were
>>>> thus computed on startup while the sstable was loaded. In 1.2 they are now
>>>> saved on disk to make startup faster (
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2392). That being
>>>> said, if the index_interval value used by a saved summary doesn't match the
>>>> current one when the sstable is loaded, the summary is recomputed anyway,
>>>> so restarting a node should always take a new index_interval setting into
>>>> account.
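>>>>
>>>>  ( For reference, the knob itself is a single line in cassandra.yaml; 128
>>>> is, as far as I recall, the shipped default, and a larger value means a
>>>> smaller in-memory sample: )
>>>>
>>>>     index_interval: 128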
>>>>
>>>>  --
>>>> Sylvain
>>>>
>>>>
>>>> On Thu, Mar 21, 2013 at 9:43 AM, Andras Szerdahelyi <
>>>> andras.szerdahelyi@ignitionone.com> wrote:
>>>>
>>>>> I cannot find the reference that notes having to run upgradesstables
>>>>> when you change this. I really hope such complex assumptions are not
>>>>> forming in my head just on their own and there actually exists some
>>>>> kind of reliable reference that clears this up :-) but,
>>>>>
>>>>> # index_interval controls the sampling of entries from the primary
>>>>> # row index in terms of space versus time. The larger the interval,
>>>>> # the smaller and less effective the sampling will be. In technical
>>>>> # terms, the interval corresponds to the number of index entries that
>>>>> # are skipped between taking each sample. All the sampled entries
>>>>> # must fit in memory. Generally, a value between 128 and 512 here
>>>>> # coupled with a large key cache size on CFs results in the best trade
>>>>> # offs. This value is not often changed, however if you have many
>>>>> # very small rows (many to an OS page), then increasing this will
>>>>> # often lower memory usage without an impact on performance.
>>>>>
>>>>> it is ( very ) safe to assume the row index is re-built/updated when
>>>>> new sstables are built. Obviously the sample of this index will have
>>>>> to follow this process very closely.
>>>>>
>>>>> It is possible however that the sample itself is not persisted and is
>>>>> built at startup as opposed to *only* when the index changes ( which
>>>>> is what I thought was happening ).
>>>>> It shouldn't be too difficult to verify this, but I'd appreciate if
>>>>> someone who looked at this before could confirm if this is the case.
>>>>>
>>>>> Thanks,
>>>>> Andras
>>>>>
>>>>> On 21/03/13 09:13, "Michal Michalski" <michalm@opera.com> wrote:
>>>>>
>>>>> >About index_interval:
>>>>> >
>>>>> >> 1) you have to rebuild sstables ( not an issue if you are
>>>>> >> evaluating, doing test writes.. etc, not so much in production )
>>>>> >
>>>>> >Are you sure of this? As I understand indexes, it's not required
>>>>> >because this parameter defines the interval of the in-memory index
>>>>> >sample, which is created during C* startup based on the primary
>>>>> >on-disk index file. The fact that heap usage is reduced immediately
>>>>> >after a C* restart seems to confirm this, but maybe I miss something?
>>>>> >
>>>>> >M.
>>>>>
>>>>>
>>>>
>>>
>>
>
>
