In fact, read_repair_chance doesn't solve it by itself; I also have to apply the settings below to get the same result.

Combination 1:
1. in_memory_compaction_limit_in_mb: 2 (from 64)
2. compaction_throughput_mb_per_sec: 1 (from 16)
3. concurrent_reads: 16 (from 32)   /* I have a 4-core machine */
4. JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1"
5. Heap: 8 GB, with a 2 GB young generation

Combination 2:
read_repair_chance : 0.1

It only works if both Combination 1 and Combination 2 are applied; neither works on its own.

I have two questions:

1. Will my read performance be impacted by these changes?

2. Will stale or old data also be more likely with read_repair_chance reduced? I have CL:1 and RF:7 on a 7-node cluster.
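For the stale-data question, a rough way to reason about it is the usual replica-overlap rule: a read is guaranteed to touch at least one replica that holds the latest write only when (replicas read) + (replicas written) > RF. The sketch below is illustrative only. The function names are made up for this example, and the write consistency level of ONE is an assumption, since only the read CL is stated in this thread.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a majority quorum for the given RF."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 7

# Read at CL ONE, write at CL ONE (the write CL is an assumption).
one_one = read_overlaps_write(RF, write_replicas=1, read_replicas=1)

# Read and write at QUORUM, for comparison.
quorum_quorum = read_overlaps_write(RF, quorum(RF), quorum(RF))

print("ONE/ONE overlap guaranteed:", one_one)
print("QUORUM/QUORUM overlap guaranteed:", quorum_quorum)
```

Without guaranteed overlap, reads rely on background mechanisms such as read repair, hinted handoff, and repair to converge, so lowering read_repair_chance can lengthen the window in which a CL ONE read returns older data.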

Regards,
Shubham



On Wed, Feb 8, 2012 at 2:20 PM, shubham srivastava <shubham.k@gmail.com> wrote:
I did all of that:

1. Disabled all row caches
2. Increased heap memory to 8 GB and reduced the young generation to 2 GB
3. Reduced in-memory compaction to 2

Still the problem persists. 

I tried reducing read_repair_chance from 1 to 0.1. This helped a bit, so I can now run some fairly small write jobs without nodes hanging. However, I tried this on a test setup with only 2 nodes, so I may need to test it across a bigger cluster.

I also ran a write job on a node on which I had disabled gossip, and it succeeded without much hassle, although I can see mutation requests getting dropped on the other node. The problem with this is that I need to repair the other node to get the same data as on the write node; this should have synced quickly.

In Cassandra, if I need to write to a single node and, once all the writes have finished there, get it synced with the other nodes, is this a normal scenario, or how should I do it? My guess is that the way Solr writes data to Cassandra is the problem: it batches documents together (512 docs at a time) and then dumps them.

Regards,
Shubham



On Wed, Feb 8, 2012 at 12:57 AM, aaron morton <aaron@thelastpickle.com> wrote:
Is Cassandra logging messages about the heap being full? What are they saying?
They are saying http://khaaan.com/

You do not have enough memory allocated to the JVM to work efficiently, that needs to be fixed. 

First step, disable all row and key caches until you get stable. 

Is there a way to calculate the maximum memory required, or do I have to use trial and error?
The best approach is to let Cassandra take care of it. See the conf/cassandra-env.sh . The maximum useful heap size is 8GB. 

When you say memtable threshold is 0.75 do you mean the memtable_total_space_in_mb setting in yaml ? 
I am talking of : flush_largest_memtables_at: 0.75
That is a safety valve; the fact that it is being triggered is a symptom of memory issues. See http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Are you talking about in_memory_compaction_limit_in_mb, which is by default 1/3 of the heap size?
in_memory_compaction_limit defaults to 64MB and influences how much memory compaction will use.

Is RF:7 increasing the pressure as well? But if I reduce it to, say, 4, I am effectively putting the load on only 4 servers, as Solandra writes all the docs, driven by max_docs_per_shard_size (which is 1 million), to a single shard (a single physical node; it uses its own partitioner), and I have 300K docs in total.

I do not know that much about solandra. 

I would do this:
* disable all the caches
* allocate more memory / let cassandra take care of it (you have a high setting for the young gen).
* consider reducing the in_memory_compaction_limit

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 7/02/2012, at 2:46 PM, shubham srivastava wrote:

Hi Aaron,

Is Cassandra logging messages about the heap being full? What are they saying?

Most of the time, but not always, Cassandra publishes the logs below:

 WARN [ScheduledTasks:1] 2012-02-06 22:17:50,375 GCInspector.java (line 143) Heap is 0.784281626909687 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-02-06 22:17:50,376 StorageService.java (line 2418) Flushing ColumnFamilyStore(table='LH', columnFamily='SavedSearchRequest') to relieve memory pressure
 INFO [ScheduledTasks:1] 2012-02-06 22:17:50,629 ColumnFamilyStore.java (line 1128) Enqueuing flush of Memtable-SavedSearchRequest@568443115(6964940/13461931 serialized/live bytes, 5794 ops)
 WARN [ScheduledTasks:1] 2012-02-06 22:17:51,242 StorageService.java (line 2422) Flushing ColumnFamilyStore(table='LH', columnFamily='BookedHotels') to relieve memory pressure
 INFO [ScheduledTasks:1] 2012-02-06 22:17:51,242 ColumnFamilyStore.java (line 1128) Enqueuing flush of Memtable-BookedHotels@2031509445(1575/1968 serialized/live bytes, 35 ops)
 INFO [FlushWriter:5] 2012-02-06 22:17:51,244 Memtable.java (line 237) Writing Memtable-SavedSearchRequest@568443115(6964940/13461931 serialized/live bytes, 5794 ops)
 INFO [ScheduledTasks:1] 2012-02-06 22:18:03,144 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 3783 ms for 2 collections, 2785603776 used; max is 6120341504
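To track GC lines like the last one above over time, a small parser can help. The following is a minimal sketch, assuming the GCInspector message format is exactly as it appears in this log excerpt; the regular expression and the function name are illustrative and not part of Cassandra.

```python
import re

# Matches the GCInspector summary format seen in the log excerpt above.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<collections>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return GC details from one log line, or None if the line is not a GC summary."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    used = int(match.group("used"))
    heap_max = int(match.group("max"))
    return {
        "collector": match.group("collector"),
        "gc_ms": int(match.group("ms")),
        "collections": int(match.group("collections")),
        "heap_fraction": used / heap_max,  # heap in use after the collections
    }


sample = (
    " INFO [ScheduledTasks:1] 2012-02-06 22:18:03,144 GCInspector.java (line 122) "
    "GC for ConcurrentMarkSweep: 3783 ms for 2 collections, 2785603776 used; max is 6120341504"
)
info = parse_gc_line(sample)
print(info)
```

For the sample line, this reports 3783 ms spent in 2 ConcurrentMarkSweep collections, with a bit under half of the maximum heap still in use afterwards.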

When you say memtable threshold is 0.75 do you mean the memtable_total_space_in_mb setting in yaml ? 
I am talking of : flush_largest_memtables_at: 0.75

Wide rows, such as the TL column family in the L keyspace, can increase the amount of GC work that goes on. This can be somewhat alleviated by reducing in_memory_compaction_limit_in_mb; this will increase the amount of IO that is needed.

Are you talking about in_memory_compaction_limit_in_mb, which is by default 1/3 of the heap size?


Heavy write traffic can also increase GC activity. Watch the tp stats during the load process, are you overloading the cluster ? Do you see pending tasks backing up ? 

Yes. The tpstats output is below:
#TpStats

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       922         159030         0                 0
RequestResponseStage              0         0           4563         0                 0
MutationStage                     0         0         249357         0                 0
ReadRepairStage                   0         0              3         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0           4670         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlusher               0         0             19         0                 0
StreamStage                       0         0              0         0                 0
FlushWriter                       0         0             19         0                 5
MiscStage                         0         0              0         0                 0
FlushSorter                       0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              8         0                 0

Message type           Dropped
RANGE_SLICE                  1
READ_REPAIR                  0
BINARY                       0
READ                    147188
MUTATION                     1
REQUEST_RESPONSE           327 
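To spot backed-up stages and dropped messages without reading the table by eye, the output above can be parsed. This is a rough sketch that assumes the two-section layout shown here (a pool table, then a dropped-message table); the function name and the dictionary keys are made up for this example, and only a few rows are repeated as sample input.

```python
def parse_tpstats(text):
    """Split tpstats-style output into per-pool counters and dropped-message counts."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Pool":          # header of the thread-pool table
            section = "pools"
        elif fields[0] == "Message":     # header of the dropped-message table
            section = "dropped"
        elif section == "pools" and len(fields) == 6:
            name = fields[0]
            active, pending, completed, blocked, all_time_blocked = map(int, fields[1:])
            pools[name] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
        elif section == "dropped" and len(fields) == 2:
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


sample = """
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       922         159030         0                 0
MutationStage                     0         0         249357         0                 0
FlushWriter                       0         0             19         0                 5

Message type           Dropped
READ                    147188
MUTATION                     1
REQUEST_RESPONSE           327
"""

pools, dropped = parse_tpstats(sample)
backed_up = sorted(name for name, stats in pools.items() if stats["pending"] > 0)
print("Stages with pending tasks:", backed_up)
print("Dropped messages:", {k: v for k, v in dropped.items() if v > 0})
```

On these numbers, ReadStage is the only stage with a backlog (all 32 threads active, 922 pending), which lines up with the large count of dropped READ messages.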


Is there a way to calculate the maximum memory required, or do I have to use trial and error?

Is RF:7 increasing the pressure as well? But if I reduce it to, say, 4, I am effectively putting the load on only 4 servers, as Solandra writes all the docs, driven by max_docs_per_shard_size (which is 1 million), to a single shard (a single physical node; it uses its own partitioner), and I have 300K docs in total.

Regards,
Shubham

On Tue, Feb 7, 2012 at 12:51 AM, aaron morton <aaron@thelastpickle.com> wrote:
Sounds like GC is the first problem to tackle. Can you give the process more memory ?  

Look at the logs and work out what the GC is doing, the cassandra logs are not exactly the same as this but you get the idea http://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs

The next thing I normally do is revert all changes to the default memory and GC settings. 

When you say the memtable threshold is 0.75, do you mean the memtable_total_space_in_mb setting in the yaml?
Is Cassandra logging messages about the heap being full? What are they saying?

Wide rows, such as the TL column family in the L keyspace, can increase the amount of GC work that goes on. This can be somewhat alleviated by reducing in_memory_compaction_limit_in_mb; this will increase the amount of IO that is needed.

Are you talking about in_memory_compaction_limit_in_mb, which is by default 1/3 of the heap size?


Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 4/02/2012, at 5:58 PM, Shubham Srivastava wrote:

I have a Cassandra setup with a 7-node ring in a single DC, with RF:7 and read CL:1. There is live traffic on all of these nodes except one; the traffic is 90% reads. There are also writes happening on all of these nodes, typically user-specific data. These nodes a...

Now, at times, 2-3 nodes get into a hung state, and eventually the whole ring behaves the same way. Their CPU usage is very high (a load of 17), incoming and outgoing network packet rates are very high, there are continuous GC pauses (major and minor), and read messages are dropped at the same time. The SSTable counts have also increased and decreased for some column families during this time. So it is mainly GC, SSTable compaction, and memtable flushes happening. We are running Cassandra behind Solr using Solandra. The maximum number of docs we have is around 0.5 million.

We keep one node as the main write node, where we run scheduled jobs that pull data into the cluster from a MySQL database. These jobs run hourly. Recently we increased the data to roughly 30X its previous size. Before that, the same setup was stable and these jobs ran every 5 minutes, apart from a similar problem occurring 2-3 times.

GC settings:
4 GB heap: Xmx, Xms
2 GB young generation: Xmn
ParNew
CMS
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
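
As a rough illustration of what these flags imply for the generation sizes, the arithmetic can be sketched as below. It assumes the standard HotSpot meaning of the flags (SurvivorRatio is the eden-to-one-survivor-space ratio, and CMSInitiatingOccupancyFraction is a percentage of the old generation); the function name is made up for this example.

```python
def generation_sizes_mb(heap_mb, young_mb, survivor_ratio, cms_initiating_fraction):
    """Approximate generation sizes implied by -Xmx/-Xmn/-XX:SurvivorRatio and the CMS trigger."""
    # Young generation = eden + 2 survivor spaces, with eden = survivor_ratio * survivor.
    survivor_mb = young_mb / (survivor_ratio + 2)
    eden_mb = young_mb - 2 * survivor_mb
    old_mb = heap_mb - young_mb
    # CMS starts a cycle once the old generation reaches this occupancy.
    cms_trigger_mb = old_mb * cms_initiating_fraction / 100
    return {
        "eden_mb": eden_mb,
        "survivor_mb": survivor_mb,
        "old_mb": old_mb,
        "cms_trigger_mb": cms_trigger_mb,
    }


sizes = generation_sizes_mb(heap_mb=4096, young_mb=2048, survivor_ratio=8, cms_initiating_fraction=75)
for name, value in sizes.items():
    print(f"{name}: {value:.1f}")
```

With a 4 GB heap and a 2 GB young generation, only about 2 GB is left for the old generation, and CMS would start once roughly 1.5 GB of it is occupied.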

The memtable threshold is also 0.75, and the rest are default Cassandra settings.

Also, we don't have a connection pool in place for Thrift. We are using Cassandra 0.8.6 with Solr 3.3.

I will share the cfstats shortly, or anything else needed for that matter. Can you help me with this?


==========================================


Keyspace: system
        Read Count: 19
        Read Latency: 5.6876842105263155 ms.
        Write Count: 2375
        Write Latency: 0.010477894736842104 ms.
        Pending Tasks: 0
                Column Family: NodeIdInfo
                SSTable count: 0
                Space used (live): 0
                Space used (total): 0
                Number of Keys (estimate): 0
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0

                Column Family: HintsColumnFamily
                SSTable count: 2
                Space used (live): 492064
                Space used (total): 492064
                Number of Keys (estimate): 256
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 3
                Read Count: 10
                Read Latency: 5.643 ms.
                Write Count: 2372
                Write Latency: 0.010 ms.
                Pending Tasks: 0
                Key cache capacity: 2
                Key cache size: 2
                Key cache hit rate: 0.42857142857142855
                Row cache: disabled
                Compacted row minimum size: 219343
                Compacted row maximum size: 263210
                Compacted row mean size: 263210

                Column Family: Schema
                SSTable count: 2
                Space used (live): 20827
                Space used (total): 20827
                Number of Keys (estimate): 256
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 3
                Read Latency: 4.253 ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 2
                Key cache size: 2
                Key cache hit rate: 0.0
                Row cache: disabled
                Compacted row minimum size: 104
                Compacted row maximum size: 8239
                Compacted row mean size: 3314

                Column Family: Migrations
                SSTable count: 2
                Space used (live): 33180
                Space used (total): 33180
                Number of Keys (estimate): 256
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 2
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 9888
                Compacted row maximum size: 17084
                Compacted row mean size: 14474

                Column Family: IndexInfo
                SSTable count: 0
                Space used (live): 0
                Space used (total): 0
                Number of Keys (estimate): 0
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0

                Column Family: LocationInfo
                SSTable count: 3
                Space used (live): 15844
                Space used (total): 15844
                Number of Keys (estimate): 384
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 2
                Read Count: 6
                Read Latency: 6.479 ms.
                Write Count: 3
                Write Latency: 0.004 ms.
                Pending Tasks: 0
                Key cache capacity: 3
                Key cache size: 3
                Key cache hit rate: 0.3333333333333333
                Row cache: disabled
                Compacted row minimum size: 73
                Compacted row maximum size: 310
                Compacted row mean size: 126

----------------
Keyspace: L
        Read Count: 535675
        Read Latency: 4.472875136976712 ms.
        Write Count: 282216
        Write Latency: 0.08156946806701251 ms.
        Pending Tasks: 0
                Column Family: FC
                SSTable count: 6
                Space used (live): 444790202
                Space used (total): 444790202
                Number of Keys (estimate): 768
                Memtable Columns Count: 93231
                Memtable Data Size: 172790560
                Memtable Switch Count: 1
                Read Count: 6651
                Read Latency: 62.041 ms.
                Write Count: 94235
                Write Latency: 0.026 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 104
                Compacted row maximum size: 10090808
                Compacted row mean size: 4256880

                Column Family: Docs
                SSTable count: 7
                Space used (live): 1487268134
                Space used (total): 1487268134
                Number of Keys (estimate): 303744
                Memtable Columns Count: 98040
                Memtable Data Size: 37732086
                Memtable Switch Count: 1
                Read Count: 125986
                Read Latency: 1.868 ms.
                Write Count: 11900
                Write Latency: 0.520 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 36
                Compacted row maximum size: 51012
                Compacted row mean size: 5547

                Column Family: SI
                SSTable count: 3
                Space used (live): 136512555
                Space used (total): 136512555
                Number of Keys (estimate): 286848
                Memtable Columns Count: 5344
                Memtable Data Size: 21139811
                Memtable Switch Count: 1
                Read Count: 27642
                Read Latency: 0.880 ms.
                Write Count: 5815
                Write Latency: 0.093 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 73
                Compacted row maximum size: 30130992
                Compacted row mean size: 5022190

                Column Family: TL
                SSTable count: 4
                Space used (live): 314821524
                Space used (total): 314821524
                Number of Keys (estimate): 512
                Memtable Columns Count: 83428
                Memtable Data Size: 16069959
                Memtable Switch Count: 1
                Read Count: 10867
                Read Latency: 17.949 ms.
                Write Count: 5386
                Write Latency: 0.578 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 12108971
                Compacted row maximum size: 186563160
                Compacted row mean size: 90195666

                Column Family: TI
                SSTable count: 6
                Space used (live): 1451685937
                Space used (total): 1451685937
                Number of Keys (estimate): 3910144
                Memtable Columns Count: 166006
                Memtable Data Size: 184308808
                Memtable Switch Count: 1
                Read Count: 364529
                Read Latency: 4.194 ms.
                Write Count: 164880
                Write Latency: 0.065 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 104
                Compacted row maximum size: 7007506
                Compacted row mean size: 18922

----------------
Keyspace: LH
        Read Count: 61353
        Read Latency: 0.20730264208759147 ms.
        Write Count: 56289
        Write Latency: 0.13461228659240704 ms.
        Pending Tasks: 0
                Column Family: UserPrefrences
                SSTable count: 4
                Space used (live): 335309522
                Space used (total): 335309522
                Number of Keys (estimate): 1513600
                Memtable Columns Count: 12807
                Memtable Data Size: 9144714
                Memtable Switch Count: 1
                Read Count: 20132
                Read Latency: 0.201 ms.
                Write Count: 27817
                Write Latency: 0.009 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 61
                Compacted row maximum size: 215
                Compacted row mean size: 179

                Column Family: LastViewedHotels
                SSTable count: 3
                Space used (live): 152316912
                Space used (total): 152316912
                Number of Keys (estimate): 767104
                Memtable Columns Count: 3059
                Memtable Data Size: 2906758
                Memtable Switch Count: 1
                Read Count: 9077
                Read Latency: 0.180 ms.
                Write Count: 3585
                Write Latency: 0.018 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 565
                Row cache hit rate: 0.7896882229811611
                Compacted row minimum size: 36
                Compacted row maximum size: 51012
                Compacted row mean size: 144

                Column Family: BookedHotels
                SSTable count: 3
                Space used (live): 7274700
                Space used (total): 7274700
                Number of Keys (estimate): 39680
                Memtable Columns Count: 7
                Memtable Data Size: 392
                Memtable Switch Count: 1
                Read Count: 139
                Read Latency: 0.027 ms.
                Write Count: 10
                Write Latency: 0.008 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 4
                Row cache hit rate: 0.2302158273381295
                Compacted row minimum size: 87
                Compacted row maximum size: 35425
                Compacted row mean size: 139

                Column Family: HotelMessage
                SSTable count: 1
                Space used (live): 349735
                Space used (total): 349735
                Number of Keys (estimate): 512
                Memtable Columns Count: 19424
                Memtable Data Size: 3408866
                Memtable Switch Count: 1
                Read Count: 4726
                Read Latency: 0.059 ms.
                Write Count: 9431
                Write Latency: 0.739 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 144
                Row cache hit rate: 0.968049090139653
                Compacted row minimum size: 87
                Compacted row maximum size: 24601
                Compacted row mean size: 867

                Column Family: SavedHotels
                SSTable count: 1
                Space used (live): 650153
                Space used (total): 650153
                Number of Keys (estimate): 3456
                Memtable Columns Count: 13
                Memtable Data Size: 728
                Memtable Switch Count: 1
                Read Count: 4282
                Read Latency: 0.029 ms.
                Write Count: 15
                Write Latency: 0.014 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 13
                Row cache hit rate: 0.06819243344231668
                Compacted row minimum size: 104
                Compacted row maximum size: 2299
                Compacted row mean size: 160

                Column Family: SavedHotelsInverted
                SSTable count: 1
                Space used (live): 646988
                Space used (total): 646988
                Number of Keys (estimate): 3456
                Memtable Columns Count: 13
                Memtable Data Size: 728
                Memtable Switch Count: 1
                Read Count: 13
                Read Latency: 3.014 ms.
                Write Count: 15
                Write Latency: 0.007 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 2
                Row cache hit rate: 0.15384615384615385
                Compacted row minimum size: 104
                Compacted row maximum size: 2299
                Compacted row mean size: 160

                Column Family: LastViewedHotelsInverted
                SSTable count: 4
                Space used (live): 147678370
                Space used (total): 147678370
                Number of Keys (estimate): 770048
                Memtable Columns Count: 2486
                Memtable Data Size: 2524930
                Memtable Switch Count: 1
                Read Count: 2667
                Read Latency: 0.622 ms.
                Write Count: 3590
                Write Latency: 0.010 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 477
                Row cache hit rate: 0.3820772403449569
                Compacted row minimum size: 36
                Compacted row maximum size: 51012
                Compacted row mean size: 160

                Column Family: SavedSearchRequest
                SSTable count: 11
                Space used (live): 4560932807
                Space used (total): 4560932807
                Number of Keys (estimate): 1022336
                Memtable Columns Count: 7762
                Memtable Data Size: 17281321
                Memtable Switch Count: 1
                Read Count: 20317
                Read Latency: 0.242 ms.
                Write Count: 11827
                Write Latency: 0.022 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 619
                Row cache hit rate: 0.7583304621745336
                Compacted row minimum size: 925
                Compacted row maximum size: 1955666
                Compacted row mean size: 5014

                Column Family: HotelTariffs
                SSTable count: 3
                Space used (live): 42775204
                Space used (total): 42775204
                Number of Keys (estimate): 18176
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000
                Row cache size: 0
                Row cache hit rate: NaN
                Compacted row minimum size: 180
                Compacted row maximum size: 9887
                Compacted row mean size: 2179
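
To relate these cfstats to the in-memory compaction limit, a small check like the one below may be useful. It is a sketch only: the maximum compacted row sizes are copied from the L keyspace above, the 64 MB and 2 MB limits are the values mentioned elsewhere in this thread, and the function name is made up for this example.

```python
MB = 1024 * 1024

# "Compacted row maximum size" (bytes) for the L keyspace, copied from the cfstats above.
max_compacted_row_bytes = {
    "FC": 10090808,
    "Docs": 51012,
    "SI": 30130992,
    "TL": 186563160,
    "TI": 7007506,
}


def rows_over_limit(max_row_bytes, limit_mb):
    """Column families whose largest row exceeds the given in-memory compaction limit."""
    limit_bytes = limit_mb * MB
    return sorted(name for name, size in max_row_bytes.items() if size > limit_bytes)


over_default = rows_over_limit(max_compacted_row_bytes, limit_mb=64)
over_reduced = rows_over_limit(max_compacted_row_bytes, limit_mb=2)
print("Over the 64 MB limit:", over_default)
print("Over a 2 MB limit:", over_reduced)
```

Column families over the limit have at least one row that is too large to compact entirely in memory, so lowering the limit trades heap pressure for more compaction IO on those rows.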

I would appreciate quick help.

Regards,
Shubham